eLife. 2021 Dec 3;10:e72783. doi: 10.7554/eLife.72783

Working memory capacity of crows and monkeys arises from similar neuronal computations

Lukas Alexander Hahn 1, Dmitry Balakhonov 1, Erica Fongaro 1, Andreas Nieder 2, Jonas Rose 1
Editors: Erin L Rich3, Michael J Frank4
PMCID: PMC8660017  PMID: 34859781

Abstract

Complex cognition relies on flexible working memory, which is severely limited in its capacity. The neuronal computations underlying these capacity limits have been extensively studied in humans and in monkeys, resulting in competing theoretical models. We probed the working memory capacity of crows (Corvus corone) in a change detection task, developed for monkeys (Macaca mulatta), while we performed extracellular recordings of the prefrontal-like area nidopallium caudolaterale. We found that neuronal encoding and maintenance of information were affected by item load, in a way that is virtually identical to results obtained from monkey prefrontal cortex. Contemporary neurophysiological models of working memory employ divisive normalization as an important mechanism that may result in the capacity limitation. As these models are usually conceptualized and tested in an exclusively mammalian context, it remains unclear if they fully capture a general concept of working memory or if they are restricted to the mammalian neocortex. Here, we report that carrion crows and macaque monkeys share divisive normalization as a neuronal computation that is in line with mammalian models. This indicates that computational models of working memory developed in the mammalian cortex can also apply to non-cortical associative brain regions of birds.

Research organism: Other

eLife digest

Working memory is the brain’s ability to temporarily hold and manipulate information. It is essential for carrying out complex cognitive tasks, such as reasoning, planning, following instructions or solving problems. Unlike long-term memory, information is not stored and recalled, but held in an accessible state for brief periods. However, the capacity of working memory is very limited. Humans, for example, can only hold around four items of information simultaneously.

There are various competing theories about how this limitation arises from the network of neurons in the brain. These models are based on studies of humans and other primates. But memory limitations are not exclusive to mammals. Indeed, the working memory of some birds, such as crows, has a similar capacity to humans despite the architecture of their brains being very different to mammals. So, how do brains with such distinct structural differences produce working memories with similar capacities?

To investigate, Hahn et al. probed the working memory of carrion crows in a change detection task developed for macaque monkeys. Crows were trained to memorize varying numbers of colored squares and indicate which square had changed after a one second delay when the screen went blank. While the crows performed the task, Hahn et al. measured the activity of neurons in an area of the brain equivalent to the prefrontal cortex, the central hub of cognition in mammals.

The experiments showed that neurons in the crow brain responded to the changing colors virtually the same way as neurons in monkeys. Hahn et al. also noticed that increasing the number of items the crows had to remember affected individual neurons in a similar fashion as had previously been observed in monkeys.

This suggests that birds and monkeys share the same central mechanisms of, and limits to, working memory despite differences in brain architecture. The similarities across distantly related species also validate core ideas about the limits of working memory developed from studies of mammals.

Introduction

Working memory (WM) can hold information for a short period of time to allow further processing in the absence of sensory input (Cowan, 2017; Oberauer et al., 2018). By bridging this gap between the immediate sensory environment and behavior, WM is a cornerstone for complex cognition. It is a very flexible memory system, yet severely limited in its capacity. While this capacity is often seen as a general cognitive bottleneck, for simple stimuli, like colors, the capacity is very similar between humans, monkeys, and crows (Balakhonov and Rose, 2017; Buschman et al., 2011; Cowan, 2001; Luck and Vogel, 1997). Different models have been proposed to conceptualize how this capacity limit arises. This work motivated many psychophysical and electrophysiological experiments that in turn led to a spectrum of more refined models of WM (Ma et al., 2014). ‘Discrete models’ of WM argue that a fixed number of items can be stored. Once this capacity is reached, an additional item can only be maintained if it replaces a previous item (Awh et al., 2007; Fukuda et al., 2010; Luck and Vogel, 1997; Vogel and Machizawa, 2004; Zhang and Luck, 2008). ‘Continuous models’ describe WM as a flexible resource that is allocated to individual items. A minimum amount of this resource has to be allocated to each item for successful retention, thereby resulting in a capacity limit (Bays and Husain, 2008; van den Berg et al., 2012; Wilken and Ma, 2004). On the neurophysiological level, models of WM capacity suggest that interference between memory representations (‘items’) within the neuronal network is a source of information loss and capacity limitation (Bouchacourt and Buschman, 2019; Lundqvist et al., 2016; Lundqvist et al., 2011; Schneegans et al., 2020). 
Interference may arise due to divisive normalization that appears as competition between items, related to oscillatory dynamics (Lundqvist et al., 2016; Lundqvist et al., 2011), WM flexibility (Bouchacourt and Buschman, 2019), and neuronal information sampling (Schneegans et al., 2020). Divisive normalization is a computational principle that acts upon neurons when multiple stimuli are presented simultaneously: it normalizes neuronal responses by creating ‘a ratio between the response of an individual neuron and the summed activity of a pool of neurons’ (Carandini and Heeger, 2011, p. 51). An effect related to divisive normalization can be observed when two stimuli are presented either individually or simultaneously within the receptive field of a visual sensory neuron. The neuron’s firing rate when the stimuli are presented simultaneously becomes normalized by the population’s responses to each individual stimulus (Carandini et al., 1997; Heeger, 1992). This effect also occurs in relation to attentive processes and has been formalized into the ‘normalization model of attention’ (Reynolds et al., 1999; Reynolds and Heeger, 2009). Normalization of neuronal responses is commonly observed in many species throughout the animal kingdom, not just in sensory, but also in cognitive domains (Carandini and Heeger, 2011). Investigations into WM capacity and model predictions focus mostly on humans and monkeys. By extending this work to include birds, one can gain a unique comparative perspective. Crows have a similar limit in WM capacity, and their neuronal correlates of WM are comparable to those of monkeys (Balakhonov and Rose, 2017; Nieder, 2017). But while the neuronal architecture of sensory areas is similar between birds and mammals, higher associative areas, critical for WM, do not share a common architecture between the species (Stacho et al., 2020). 
Therefore, an outstanding question is whether modern models of WM such as the ‘flexible model’ capture WM capacity in general, or if their predictions (e.g., divisive normalization) are confined to the mammalian neocortex. To resolve this, it is crucial to investigate the avian brain to understand how its different organization can produce such similar behavioral and neurophysiological results. While the neuronal correlates of WM maintenance in birds have been investigated in some detail (Diekamp et al., 2002a; Hartmann et al., 2018; Rinnert et al., 2019; Rose and Colombo, 2005; Veit et al., 2014), a neurophysiological investigation of WM capacity limitation is still lacking. The avian forebrain structure, nidopallium caudolaterale (NCL) is a critical component of avian WM. The NCL is considered functionally equivalent to the mammalian prefrontal cortex (PFC) (Güntürkün and Bugnyar, 2016; Nieder, 2017), as it receives projections from all sensory modalities (Kröner and Güntürkün, 1999), projects to premotor areas (Kröner and Güntürkün, 1999), and is a target of dopaminergic innervation (Waldmann and Güntürkün, 1993). To investigate the neurophysiology of WM capacity in birds, we adopted a task design developed for monkeys (Buschman et al., 2011) to use it with carrion crows (Corvus corone). Our animals were trained to memorize an array of colors and to indicate which color had changed after a short memory delay, while we performed extracellular recordings of individual neurons in the NCL using multi-channel probes. We expected to find a clear correlate of WM representations in NCL neurons and a load-dependent response modulation based on divisive normalization of neuronal responses. This would allow us to evaluate if the behavioral WM capacity observations of crows fit a ‘discrete’ or ‘continuous’ WM resource model. 
If the neuronal responses also fit the contemporary neurophysiological models of WM capacity limitations (Bouchacourt and Buschman, 2019; Lundqvist et al., 2016; Lundqvist et al., 2011; Schneegans et al., 2020), it would further suggest that crows and monkeys have convergently evolved a similar neurophysiological basis for WM capacity despite a different architecture of the critical forebrain structures.
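
To make the normalization computation described above concrete, the following is a minimal sketch of the canonical normalization equation of Carandini and Heeger, R_j = γ·D_j^n / (σ^n + Σ_k D_k^n). The function name, the parameter values (sigma, gamma, n), and the drive values are illustrative assumptions and are not taken from this study.

```python
def normalized_responses(drives, sigma=1.0, gamma=1.0, n=1.0):
    """Divide each unit's driving input by the summed drive of the pool.

    drives: driving inputs D_k of all units in the normalization pool.
    sigma, gamma, n: semi-saturation constant, gain, and exponent
    (hypothetical values chosen only for illustration).
    """
    pool = sigma ** n + sum(d ** n for d in drives)
    return [gamma * d ** n / pool for d in drives]


# Hypothetical example: the same drive presented alone or with a second item.
alone = normalized_responses([10.0])
pair = normalized_responses([10.0, 10.0])

# The response to the first item is smaller when a second item joins the pool.
print(alone[0], pair[0])
```

In this toy case the response to an item shrinks as soon as a second item contributes to the pool, which is the kind of between-item competition that the models cited above propose as a source of capacity limits.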

Results

The WM capacity of crows is similar to that of monkeys

The behavioral performance was influenced by the number of colored squares on the screen. It significantly decreased with an increasing number of ipsilateral squares (median performances, load 1: 95.88%, load 2: 78.31%, load 3: 58.21%; Friedman test: χ² = 92.00, p < 0.001, Figure 1B). We ran a generalized linear model with ipsilateral load (i.e., load of the hemifield where a color changed), contralateral load (i.e., load of the hemifield without a color change), and their interaction as predictors for performance (adjusted R² = 0.78, F(460,456) = 555.00, p < 0.001). We found that the number of ipsilateral colors significantly reduced performance (βipsi = –0.177, t(459) = –18.00, p < 0.001), whereas the number of contralateral colors did not (βcontra = –0.021, t(459) = –1.77, p = 0.0772; Figure 1C). There was also a significant interaction between ipsilateral and contralateral load (β = –0.024, t(458) = –4.28, p < 0.001). We compared this model to a reduced model, in which we omitted the non-significant βcontra, and found that this reduced model (adjusted R² = 0.78, F(460,457) = 828.00, p < 0.001) explained the performance equally well (|ΔLLR| = 0.0102). Therefore, we conclude that contralateral load by itself did not significantly affect performance. We calculated the capacity K (see Materials and methods) for all full WM loads (i.e., two to five items). The capacity K peaks at four items (mean ± SEM: 3.05 ± 0.038, Figure 1—figure supplement 1). These observations are very similar to observations made in primates (Buschman et al., 2011) and fully reproduce our earlier behavioral findings (Balakhonov and Rose, 2017).

Figure 1. Behavioral overview.

(A) Behavioral paradigm (reproduced from Balakhonov and Rose, 2017). The birds had to center and hold their gaze for the duration of the sample and delay period, and subsequently indicate which colored square had changed. (B) Boxplot of performance for different ipsilateral loads (i.e., on the side where the change occurred). Horizontal lines indicate significant differences between loads, box indicates the median, 1st, and 3rd quartile (whiskers extend to 1.5 times the inter-quartile range). (C) Mean performance matrix for ipsi- and contralateral load combinations (values are rounded to the nearest integer). Additional contralateral items at an ipsilateral load of 1 barely affected performance (bottom row). At higher ipsilateral loads additional contralateral items reduced performance more clearly (middle and top row). Statistical modeling revealed an interaction at these higher loads (see text).

Figure 1—figure supplement 1. Capacity of crow working memory (WM).

Line indicates capacity K at different loads. The peak at four items indicates the capacity. Dashed lines indicate maximum capacity and fixed capacity of 1. Error bars indicate the standard error of the mean.

Neurons of the NCL encode the color identity and maintain it in WM

We recorded 362 neurons from the NCL of two crows performing the WM task (delayed change localization). All reported effects were also present in each individual bird (Figure 3—figure supplements 1 and 2); we therefore pooled the data for population analysis. A large subset of neurons responded to the presence of a color (i.e., at load 1) by substantially increasing or decreasing their firing rate relative to baseline. This change in firing rate occurred selectively, depending on the presented color, either in the sample (Figure 2A) or the delay period (Figure 2—figure supplement 1). For most neurons, this difference in firing rate between the two possible colors became attenuated when the load increased from one to two colors, and it was further attenuated from two to three colors. To quantify this effect, we calculated the amount of information about the color identity at a neuron's favorite location as the percent explained variance (PEV) during a memory load of one, two, or three items in bins of 200 ms (see Materials and methods for details). Most neurons did not sustain information about color (measured as a significant PEV, henceforth ‘information’) throughout the entire sample or memory delay but rather had shorter periods in which the information was significant (Figure 2A bottom).
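
As an illustration of what an explained-variance measure of color information looks like, the sketch below computes a bias-corrected explained variance (omega squared) for one time bin from spike counts grouped by color identity. This is an assumption made for illustration: the exact PEV estimator used in the study is specified in its Materials and methods, and the function name and spike counts here are hypothetical.

```python
def percent_explained_variance(groups):
    """Omega-squared style explained variance for a one-way design.

    groups: list of lists of spike counts, one inner list per color identity.
    Returns a value near 0 when color explains no variance (it can be slightly
    negative because of the bias correction) and near 1 when color explains
    almost all variance.
    """
    values = [x for g in groups for x in g]
    n_total = len(values)
    grand_mean = sum(values) / n_total
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_within = (ss_total - ss_between) / df_within
    denominator = ss_total + ms_within
    if denominator == 0:
        return 0.0  # no variance at all, hence nothing to explain
    return (ss_between - df_between * ms_within) / denominator


# Hypothetical spike counts in one 200 ms bin for two color identities.
color_a = [12, 14, 13, 15, 11]
color_b = [5, 6, 4, 7, 5]
pev_selective = percent_explained_variance([color_a, color_b])

# A unit that fires identically for both colors carries no color information.
pev_flat = percent_explained_variance([[1, 2, 3], [1, 2, 3]])
print(pev_selective, pev_flat)
```

A neuron whose firing rate separates the two colors clearly yields a high value, whereas a neuron that responds the same way to both colors yields a value at or below zero.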

Figure 2. Color discrimination in the neuronal response (information, percent explained variance [PEV]) generally decreases with load, but some neurons show the opposite effect.

Shown are the three ipsilateral load conditions (i.e., load increases on the same side as the neuron’s favorite location). Ipsilateral loads are one (blue), two (yellow), and three (red). The labels ‘color A’ and ‘color B’ always refer to the same pair of colors at the neuron’s favorite location, irrespective of the load condition. (A) Example of a sample neuron with color information decline at load 1 (blue), load 2 (yellow), and load 3 (red). Top: raster plot, where every dot represents a single spike during the individual trials (rows of dots); middle: peri-stimulus-time histogram (PSTH) of average firing rate (solid line for color ID 1, dashed line for color ID 2) with the standard error of the mean (shaded areas); bottom: percent explained variance of color identity (a measure of information about color) along the trial, the line at the top of the y-axis indicates significant bins. (B) Same as in (A) for an example of a delay neuron with information gain at a higher load.

Figure 2—figure supplement 1. Color discrimination in the neuronal response (information, percent explained variance [PEV]) decreases with load.

Example of a delay neuron with color information decline, at load 1 (blue), load 2 (green), and load 3 (red). Top: raster plot, where every dot represents a single spike during the individual trials (rows of dots); middle: peri-stimulus-time histogram (PSTH) of average firing rate (solid line for color ID 1, dashed line for color ID 2) with the standard error of the mean (shaded areas); bottom: percent explained variance of color identity (a measure of information about color) along the trial, the line at the top of the y-axis indicates significant bins.
Figure 2—figure supplement 2. Color discrimination in the neuronal response (information, percent explained variance [PEV]) increases with load.

Example of a sample neuron with color information gain; at load 1 (blue), at load 2 (green), and load 3 (red). Top: raster plot, where every dot represents a single spike during the individual trials (rows of dots); middle: peri-stimulus-time histogram (PSTH) of average firing rate (solid line for color ID 1, dashed line for color ID 2) with the standard error of the mean (shaded areas); bottom: percent explained variance of color identity (a measure of information about color) along the trial, the line at the top of the y-axis indicates significant bins.

To better capture the time points when the individual neurons carried information, we performed a hierarchical clustering analysis of the PEV values of the individual neurons at load 1 (see Materials and methods for details). We found a total of seven clusters that were organized into two overarching groups (Figure 3A). Group 1 contained neurons (n = 227) that showed peak information during the sample and early delay phase, while group 2 contained neurons (n = 135) that showed peak information during the delay phase. For each neuron, we then calculated if it carried a significant amount of color information by applying a permutation test (for all bins at load 1, see Materials and methods). The individual neurons were then further classified into three groups depending on the phase in which they had a significant amount of information (Figure 3B). Overall, 37.57% (n = 136) of neurons were significant during the sample phase, 9.39% (n = 34) of neurons were significant during the memory delay, and 14.64% (n = 53) of neurons were significant during both the sample phase and the memory delay (all proportions of neurons were significantly higher than expected by chance [binomial test, see Materials and methods, all p < 0.001]). Refer to Figure 2A for an example neuron, significant at load 1 with a large differentiation in firing rate between color identities (a large PEV) and a loss of differentiation with increasing ipsilateral load. Further inspection of individual neuronal activity revealed, however, that a substantial number of neurons responded differently. Instead of losing information at higher loads, many neurons gained information (e.g., Figure 2B, Figure 2—figure supplement 2). Thus, we additionally performed the permutation testing for loads 2 and 3 to determine which neurons had significant information (see Materials and methods). 
We found that many of the neurons that did not have significant information at load 1 did have significant information at load 2 and load 3 (Figure 3C). For the memory delay, more than half of the significant neurons we detected were only significant for either load 2 or load 3, compared to only 36% of neurons that were significant at load 1 (Figure 3C middle). By including the higher loads in our analysis, we found a total of 249 (68.78%) sample neurons and 94 (25.97%) delay neurons. For the population analyses, we subsequently pooled all significant neurons into three groups (one per load). These pooled groups were then each subdivided into sample and delay neurons (i.e., ‘sample-load1’, ‘delay-load1’, ‘sample-load2’, etc., see Table 1 in the Materials and methods for an overview).
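
The permutation logic used to decide whether a neuron carries significant color information can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the study: it uses plain explained variance (eta squared) for a single bin, an arbitrary number of permutations, and hypothetical spike counts, whereas the actual statistic, binning, and thresholds are given in the Materials and methods.

```python
import random


def explained_variance(a, b):
    """Fraction of spike-count variance explained by color identity."""
    values = a + b
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    if ss_total == 0:
        return 0.0
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in (a, b)
    )
    return ss_between / ss_total


def permutation_p_value(a, b, n_permutations=1000, seed=0):
    """Shuffle color labels to build a null distribution of the statistic."""
    rng = random.Random(seed)
    observed = explained_variance(a, b)
    pooled = a + b  # new list, so the inputs are not modified
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a, shuffled_b = pooled[:len(a)], pooled[len(a):]
        if explained_variance(shuffled_a, shuffled_b) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical spike counts for two colors at a neuron's favorite location.
selective = permutation_p_value(
    [12, 14, 13, 15, 11, 13, 12, 14], [5, 6, 4, 7, 5, 6, 4, 5]
)
unselective = permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4])
print(selective, unselective)
```

Running the same test separately at each load, as described above, is what allows a neuron to be classified as significant at load 2 or load 3 even if it was not significant at load 1.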

Figure 3. Overview of recorded neurons.

(A) The neuronal population can be best described by seven individual clusters. (B) Percentages of neurons (total n = 362) with significant color information at load 1, during the sample and the delay. (C) Percentages (rounded) of significant neurons in individual load conditions for sample (n = 249), delay (n = 94), and sample and delay (n = 133). The pieces of the pies depicting significance at a specific load relate to the number of significant neurons in the respective phase. For example, 36% of the 94 delay neurons (i.e., 34 neurons) are significant at load 1 (all pieces containing blue); these are the same neurons that make up the 9.39% of the total 362 neurons depicted in B.

Figure 3—figure supplement 1. Overview of analyses for bird 1.

(A) The neuronal population can be best described by seven individual clusters. (B) Percentages (rounded) of significant neurons in individual load conditions for sample (n = 68), delay (n = 19), and sample and delay (n = 37). (C) On correct trials (left) color is represented during the early and late phase of the sample and, to a lesser degree, during the early and late delay. On error trials (right), color information can be found in the early sample phase at load 2, and in the late sample phase at loads 2 and 3 (asterisks). Analysis of load 1 error trials was omitted due to their very low abundance. Statistical comparisons of correct vs. error trial information were performed on sub-sampled correct trials. Early and late sample each 400 ms, early and late delay each 500 ms, error bars indicate the standard error of the mean. (D) Divisive normalization-like regularization was observable for neuronal responses of neurons losing information (top) but not for neurons gaining color information at load 2 (bottom). Selectivity (SE) indicates how much the neuronal response is influenced by a color, relative to a second color when either is presented alone. Sensory interaction (SI) indicates how much the neuronal response is influenced by either color when both were displayed simultaneously. Slopes close to 0.5 indicate an equal influence of both colors. Slopes <0.5, or >0.5 indicate a weighted influence of a color. (Top) Information-carrying neurons in the sample (n = 35; left) and delay (n = 15; right) population. (Bottom) Information gaining neurons in the sample (n = 10; left) and delay (n = 3; right) population. The red line indicates the regression fit.
Figure 3—figure supplement 2. Overview of analyses for bird 2.

(A) The neuronal population can be best described by seven individual clusters. (B) Percentages (rounded) of significant neurons in individual load conditions for sample (n = 181), delay (n = 75), and sample and delay (n = 96). (C) On correct trials (left) color is represented during the early and late phase of the sample and, to a lesser degree, during the early and late delay. On error trials (right), color information can be found in the late sample phase at loads 2 and 3 (asterisks). Analysis of load 1 error trials was omitted due to their very low abundance. Statistical comparisons of correct vs. error trial information were performed on sub-sampled correct trials. Early and late sample each 400 ms, early and late delay each 500 ms, error bars indicate the standard error of the mean. (D) Divisive normalization-like regularization was observable for neuronal responses of neurons losing information (top) but not for neurons gaining color information at load 2 (bottom). Selectivity (SE) indicates how much the neuronal response is influenced by a color, relative to a second color when either is presented alone. Sensory interaction (SI) indicates how much the neuronal response is influenced by either color when both were displayed simultaneously. Slopes close to 0.5 indicate an equal influence of both colors. Slopes <0.5 or >0.5 indicate a weighted influence of a color. (Top) Information-carrying neurons in the sample (n = 70; left) and delay (n = 28; right) population. (Bottom) Information gaining neurons in the sample (n = 46; left) and delay (n = 5; right) population. The red line indicates the regression fit.

Table 1. Overview of significant groups.

The ‘+’ denotes that a neuron of the respective group had a significant percent explained variance (PEV) in the respective load condition. The ‘-’ denotes that a neuron of the respective group did not have a significant PEV in the respective load condition. The pooled groups contained only neurons with a ‘+’ for the respective load condition.

Load 1 | Load 2 | Load 3 | Group name
+ | - | - | Load 1 neurons (Group I)
- | + | - | Load 2 neurons (Group II)
- | - | + | Load 3 neurons (Group III)
+ | + | - | Load 1 and 2 neurons (Group IV)
+ | - | + | Load 1 and 3 neurons (Group V)
- | + | + | Load 2 and 3 neurons (Group VI)
+ | + | + | Load 1, 2, and 3 neurons (Group VII)
Pooled group 1 | Pooled group 2 | Pooled group 3 |

The neuronal population has gradually less information with increasing load

The clustering analysis indicated that the population of neurons as a whole did sustain the color information throughout the entire trial (Figure 3A). Plotting the information averages of each of the three ‘sample populations’ and the ‘delay populations’ over time confirmed this result (Figure 4A, Figure 4—figure supplement 1). After the onset of the stimulus array, the average information exhibited a sharp increase that peaked roughly 400 ms after stimulus onset and remained at an elevated level throughout the memory delay, until the choice array appeared. Results obtained from neurons of the lateral PFC of monkeys indicated distinct hemispheric independence of WM capacity (Buschman et al., 2011). This means that increasing ipsilateral load (i.e., load in the hemifield containing the target for which information is assessed) should affect neuronal processing while increasing contralateral load should not. This effect might be further emphasized in birds due to the full decussation of their optic nerve (Husband and Shimizu, 2001). Parallel to the behavioral results and in line with the results from monkeys, we found a strong effect of ipsilateral load on the information maintenance, as there was a sharp drop in information when the load increased from one item to two items (Figure 4A, blue and yellow curves). The addition of a third item only slightly decreased the maintained information further (Figure 4A, red curve). The load dependence was much more pronounced during the sample period than during the memory delay where the information remained at a lower elevated level. Notably, the load effect was only present for ipsilateral manipulations. If the number of items on the contralateral side was increased, the information encoded about the colors at the favorite location did not change (Figure 4A, right). 
To compare our results to the results obtained in monkeys we also applied the method of Buschman et al., 2011 for testing the ipsilateral load effect during the sample and delay phase, by splitting each phase into an early and a late portion (first and second 400 ms of the sample, and first and second 500 ms for the delay). We did find a significant drop in information with an increase in load from one through three in the early (F(2,537) = 18.73, p < 0.001, ω² = 0.0616) and late (F(2,536) = 20.07, p < 0.001, ω² = 0.0661) sample period and the early (F(2,267) = 6.88, p = 0.0012, ω² = 0.0417) and late (F(2,267) = 3.85, p = 0.0225, ω² = 0.0207) delay period (Figure 4B). There was a large and significant drop between one and two items (post hoc Bonferroni corrected multiple comparisons: early and late sample p < 0.001, early delay p < 0.001, late delay p = 0.0198) and one and three items (post hoc Bonferroni corrected multiple comparisons: early and late sample p < 0.001, early delay p = 0.019, late delay p > 0.05) but no difference between loads 2 and 3 (post hoc Bonferroni corrected multiple comparisons: all p > 0.05). The maintenance of a significant amount of information at higher loads (even for three items, early sample t(156) = 7.55, p < 0.001; late sample t(156) = 8.73, p < 0.001; early delay t(87) = 3.84, p < 0.001; late delay t(87) = 4.73, p < 0.001) and its gradual reduction when items were added to the corresponding hemifield are indicative of a flexible resource allocation and not an all-or-nothing slot-like WM. Furthermore, if there is a flexible resource, in error trials a small but insufficient amount of resource might still be allocated to an item. Indeed, error trial analysis (applying correct trial sub-sampling, see Materials and methods) for the load 2 and 3 conditions further supported this interpretation. 
The amount of information in the early and late sample phase remained above zero (load 2: early, t(186) = 3.25, p = 0.0014; late, t(186) = 5.33, p < 0.001; load 3: late, t(156) = 4.21, p < 0.001; Figure 4B asterisks), and was significantly smaller than in correct trials (load 2: late, t(186) = 2.81, p = 0.0055, d = 0.26; load 3: late, t(156) = 2.55, p = 0.0117, d = 0.23). Additionally, there was no further maintenance throughout the memory delay at any load (Figure 4B, PEV at loads 2 and 3 in error trials delay, all non-significant). This indicates that a failure to report which color had changed at higher loads (two and three ipsilateral items) resulted from a smaller amount of information being encoded during the sample phase, which was then not maintained throughout the delay. A possible alternative ‘slot-model’ explanation would be that, on error trials, the color information was completely lost after the sample phase, because it was not successfully transferred into a slot (or because a slot was not available to take on information). The graded amount of information on correct trials is not compatible with the simple (all-or-none) slot model, but could fit the ‘slots and averaging model’ (Zhang and Luck, 2008).

Figure 4. Information encoding at the population level.

(A) Color information (percent explained variance [PEV]) decreases with an increasing ipsilateral load (i.e., on the same side as the neuron’s favorite location) but not with an increasing contralateral load (i.e., on the opposite side to the neuron’s favorite location). (B) On correct trials (left) color is represented during the early and late phase of the sample and, to a lesser degree, during the early and late delay. On error trials (right), color information can be found in the early sample phase at load 2, and in the late sample phase at loads 2 and 3 (asterisks). Analysis of load 1 error trials was omitted due to their very low abundance. Statistical comparisons of correct vs. error trial information were performed on sub-sampled correct trials. Early and late sample each 400 ms, early and late delay each 500 ms, shaded areas, and error bars indicate the standard error of the mean.

Figure 4—figure supplement 1. Sample population (A) and delay population (B), same as Figure 4A with full time axis.

Notably, the ‘delay populations’ (B) also showed an elevated level of information during the sample, whereas the ‘sample populations’ (A) did not show an elevated level of information during the delay.
Figure 4—figure supplement 2. Same as Figure 4B, after applying a more stringent criterion on neuronal significance (see text).

On correct trials (left) color is represented during the early and late phase of the sample and, to a lesser degree, during the early and late delay. On error trials (right), color information can be found in the early sample phase at load 2, and in the late sample phase at loads 2 and 3 (asterisks). Analysis of load 1 error trials was omitted due to their very low abundance. Statistical comparisons of correct vs. error trial information were performed on sub-sampled correct trials. Early and late sample each 400 ms, early and late delay each 500 ms, error bars indicate the standard error of the mean.

Higher loads produce divisive normalization-like neuronal responses

We next wanted to understand the neuronal mechanisms behind the information loss at higher WM loads. For that, we analyzed how the responses of individual sample and delay neurons changed when the load increased from one color to two colors. For the ‘sample populations’ and the ‘delay populations’, an increasing number of items reduced the amount of encoded information about the color identity (Figure 4). This effect was due to neurons that had a large difference of firing rates between the color A and color B at load 1 (high PEV, i.e., information about color), and reduced differentiation at load 2 (small PEV, no or little information about color, e.g., Figure 2A). ‘Divisive-normalization-like regularization’ (DNR; Carandini and Heeger, 2011) can explain this effect. DNR describes the computation that takes place when two stimuli are presented simultaneously. In a simplified case a neuronal response becomes normalized, analogous to vector normalization, with a normalization factor consisting of the simultaneous stimuli (Carandini and Heeger, 2011). Applied to our context, a consequence of DNR would be a reduced differentiation between two color identities at load 2 because differences in firing rate (for each stimulus by itself) at load 1 would be normalized at load 2 (resulting in information loss, e.g., Figure 2A). We, therefore, hypothesized that DNR was observable for neurons with significant information at load 1. We tested for DNR in the NCL by calculating a selectivity index (SE) and a sensory interaction index (SI) for each neuron for the sample phase and the memory delay phase (Reynolds et al., 1999, see Materials and methods for details). SE indicates how strongly the neuronal response is driven by a color at the favorite location of the neuron (reference) in relation to a selected probe color (when either is presented alone). SI indicates how the probe color interacts with the reference color when both are presented simultaneously. 
Values of both indices, SE and SI, lie between –1 and +1. The addition of a probe color influences the response to the reference color by either suppressing the firing rate of the reference color (if the reference elicits a higher firing rate than the probe, i.e., SE < 0), or increasing the firing rate for the reference color (if the probe elicits a higher firing rate than the reference, i.e., SE > 0). If DNR were present, this influence to suppress or enhance neuronal responses should be an even mixture at the population level, resulting in a significant regression between SE and SI with a slope of around 0.5 (Bouchacourt and Buschman, 2019). We compared regressions for the sample and delay phase (each as one bin, see Materials and methods for details) for two groups of neurons: information-carrying neurons (significant information at load 1), and non-informative neurons (no information at load 1 or at load 2; Figure 5—figure supplement 1). We found that DNR was present in both sample and delay phases (Figure 5A). Information-carrying sample neurons had a fitted slope of 0.47 (R2adj = 0.39, F(1,838) = 547.69, p < 0.001, CI = [0.43 0.51]) and delay neurons had a slope of 0.50 (R2adj = 0.34, F(1,342) = 175.60, p < 0.001, CI = [0.43 0.58]). As the slopes were not significantly different from 0.5 (both confidence intervals include 0.5), this indicates that reference and probe color had an equal influence on neuronal responses. We thus show that DNR was observable in the neuronal population, and as a consequence of this computation, neurons had generally less information about the color identity at load 2.
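The logic behind the predicted slope can be illustrated with a minimal Python sketch. It assumes, following the definitions of Reynolds et al., 1999, that SE is the normalized response to the probe minus the response to the reference, and that SI is the normalized response to the pair minus the response to the reference. The simulated firing rates, the noise level, the parameter `weight_probe`, and all function names are hypothetical and serve illustration only; they are not taken from the recorded data. If the pair response is an equal average of the two single-stimulus responses, SI equals SE/2 and the regression slope is close to 0.5; if the reference dominates, the slope is smaller.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_indices(n_neurons, weight_probe, noise_sd, seed=0):
    """Simulate SE and SI for hypothetical neurons.

    weight_probe is the assumed weight of the probe in the pair response:
    0.5 corresponds to equal averaging, smaller values favor the reference.
    """
    rng = random.Random(seed)
    se, si = [], []
    for _ in range(n_neurons):
        ref = rng.random()    # normalized rate, reference color alone
        probe = rng.random()  # normalized rate, probe color alone
        pair = (1 - weight_probe) * ref + weight_probe * probe
        pair += rng.gauss(0.0, noise_sd)  # trial-to-trial variability
        se.append(probe - ref)  # selectivity index (assumed definition)
        si.append(pair - ref)   # sensory interaction index (assumed definition)
    return se, si


se_equal, si_equal = simulate_indices(500, weight_probe=0.5, noise_sd=0.05)
se_ref, si_ref = simulate_indices(500, weight_probe=0.2, noise_sd=0.05)
slope_equal = ols_slope(se_equal, si_equal)
slope_ref = ols_slope(se_ref, si_ref)
print(f"equal weighting: slope = {slope_equal:.2f}")
print(f"reference-weighted: slope = {slope_ref:.2f}")
```

In this toy model the fitted slope simply recovers the probe weight, which is why a slope near 0.5 is read as equal influence of both colors and a slope below 0.5 as a stronger influence of the reference color.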

Figure 5. Divisive normalization-like regularization was observable for neuronal responses of neurons losing information (A) but not for neurons gaining color information at load 2 (B).

Selectivity (SE) indicates how much the neuronal response is influenced by a color, relative to a second color when either is presented alone. Sensory interaction (SI) indicates how much the neuronal response is influenced by either color when both were displayed simultaneously. Slopes close to 0.5 indicate an equal influence of both colors. Slopes <0.5, or >0.5 indicate a weighted influence of a color. (A) Information-carrying neurons in the sample phase (as one bin; n = 105; left) and delay phase (as one bin; n = 43; right) population. (B) Information gaining neurons in the sample phase (as one bin; n = 56; left) and delay phase (as one bin; n = 8; right) population. The red line indicates the regression fit.

Figure 5—figure supplement 1. Divisive normalization-like regularization was observable for neuronal responses of neurons without significant information.

Both phases contain the same neurons (n = 171). Selectivity (SE) indicates how much the neuronal response is influenced by a color, relative to a second color when either is presented alone. Sensory interaction (SI) indicates how much the neuronal response is influenced by either color when both were displayed simultaneously. Slopes close to 0.5 indicate an equal influence of both colors. Slopes < 0.5 or > 0.5 indicate a weighted influence of a color. The red line indicates the regression fit. Non-informative sample neurons had a fitted slope of 0.38 (R2adj = 0.16, F(1,1366) = 258.08, p < 0.001), significantly smaller than 0.5 (CI = [0.33 0.42]). Delay neurons had a slope of 0.40, also significantly smaller than 0.5 (R2adj = 0.13, F(1,1366) = 197.51, p < 0.001, CI = [0.35 0.46]). This indicates that for these neurons the reference color influenced firing rate more than the probe color. This smaller slope is not related to the amount of information encoded for the individual colors (which determined the favorite location). It does, however, indicate that those non-informative neurons were influenced by any color at their favorite location and thereby might have been informative about whether the favorite location contained a color, but not about which color.
Figure 5—figure supplement 2. Example for information gain due to unequal interaction at load 2.

Notice how a lack of firing rate differentiation at load 1 (blue curves, left plot) turns into a differential firing rate at load 2 (purple curves, right plot). Depicted is one typical neuron. Black box insets depict the color and load condition during the sample (there were no colors present during the delay), and indicate the condition to which each firing rate curve (mean ± SEM) belongs.

Source Data File 1: This file contains all details of the reported statistics. The file consists of one MATLAB struct. The struct contains three major fields, corresponding to the different statistical analyses conducted in the Results section. ‘Behavior’ contains statistical results reported in the section ‘The WM capacity of crows is similar to that of monkeys’. ‘smpDlyEarlyLate’ contains statistical results reported in the section ‘The neuronal population has gradually less information with increasing load’. Divisive-normalization-like regularization (‘DNR’) contains statistical results reported in the sections ‘Gain of information at load 2 can be explained by neuronal normalization’ and ‘Higher loads produce divisive normalization-like neuronal responses’.

Gain of information at load 2 can be explained by neuronal normalization

Some neurons showed encoding of color identity at higher loads, instead of loss of information. These neurons were abundant in both the sample phase and the delay phase (Figure 3C). For example, the neuron shown in Figure 2B did not differentiate between color identities at load 1 but did so at load 2, thus representing a case of information gain (instead of loss) at a higher load. We wanted to understand whether DNR, the mechanism that we found reduced color information at load 2, could also produce color differentiation. The ‘normalization model of attention’ (Reynolds and Heeger, 2009) incorporates divisive normalization, and can explain how attention can modulate neuronal responses. If the animal attends to a preferred (or non-preferred) second colored square in the load 2 condition, the response of a neuron to the target location (i.e., to color A and to color B at the favorite location) might be altered. As a result, a difference between color A and B may arise even though each color by itself elicited a similar response. In other words, the interaction between the additional color and the target color is unequal. Neurons without a color differentiation at load 1 that gained differentiation at load 2 through this process (e.g., if the interaction of probe color A with reference color A is larger than the interaction of probe color A with reference color B, see Figure 5—figure supplement 2 for an example) should have a population regression slope smaller than 0.5. We thus hypothesized that the population of neurons showing information at load 2, but not at load 1 (e.g., Figure 2B), would have a smaller slope than the neurons that lost information (Figure 5A). Sample neurons had a slope of 0.19 (R2adj = 0.05, F(1,446) = 23.0, p < 0.001, CI = [0.11 0.27], Figure 5B), and delay neurons had a slope of –0.04 (R2adj = –0.015, F(1,62) = 0.08, p = 0.78, CI = [–0.29 0.22], Figure 5B). 
Both slopes were significantly smaller than 0.5 and smaller than the slopes of the non-informative neurons (Figure 5—figure supplement 1). This indicates that these neurons were influenced more strongly by the reference color, and that the addition of the probe color at load 2 resulted in an unequal interaction. Therefore, DNR was also computationally responsible for a gain of information at load 2, in a specific subset of neurons.

Discussion

Neuronal resources of WM capacity are hemifield independent and gradually allocated

Our results confirm behavioral findings that have been discussed in detail in an earlier study (Balakhonov and Rose, 2017). In brief, we found that the WM capacity of crows is limited to about four items, and that the two visual hemifields are largely independent (i.e., the number of items on one side does not affect change detection performance on the other side). Within each hemifield, performance dropped gradually with the addition of a second and third item but remained above chance. Fittingly, on the neuronal level, we found a markedly reduced amount of color information when the number of colored squares was increased from one to two (roughly 50% reduction in correct trials). This suggests that WM could be conceptualized as a continuous resource that has to be divided between the two items (Bays and Husain, 2008; van den Berg et al., 2012; Wilken and Ma, 2004), rather than two ‘simple’ slots that would each have the same amount of information irrespective of the memory load. This is also consistent with results of human neuroimaging that report decreased signal amplitude and precision with increasing memory load (Emrich et al., 2013; Sprague et al., 2014). In contrast, the hemispheric independence we observed would fit a slot-like model, in which the hemispheres as a whole act like discrete slots. A more nuanced version of the slot model (‘slots and averaging’; Zhang and Luck, 2008) could also account for graded amounts of information within a limited number of slots (Fukuda et al., 2010; Zhang and Luck, 2008), as we found here. The mix of discrete and independent hemispheres with a graded allocation of information between items that we found is comparable to results by Buschman et al., 2011, observed in monkey PFC. On the neuronal level, recurrent connections between neurons within a hemisphere may reduce item differentiation when multiple items are present simultaneously, creating capacity limitations within the hemisphere (Matsushima and Tanaka, 2014). 
A lack of interhemispheric recurrent connections would make processing in the other hemisphere independent. Like in monkeys, WM capacity in crows may therefore result from neuronal activity patterns governed by multiple individual items. We probed the WM capacity of crows using colored squares, based on the task design of Buschman et al., 2011. Using the identical task allowed us to directly compare our neuronal results of WM capacity from NCL to results from PFC of monkeys. Task similarity is very important for such cross species comparisons as even small changes in task parameters may introduce substantial differences in neuronal responses, leading to potentially different conclusions. In a task similar to the one used here, Lara and Wallis, 2014, have found that neurons in the PFC of monkeys encoded nearly no information about color, but instead about location. In their task monkeys had to memorize the color of squares at two locations on a screen, and were again confronted with a colored square at one of the two locations after a delay. The monkeys then had to indicate if the color at that location had changed. Lara and Wallis, 2014, discuss the absence of color information in the neurons they recorded in relation to the task of Buschman et al., 2011, who like us, did find color information. In brief, the exact task design may determine the neuronal encoding of task relevant information (Lara and Wallis, 2014). Similar to the complex contribution of PFC neurons to WM, neurons of NCL can also encode a wide range of very different task relevant aspects, like color (this study), spatial locations (Rinnert et al., 2019; Veit et al., 2017), and more abstract items like rules (Veit and Nieder, 2013) and numerosities (Ditz and Nieder, 2015).

Attentional processes guide WM allocation and maintenance

One way to circumvent WM failure when item load increases is to allocate attention. Our results suggest that attention may play an important role in crow WM. Capacity limitation became apparent during encoding, as the amount of information at the end of the sample period was affected by the stimulus load. Adding a second and third item to the ipsilateral stimulus array reduced the amount of color information encoded by NCL neurons that carried over into the memory delay. Furthermore, neuronal activity in trials in which the birds made an incorrect response showed only weak encoding during the sample phase without information maintenance during the memory delay. This fits studies of human WM that have shown attentive filtering during encoding of stimuli influencing WM capacity (Bays and Husain, 2008; Vogel et al., 2005; Vogel and Machizawa, 2004), and neuronal correlates of this have been reported for monkeys as well (Buschman et al., 2011). Beyond the domain of sensory signals, attention and WM may be directly linked. Neuronal correlates of WM and attention overlap in PFC neurons, for example, Lebedev et al., 2004, found that a substantial amount of PFC neurons encode either an attentional signal, or a memory signal, and some (hybrid) neurons do both. A purely mnemonic function of PFC thereby seems unlikely. Indeed, very recently, Panichello and Buschman, 2021, have reported that at the population level neurons of PFC encode ‘both the selection of items from working memory and attention to sensory inputs’ (p. 2), rather than just memory content. The independence of hemifields that we observed on the behavioral level (this study and Balakhonov and Rose, 2017) and found in the neuronal responses could also be related to attention. Adding stimuli in the contralateral hemifield affected neither performance nor information maintained by NCL neurons, whereas additional ipsilateral stimuli strongly reduced both. 
This fits the influence of attention on WM and hemifield independence, which is consistently accentuated in studies in which attention had to be divided between the two hemifields (Alvarez and Cavanagh, 2005; Buschman et al., 2011; Cavanagh and Alvarez, 2005; Delvenne, 2005; Delvenne et al., 2011). Finally, the DNR computation may explain the responses of the neurons that gained information at load 2 through attentional processes predicted by the ‘normalization model of attention’ (Reynolds and Heeger, 2009). This may appear counter-intuitive and contradictory, considering that the same process is also responsible for the loss of information. However, when attention is overtly directed to a specific (preferred or non-preferred) item within the receptive field of a neuron, the DNR computation shifts its weighting of the normalized response toward the response of the attended item (Reynolds et al., 1999; Reynolds and Heeger, 2009). This weighted normalization can produce a difference in the neuronal response to both color identities at load 2, even if the neuronal response was non-informative at load 1. At the population level we were able to observe such an effect as the reduced slope of the selectivity/interaction fit. Thus, an attentive process might have enhanced information in WM at higher loads. It is important to clarify that, as we did not use any form of attentional cueing in our study, we cannot explicitly test for such an attention effect. However, we do know that the animals participating in this study can use attentional cues to enhance their WM (Fongaro and Rose, 2020). The attention cues used by Fongaro and Rose, 2020, positively affected not only encoding but also the maintenance and retrieval of the information held in WM, comparable to results from monkeys and humans (Brady and Hampton, 2018; Souza and Oberauer, 2016). 
We, therefore, want to emphasize that our data is in line with the interpretation that the birds possibly attended a load 2 stimulus array differently than a load 1 stimulus array in order to enhance their performance in trials with higher loads.

Modern models of mammalian WM capacity are applicable to crows

Our neuronal recordings offer a mechanistic explanation for the behavioral effects, as we found clear evidence of DNR governing the neuronal responses tied to WM capacity that is in accordance with mammalian models of WM capacity (Bouchacourt and Buschman, 2019; Lundqvist et al., 2016; Lundqvist et al., 2011). The loss of information about color identity (i.e., neuronal response differentiation between colors) can be accounted for by DNR when an item is added to a neuron’s receptive field. The normalization of neuronal firing rate diminishes the differentiation between color identities. As such it is analogous to neurophysiological responses from visual areas (Carandini et al., 1997; Reynolds et al., 1999) and to the PFC during spatial WM (Matsushima and Tanaka, 2014). The WM model of Bouchacourt and Buschman, 2019, is based solely on data from monkey electrophysiology, and thus implicitly tied to the layered columns of the neocortex. The results we report here show that the model also fits the neurophysiology of WM in crows. However, the picture is incomplete since important aspects of monkeys’ WM are still not investigated in crows. Oscillations of local field potentials are relevant for how information enters WM and how it is maintained (Miller et al., 2018), and have been tied to normalization and competition between items in WM (Lundqvist et al., 2018b). Thus, the oscillatory interplay of the layers and different regions of the mammalian neocortex are important fields of research to further our understanding of WM. Such aspects are so far completely unknown in crows and their non-layered associative areas. This encourages further investigation into the neuronal circuits of WM in birds. There is also ongoing debate about the role of sustained activity during delay periods and how it relates to WM (Constantinidis et al., 2018; Lundqvist et al., 2018a; Miller et al., 2018). 
We cannot report any neuron that showed persistent activity comparable to that reported by classical WM studies in PFC (e.g., Funahashi et al., 1989; Fuster and Alexander, 1971), or in NCL (Diekamp et al., 2002b; Veit et al., 2014; Veit and Nieder, 2013). This may be reconcilable with some other contemporary models of WM. One major type of those models implements ‘synfire chains’, where individual neurons fire sequentially (and transiently) to bridge temporal gaps and maintain task-relevant contents (Rajan et al., 2016). This has, for example, been reported to be the case in posterior parietal cortex of mice performing a T-maze task that required WM for cued spatial locations to be maintained (Harvey et al., 2012). The transient activity of neurons that we report (Figures 2 and 3A) might fit into such models. However, our results can only be compared very cautiously to these findings (since even small changes in task design can significantly alter neuronal responses, e.g., Lara and Wallis, 2014). Therefore, while we cannot yet fully equate crow and monkey WM, our results raise two important questions about how WM is implemented on the level of neuronal networks that have implications for our comparative view of crow WM. The first regards the neuronal computations underlying WM. Is there a common canonical computation governing WM, or are there different solutions based on different neuronal architectures? Recent work has shown that the sensory areas of mammals and birds show remarkably similar circuit organization (Stacho et al., 2020). However, higher-order associative areas involved in WM, like the LPFC in mammals and the NCL in birds, have distinctly different architectures (Stacho et al., 2020). The fact that differently organized areas like LPFC and NCL produce strikingly similar physiological responses points to shared computational principles. 
Modeling work already suggests that the competing WM capacity models can be accommodated into a unifying framework based on theoretical neuronal information sampling, where stochastic information sampling (assumed for continuous resource models) can account for item limitations better than fixed information sampling (assumed by the slots and averaging models) (Schneegans et al., 2020). Similarly, DNR is already considered to be a general, canonical computation of the nervous system, present in evolutionarily distant phyla, for example, fruit flies and monkeys (Carandini and Heeger, 2011). The second question regards the tradeoff between WM flexibility and capacity (Bouchacourt and Buschman, 2019). Is the WM of a crow as flexible as that of a monkey? Our results show that the computations by individual neurons that result in WM capacity limitations are virtually the same in crows and monkeys, highlighting a further aspect of WM that is similar between these animal groups (Nieder, 2017). Ultimately, our results were in line with different modern models of WM that implement DNR to explain capacity (Bouchacourt and Buschman, 2019; Lundqvist et al., 2016; Lundqvist et al., 2011; Schneegans et al., 2020). However, the data we presented cannot carry a definitive conclusion about which of the different models fits best. For example, a tradeoff between flexibility and capacity (Bouchacourt and Buschman, 2019) might be present, but further investigation into the models’ predictions is required. We do, however, show that mammalian models of WM are in line with WM in birds, which implies that fundamental aspects of WM are shared between these animal groups.

Conclusion

Together, all these facets of crow WM capacity suggest that the different intricate neuronal architectures that carry out the computations in monkeys and crows have likely been shaped by convergent evolution – into systems that yield similar cognitive performances. The systems may share the same basic mechanisms and thus limitations. Further investigation into the oscillatory dynamics of WM in the avian brain may elucidate if birds also share the prominent limitation of a tradeoff between flexibility and capacity.

Materials and methods

Subjects

Two hand-raised carrion crows (C. corone) of 2 years of age served as subjects in this study. The birds were housed in spacious aviaries in social groups. During the experimental procedures, the animals were held on a controlled food protocol with ad libitum access to water and grit. All experimental procedures and housing conditions were carried out in accordance with the National Institutes of Health Guide for Care and Use of Laboratory Animals and were authorized by the national authority (LANUV).

Experimental setup

We used operant training chambers (50 × 50.5 × 77.5 cm³, width × depth × height) equipped with an acoustic pulse touchscreen (22’’, ELO 2200L APR, Elo Touch Solutions Inc, Milpitas, CA) and an infrared camera (Sygonix, Nürnberg, Germany) for remote monitoring. The birds sat on a wooden perch so that the distance between the bird’s eye and the touchscreen was 8 cm. Food pellets were delivered as a reward via a custom-made automatic feeder (plans available at http://www.jonasrose.net/). The position of the animal’s head was tracked online during the experiment by two open-source computer vision cameras (‘Pixy’, CMUcam5, Charmed Labs, Austin, TX) that reported the location and angle between two LEDs. For tracking, we surgically implanted a lightweight head-post and used a lightweight 3D-printed mount with LEDs that was removed after each experimental session. The system reported the head location at a frame rate of 50 Hz and data was smoothed by integrating over two frames in MATLAB using custom programs on a control PC. All experiments were controlled by custom programs in MATLAB using the Biopsychology (Rose et al., 2008) and Psychophysics toolboxes (Brainard, 1997). Digital input and output of the control PC were handled by a microcontroller (ODROID C1, Hardkernel co. Ltd, Anyang, South Korea) connected through a gigabit network running custom software (available at http://www.jonasrose.net/).

Behavioral protocol

The behavioral protocol was identical to the one described in Balakhonov and Rose, 2017. We trained the birds to perform a delayed change localization task that had previously been used to test the performance under different WM loads in primates (Buschman et al., 2011). Each trial started after a 2 s inter-trial-interval, with the presentation of a red dot centered on the touchscreen (for a maximum of 40 s). The animals initiated the trial by centering their head in front of the red dot for 160 ms. This caused the red dot to disappear and a stimulus array of two to five colored squares to appear (Figure 1A, ‘sample’). The colored squares were presented for a period of 800 ms, during which the animals had to hold their head still and center their gaze on the screen (‘hold gaze’, no more than 2 cm horizontal or vertical displacement, and no more than 20 degrees horizontal or vertical rotation). Failure to hold the head in this position resulted in an aborted trial. This sample phase was followed by a memory delay of 1000 ms after which the stimulus array reappeared with one color exchanged. The animal had to indicate the location of the color change by pecking the respective square. Correct responses were rewarded probabilistically (BEO special pellets, in 55% of correct trials, additional 2 s illumination of the food receptacle in 100% of correct trials). Incorrect responses to colors that had not changed or a failure to respond within 4 s resulted in a brief screen flash and a 10 s timeout. The stimuli were presented at six fixed locations on the screen (1–6, Figure 1A). In each session, one pair of colors was assigned to each of the six locations. Each location had its own distinct pair. These pairs were randomly chosen from a pool of 14 colors (two color combinations were excluded since the animals did not discriminate them equally well during a pre-training). Let us consider Figure 1A as an example. 
The color change occurs in the middle-left where turquoise (T) is presented during the sample and orange (O) during the choice. In this particular session the middle-left could thus show either of the following colors during the sample and choice: T-O (shown in Figure 1A); O-T; O-O; T-T; None-None. In the next session a new random pair of colors was displayed at this location.

For identification and analysis an arbitrary label was assigned to each of the randomly drawn colors at the start of each session (i.e., left middle location: ‘color A’, or ‘color B’; left top location: ‘color A’, or ‘color B’; etc.). These labels do not refer to the order of presentation of the colors at any time, but were held constant for neuronal analysis. The order of presentation of colors within a pair, the target location (where the color change occurred), and the number of stimuli in the array (two to five) were randomized and balanced across trials so that each condition had an equal likelihood to appear. The color squares had a width of 10 degrees of visual angle (DVA) and were placed on the horizontal meridian of the screen and at 45.8 DVA above or below the meridian at a distance of 54 and 55.4 DVA from the center. This arrangement in combination with the head tracking ensured that all stimuli appeared outside of the binocular visual field of crows (37.6 DVA; Troscianko et al., 2012).

Surgery

Both animals were chronically implanted with a lightweight head-post to attach a small LED holder during the experiments. Before surgery, animals were deeply anesthetized with ketamine (50 mg/kg) and xylazine (5 mg/kg). Once deeply anesthetized, animals were placed in a stereotaxic frame. After attaching the small head-post with dental acrylic, a microdrive with a multi-channel microelectrode was stereotactically implanted at the craniotomy (Neuronexus Technologies Inc, Ann Arbor MI, DDrive). The electrode was positioned in NCL (AP 5.0, ML 13.0) of the left hemisphere (coordinates for the region based on histological studies on the localization of NCL in crows; Veit and Nieder, 2013). After the surgery, the crows received analgesics.

Electrophysiological recordings

Extracellular single neuron recordings were performed using chronically implanted multi-channel microelectrodes. The distance between recording sites was 50 µm. The signal was amplified, filtered, and digitized using Intan RHD2000 headstages and a USB-Interface board (Intan Technologies LLC, Los Angeles, CA). The system also recorded digital event codes that were sent from the behavioral control PC using a custom IO device (details available at http://www.jonasrose.net/). Before each recording session, the electrodes were advanced manually using the microdrive. Recordings were started 20 min after the advancement, and each recording site was manually checked for neuronal signals. The signals were recorded at a sampling rate of 30 kHz and filtered with a band-pass filter at recording (0.5–7.5 kHz). The recorded neuronal signals were not pre-selected for task involvement. We performed spike sorting using the semi-automatic Klusta-suite software (Rossant et al., 2016), which uses the high electrode count and their close spacing to isolate signals of single neurons. For spike sorting, we filtered with a high pass of 500 Hz and a low pass of 7125 Hz. The software utilizes the spatial distribution of the recorded signal along the different recording sites to untangle overlapping signals and separate signals with similar waveforms but different recording depths.

Data analysis

All statistical analyses were performed in MATLAB (2018b, Mathworks Inc) using commercially available toolboxes (Curve Fitting Toolbox Version 3.5.3, Statistics and Machine Learning Toolbox Version 10.2) and custom code. For all statistical tests, we assumed a significance level of α = 0.05, unless stated otherwise. Trials were classified as error trials if the bird chose a location where no change of colors had appeared. Trials in which the bird did not choose any location or failed to maintain head fixation were not analyzed. All correct trials were included in the analysis of neural data. Depending on the analysis we refer to different ‘load conditions’ relative to referential sides of the screen which are either ipsilateral (same side) or contralateral (opposite side), each with a possible load between one and three items. For the behavioral analyses the terms ipsilateral and contralateral refer to the location of the change that had to be detected. For the neurophysiological analyses the terms ipsilateral and contralateral refer to the respective neuron’s favorite location (described in the following section).

Because there were only very few error trials in the load 1 condition, we performed error trial analysis only for the load 2 and load 3 conditions. The behavioral data were analyzed as described in our previous study (Balakhonov and Rose, 2017), estimating the WM capacity K for each load by Equation 1.

$$K = n \cdot p \qquad (1)$$

where p is the percentage correct and n is the number of items in WM. This estimate has been applied to similar primate data and in studies with humans (Johnson et al., 2013; Kornblith et al., 2016).
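As a minimal sketch of Equation 1 in Python, the capacity estimate can be computed as shown below. The sketch assumes that p is expressed as a proportion between 0 and 1 (rather than as a percentage value), and the performance values in the example are hypothetical and not the behavioral results of this study. The function name is likewise chosen for illustration.

```python
def capacity_estimate(n_items, proportion_correct):
    """Capacity estimate K = n * p (Equation 1).

    proportion_correct is assumed to be a fraction between 0 and 1.
    """
    if not 0.0 <= proportion_correct <= 1.0:
        raise ValueError("proportion_correct must lie between 0 and 1")
    return n_items * proportion_correct


# Hypothetical performance values, for illustration only
for n_items, p in [(1, 0.95), (2, 0.80), (3, 0.60)]:
    print(n_items, round(capacity_estimate(n_items, p), 2))
```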

Information about color identity

Based on a one-way ANOVA of color identity at a given location, we calculated a PEV statistic to measure the effect size of neuronal modulation. Its main parameter ω² is a measurement for the percentage to which the tested factor can explain the variance of the data, and it is calculated from the sum of squares of the effect (SSeffect) and the mean squares of the within-group (error) variance (MSerror) (Equation 2).

$$\omega^2 = \frac{SS_{\text{effect}} - df \cdot MS_{\text{error}}}{SS_{\text{total}} + MS_{\text{error}}} \qquad (2)$$
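Equation 2 can be sketched in Python as follows. This is an illustrative re-implementation rather than the MATLAB code used for the analyses. It assumes that df denotes the degrees of freedom of the effect (number of color identities minus one), which is the standard definition of ω² for a one-way ANOVA. The function name and the example firing rates are hypothetical.

```python
def omega_squared(groups):
    """Omega-squared effect size from a one-way ANOVA (Equation 2).

    groups: list of lists, one list of firing rates per color identity.
    Assumes at least two groups and more trials than groups.
    """
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_effect = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_error = ss_total - ss_effect

    df_effect = len(groups) - 1        # assumed meaning of df in Equation 2
    df_error = n_total - len(groups)
    ms_error = ss_error / df_error

    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# Hypothetical firing rates (Hz) for color A and color B at one location
color_a = [1.0, 2.0, 3.0]
color_b = [7.0, 8.0, 9.0]
print(round(omega_squared([color_a, color_b]), 3))
```

Note that ω² can become slightly negative when the group means barely differ, because the error term is subtracted in the numerator.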

For each neuron, we determined a ‘favorite location’, which was defined as the location with the highest cumulative PEV, of the three possible locations on the right half of the screen, that is, opposite to the implanted hemisphere, across four non-overlapping bins during the sample phase (bin size 200 ms, advanced in steps of 200 ms, from start till the end of the sample phase). The significance of calculated effect size values was determined by a permutation test. We ran the permutation to calculate the likelihood of getting an explained variance value bigger than the one calculated from the actual distribution of the data by randomly permuting the color identity labels and calculating the PEV 1000 times. The test thereby does not assume any distribution of the data and returns an unbiased estimate of the likelihood of generating an effect size within the data randomly. The measured value of explained variance from the actual dataset was assumed to be significant if the likelihood of randomly generating a bigger value was below 5%. We chose to not correct for the multiple comparisons at this level, as we reasoned that if we were to only include those neurons that had the most information at the individual loads (i.e., those with the highest PEV values that are significant even under very stringent statistical criteria) we would have artificially inflated the amount of information present at each load. By including neurons that encoded less information too (i.e., at the uncorrected p-value) our analysis population was more resistant to such outlier effects. We additionally performed our analyses using a more stringent statistical criterion (significance of two consecutive non-overlapping bins) and found the same results (Figure 4—figure supplement 2). We tested the proportions of significant neurons we found for the different trial phases by performing a binomial test, assuming a significance level α = 0.05 (Equation 3).

$$P(X = i) = B(p_0, n) = \binom{n}{i} p_0^{i} (1 - p_0)^{n - i} \tag{3}$$

Equation 3 gives the probability P of finding i significant neurons among a total of n = 362 neurons, given a probability p0 of 5% of finding a significant neuron by chance.
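As a rough illustration of Equation 3, the binomial probability can be computed directly with the standard library. The point probability follows the equation. The upper-tail sum (the probability of at least a given number of significant neurons) is an assumption about how a one-sided test would typically be evaluated and is not stated in the text. The count of 40 in the usage lines is hypothetical.

```python
from math import comb


def binomial_pmf(i, n, p0):
    """Probability of exactly i chance-level significant neurons out of n."""
    return comb(n, i) * p0 ** i * (1 - p0) ** (n - i)


def binomial_upper_tail(k, n, p0):
    """Probability of k or more significant neurons out of n (assumed one-sided test)."""
    return sum(binomial_pmf(i, n, p0) for i in range(k, n + 1))


# Hypothetical example: 40 significant neurons among the 362 recorded.
n_neurons = 362
chance_level = 0.05
p_value = binomial_upper_tail(40, n_neurons, chance_level)
proportion_above_chance = p_value < 0.05
```

With p0 = 0.05 and 362 neurons, about 18 neurons are expected to reach significance by chance, so the test asks whether the observed count is improbably far above that.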

Population analyses

We considered neuronal significance (i.e., significant PEV as determined above) for each load independently. That is, we tested whether the PEV of a neuron was significant three times with the permutation method described above: once for each of the three load conditions. Therefore, we can report seven groups of significance, one for each possible combination of loads at which a neuron was significant (Table 1, Figure 3C). Subsequently, we created three pooled groups (Table 1) from all neurons with a significant PEV at each individual load. We used these pooled groups for the population analyses (Figures 4 and 5). Neurons of these pooled groups with a significant PEV during the sample phase were assigned to the ‘sample population’, and neurons with a significant amount of information during the memory-delay phase were assigned to the ‘delay population’ (significance criterion: one significant 200 ms bin at α = 0.05; see above for our reasoning not to correct for multiple comparisons at this point). Thus, neurons with significant PEV during both the sample and the delay phase were included in both subpopulations. When comparing information about color (PEV) between the trial conditions, we corrected for the unequal numbers of correct and error trials by sub-sampling the correct trials to the number of error trials 1000 times for each neuron. The resulting PEV values of correct trials were then averaged for each neuron. This population of averaged PEV values was then statistically tested against the PEV values of error trials (of the same neurons) using a dependent t-test.
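The trial-matching step and the dependent t-test can be sketched as below. This is a hedged illustration only: the data layout (a list of rate and color tuples per neuron), the function names and the `effect_size` callable (any function that returns a PEV-like value from rates and labels) are assumptions, and only the t statistic is computed, not its p-value.

```python
import random
from math import sqrt
from statistics import mean, stdev


def subsampled_mean_effect(correct_trials, n_error, effect_size, n_iter=1000, seed=0):
    """Average an effect size over random subsamples of the correct trials.

    correct_trials: list of (firing_rate, color_label) tuples for one neuron.
    n_error:        number of error trials to match.
    effect_size:    callable(rates, labels) -> float, e.g., a PEV function.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        subset = rng.sample(correct_trials, n_error)  # draw without replacement
        rates = [rate for rate, _ in subset]
        labels = [label for _, label in subset]
        total += effect_size(rates, labels)
    return total / n_iter


def paired_t_statistic(correct_values, error_values):
    """t statistic of a dependent (paired) t-test across neurons."""
    diffs = [c - e for c, e in zip(correct_values, error_values)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

Each neuron contributes one averaged value for correct trials and one value for error trials, so the two lists passed to `paired_t_statistic` are paired by neuron.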

Divisive normalization-like regularization

We tested for the presence of divisive normalization using the method of Reynolds et al., 1999. Three conditions were considered: (1) neuronal response to stimulus A, (2) neuronal response to stimulus B, and (3) neuronal response to the simultaneous presentation of stimuli A and B. As we wanted to relate this to the information about color identity, we selected subsets of the favorite location and the additional two ipsilateral locations. To test how the neurons altered their response when multiple stimuli were presented simultaneously, we calculated the color selectivity index (SE) and the sensory interaction index (SI) of each neuron. SEi was calculated by subtracting the normalized firing rate for the chosen reference color i (REFi) at the neuron’s favorite location from the normalized firing rate for a second color j (PROBEj) at a different location (ipsilateral to the favorite location, Equation 4).

$$SE_i = PROBE_j - REF_i \tag{4}$$

The resulting selectivity index lies between –1 (completely selective for the reference color) and 1 (completely selective for the probe color). SI was calculated (Equation 5) by subtracting the normalized firing rate for REFi from the normalized firing rate of the combination of REFi and PROBEj (PAIRi,j).

$$SI_{i,j} = PAIR_{i,j} - REF_i \tag{5}$$

This interaction index also lies between –1 (full suppression of reference stimulus by the probe stimulus) and 1 (full enhancement of the reference stimulus by the probe stimulus). As each of the three locations had two possible colors, we calculated eight SE and SI indices per neuron and performed a linear regression for all indices. This is required as each stimulus combination is informative about the normalization. The effects of divisive normalization were compared between the sample and the delay phase. Therefore, SE and SI indices were calculated across the entire sample (800 ms) and memory delay (1000 ms) phase. Neurons with significant information were accordingly identified over the entire sample and delay as one bin, using the permutation test described in the section ‘information about color identity’. We considered the entire sample and delay phase because we wanted to analyze the population response as a whole, irrespective of highly diverse response profiles of individual neurons.
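To make the relationship between the two indices concrete, the sketch below computes SE and SI (Equations 4 and 5) and regresses SI on SE. The firing rates are made-up values, and the pair response is constructed as a weighted average of the reference and probe responses purely for illustration. Under that assumption, SI = w · SE follows algebraically, so the fitted slope recovers the weight w given to the probe stimulus.

```python
def selectivity_index(probe, ref):
    """SE (Equation 4): normalized probe response minus reference response."""
    return probe - ref


def interaction_index(pair, ref):
    """SI (Equation 5): normalized pair response minus reference response."""
    return pair - ref


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical normalized firing rates (between 0 and 1) for four stimulus combinations.
refs = [0.9, 0.2, 0.6, 0.4]
probes = [0.1, 0.8, 0.3, 0.7]
weight = 0.5  # assumed equal weighting of reference and probe
pairs = [weight * p + (1 - weight) * r for p, r in zip(probes, refs)]

se = [selectivity_index(p, r) for p, r in zip(probes, refs)]
si = [interaction_index(pr, r) for pr, r in zip(pairs, refs)]
slope, intercept = fit_line(se, si)
```

In this constructed case the slope equals the assumed weight and the intercept is zero. With real data the slope and intercept are estimated from all index pairs of the population.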

Hierarchical clustering

To visualize the different groups of neurons that encoded and maintained information about the color identity during different phases of the trial, we performed a hierarchical clustering analysis in MATLAB on the normalized PEV values of individual neurons throughout the trial. We used a (1 − correlation) distance metric and an average distance linkage function for a maximum of seven clusters. The maximum number of clusters was first determined by calculating the clustering for different numbers of clusters (1–10) and subsequently calculating the within-cluster sum-of-squares. This resulted in a graph that allowed us to visually inspect the tradeoff between cluster number and fit improvement, from which we estimated the inflection point (elbow method). A cluster number of seven presented the best tradeoff that allowed visualization of the different groups at an acceptable clustering success. We then ordered the neuron clusters to minimize the average distance between the clusters in the dendrogram.
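The clustering procedure can be illustrated with a small, self-contained sketch of agglomerative clustering using a (1 − correlation) distance and average linkage. The analysis in the paper was run in MATLAB. The Python version below is a naive stand-in written for readability: its function names are hypothetical, it assumes that no PEV profile is constant over time, and it does not reproduce the elbow analysis or the dendrogram ordering.

```python
from math import sqrt


def correlation_distance(a, b):
    """1 - Pearson correlation between two time courses (assumes non-constant input)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / sqrt(var_a * var_b)


def average_linkage_clusters(profiles, n_clusters):
    """Merge the closest clusters until n_clusters remain.

    profiles: list of per-neuron PEV time courses (lists of equal length).
    Returns a list of clusters, each a list of neuron indices.
    """
    dist = [[correlation_distance(p, q) for q in profiles] for p in profiles]
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

For example, `average_linkage_clusters(profiles, 7)` would return seven groups of neuron indices. The repeated pairwise search is slow for large populations and is meant only to show the logic.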

Acknowledgements

We would like to thank Mikael Lundqvist for helpful comments on an earlier version of the manuscript.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Lukas Alexander Hahn, Email: lukas.hahn@ruhr-uni-bochum.de.

Dmitry Balakhonov, Email: balakhonov.ds@gmail.com.

Erin L Rich, Icahn School of Medicine at Mount Sinai, United States.

Michael J Frank, Brown University, United States.

Funding Information

This paper was supported by the following grants:

  • Volkswagen Foundation Freigeist Fellowship 93299 to Jonas Rose.

  • Deutsche Forschungsgemeinschaft Project B13 of the collaborative research center 874 (122679504) to Jonas Rose.

Additional information

Competing interests

No competing interests declared.

Author contributions

Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing - original draft, Writing - review and editing.

Conceptualization, Data curation, Methodology.

Data curation.

Project administration, Resources, Writing - review and editing.

Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Visualization, Writing - review and editing.

Ethics

All experimental procedures and housing conditions were carried out in accordance with the National Institutes of Health Guide for Care and Use of Laboratory Animals and were authorized by the national authority (LANUV protocol no. 84-02.04.2017.A001).

Additional files

Transparent reporting form
Source data 1. Reported statistical results and numerical values.
elife-72783-supp1.zip (204.8KB, zip)

Data availability

All details of the statistics reported in the manuscript are provided as a supporting file. Source data files of all figures are publicly available via Dryad: https://doi.org/10.5061/dryad.0k6djhb1q.

The following dataset was generated:

Hahn LA, Balakhonov D, Rose J. 2021. Working memory capacity of crows and monkeys arises from similar neuronal computations. Dryad Digital Repository.

References

  1. Alvarez GA, Cavanagh P. Independent resources for attentional tracking in the left and right visual hemifields. Psychological Science. 2005;16:637–643. doi: 10.1111/j.1467-9280.2005.01587.x.
  2. Awh E, Barton B, Vogel EK. Visual working memory represents a fixed number of items regardless of complexity. Psychological Science. 2007;18:622–628. doi: 10.1111/j.1467-9280.2007.01949.x.
  3. Balakhonov D, Rose J. Crows Rival Monkeys in Cognitive Capacity. Scientific Reports. 2017;7:8809. doi: 10.1038/s41598-017-09400-0.
  4. Bays PM, Husain M. Dynamic shifts of limited working memory resources in human vision. Science. 2008;321:851–854. doi: 10.1126/science.1158023.
  5. Bouchacourt F, Buschman TJ. A Flexible Model of Working Memory. Neuron. 2019;103:147–160. doi: 10.1016/j.neuron.2019.04.020.
  6. Brady RJ, Hampton RR. Post-encoding control of working memory enhances processing of relevant information in rhesus monkeys (Macaca mulatta). Cognition. 2018;175:26–35. doi: 10.1016/j.cognition.2018.02.012.
  7. Brainard DH. The Psychophysics Toolbox. Spatial Vision. 1997;10:433–436. doi: 10.1163/156856897X00357.
  8. Buschman TJ, Siegel M, Roy JE, Miller EK. Neural substrates of cognitive capacity limitations. PNAS. 2011;108:11252–11255. doi: 10.1073/pnas.1104666108.
  9. Carandini M, Heeger DJ, Movshon JA. Linearity and normalization in simple cells of the macaque primary visual cortex. The Journal of Neuroscience. 1997;17:8621–8644. doi: 10.1523/JNEUROSCI.17-21-08621.1997.
  10. Carandini M, Heeger DJ. Normalization as a canonical neural computation. Nature Reviews Neuroscience. 2011;13:51–62. doi: 10.1038/nrn3136.
  11. Cavanagh P, Alvarez GA. Tracking multiple targets with multifocal attention. Trends in Cognitive Sciences. 2005;9:349–354. doi: 10.1016/j.tics.2005.05.009.
  12. Constantinidis C, Funahashi S, Lee D, Murray JD, Qi XL, Wang M, Arnsten AFT. Persistent Spiking Activity Underlies Working Memory. The Journal of Neuroscience. 2018;38:7020–7028. doi: 10.1523/JNEUROSCI.2486-17.2018.
  13. Cowan N. The magical number 4 in short-term memory: a reconsideration of mental storage capacity. The Behavioral and Brain Sciences. 2001;24:87–114. doi: 10.1017/s0140525x01003922.
  14. Cowan N. The many faces of working memory and short-term storage. Psychonomic Bulletin & Review. 2017;24:1158–1170. doi: 10.3758/s13423-016-1191-6.
  15. Delvenne JF. The capacity of visual short-term memory within and between hemifields. Cognition. 2005;96:B79–B88. doi: 10.1016/j.cognition.2004.12.007.
  16. Delvenne JF, Kaddour LA, Castronovo J. An electrophysiological measure of visual short-term memory capacity within and across hemifields. Psychophysiology. 2011;48:333–336. doi: 10.1111/j.1469-8986.2010.01079.x.
  17. Diekamp B, Gagliardo A, Güntürkün O. Nonspatial and subdivision-specific working memory deficits after selective lesions of the avian prefrontal cortex. The Journal of Neuroscience. 2002a;22:9573–9580. doi: 10.1523/JNEUROSCI.22-21-09573.2002.
  18. Diekamp B, Kalt T, Güntürkün O. Working memory neurons in pigeons. The Journal of Neuroscience. 2002b;22:RC210. doi: 10.1523/JNEUROSCI.22-04-j0002.2002.
  19. Ditz HM, Nieder A. Neurons selective to the number of visual items in the corvid songbird endbrain. PNAS. 2015;112:7827–7832. doi: 10.1073/pnas.1504245112.
  20. Emrich SM, Riggall AC, Larocque JJ, Postle BR. Distributed patterns of activity in sensory cortex reflect the precision of multiple items maintained in visual short-term memory. The Journal of Neuroscience. 2013;33:6516–6523. doi: 10.1523/JNEUROSCI.5732-12.2013.
  21. Fongaro E, Rose J. Crows control working memory before and after stimulus encoding. Scientific Reports. 2020;10:1–10. doi: 10.1038/s41598-020-59975-4.
  22. Fukuda K, Awh E, Vogel EK. Discrete capacity limits in visual working memory. Current Opinion in Neurobiology. 2010;20:177–182. doi: 10.1016/j.conb.2010.03.005.
  23. Funahashi S, Bruce CJ, Goldman-Rakic PS. Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. Journal of Neurophysiology. 1989;61:331–349. doi: 10.1152/jn.1989.61.2.331.
  24. Fuster JM, Alexander GE. Neuron activity related to short-term memory. Science. 1971;173:652–654. doi: 10.1126/science.173.3997.652.
  25. Güntürkün O, Bugnyar T. Cognition without Cortex. Trends in Cognitive Sciences. 2016;20:291–303. doi: 10.1016/j.tics.2016.02.001.
  26. Hartmann K, Veit L, Nieder A. Neurons in the crow nidopallium caudolaterale encode varying durations of visual working memory periods. Experimental Brain Research. 2018;236:215–226. doi: 10.1007/s00221-017-5120-3.
  27. Harvey CD, Coen P, Tank DW. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature. 2012;484:62–68. doi: 10.1038/nature10918.
  28. Heeger DJ. Normalization of cell responses in cat striate cortex. Visual Neuroscience. 1992;9:181–197. doi: 10.1017/s0952523800009640.
  29. Husband S, Shimizu T. Evolution of the Avian Visual System, in: Avian Visual Cognition [On-Line] 2001. [August 2, 2021]. http://www.pigeon.psy.tufts.edu/avc/husband/
  30. Johnson MK, McMahon RP, Robinson BM, Harvey AN, Hahn B, Leonard CJ, Luck SJ, Gold JM. The relationship between working memory capacity and broad measures of cognitive ability in healthy adults and people with schizophrenia. Neuropsychology. 2013;27:220–229. doi: 10.1037/a0032060.
  31. Kornblith S, Buschman TJ, Miller EK. Stimulus Load and Oscillatory Activity in Higher Cortex. Cerebral Cortex. 2016;26:3772–3784. doi: 10.1093/cercor/bhv182.
  32. Kröner S, Güntürkün O. Afferent and efferent connections of the caudolateral neostriatum in the pigeon (Columba livia): a retro- and anterograde pathway tracing study. The Journal of Comparative Neurology. 1999;407:228–260. doi: 10.1002/(sici)1096-9861(19990503)407:2<228::aid-cne6>3.0.co;2-2.
  33. Lara AH, Wallis JD. Executive control processes underlying multi-item working memory. Nature Neuroscience. 2014;17:876–883. doi: 10.1038/nn.3702.
  34. Lebedev MA, Messinger A, Kralik JD, Wise SP. Representation of attended versus remembered locations in prefrontal cortex. PLOS Biology. 2004;2:e365. doi: 10.1371/journal.pbio.0020365.
  35. Luck SJ, Vogel EK. The capacity of visual working memory for features and conjunctions. Nature. 1997;390:279–281. doi: 10.1038/36846.
  36. Lundqvist M, Herman P, Lansner A. Theta and gamma power increases and alpha/beta power decreases with memory load in an attractor network model. Journal of Cognitive Neuroscience. 2011;23:3008–3020. doi: 10.1162/jocn_a_00029.
  37. Lundqvist M, Rose J, Herman P, Brincat SL, Buschman TJ, Miller EK. Gamma and Beta Bursts Underlie Working Memory. Neuron. 2016;90:152–164. doi: 10.1016/j.neuron.2016.02.028.
  38. Lundqvist M, Herman P, Miller EK. Working Memory: Delay Activity, Yes! Persistent Activity? Maybe Not. The Journal of Neuroscience. 2018a;38:7013–7019. doi: 10.1523/JNEUROSCI.2485-17.2018.
  39. Lundqvist M, Herman P, Warden MR, Brincat SL, Miller EK. Gamma and beta bursts during working memory readout suggest roles in its volitional control. Nature Communications. 2018b;9:394. doi: 10.1038/s41467-017-02791-8.
  40. Ma WJ, Husain M, Bays PM. Changing concepts of working memory. Nature Neuroscience. 2014;17:347–356. doi: 10.1038/nn.3655.
  41. Matsushima A, Tanaka M. Different neuronal computations of spatial working memory for multiple locations within versus across visual hemifields. The Journal of Neuroscience. 2014;34:5621–5626. doi: 10.1523/JNEUROSCI.0295-14.2014.
  42. Miller EK, Lundqvist M, Bastos AM. Working Memory 2.0. Neuron. 2018;100:463–475. doi: 10.1016/j.neuron.2018.09.023.
  43. Nieder A. Inside the corvid brain—probing the physiology of cognition in crows. Current Opinion in Behavioral Sciences. 2017;16:8–14. doi: 10.1016/j.cobeha.2017.02.005.
  44. Oberauer K, Lewandowsky S, Awh E, Brown GDA, Conway A, Cowan N, Donkin C, Farrell S, Hitch GJ, Hurlstone MJ, Ma WJ, Morey CC, Nee DE, Schweppe J, Vergauwe E, Ward G. Benchmarks for models of short-term and working memory. Psychological Bulletin. 2018;144:885–958. doi: 10.1037/bul0000153.
  45. Panichello MF, Buschman TJ. Shared mechanisms underlie the control of working memory and attention. Nature. 2021;592:601–605. doi: 10.1038/s41586-021-03390-w.
  46. Rajan K, Harvey CD, Tank DW. Recurrent Network Models of Sequence Generation and Memory. Neuron. 2016;90:128–142. doi: 10.1016/j.neuron.2016.02.009.
  47. Reynolds JH, Chelazzi L, Desimone R. Competitive mechanisms subserve attention in macaque areas V2 and V4. The Journal of Neuroscience. 1999;19:1736–1753. doi: 10.1523/JNEUROSCI.19-05-01736.1999.
  48. Reynolds JH, Heeger DJ. The normalization model of attention. Neuron. 2009;61:168–185. doi: 10.1016/j.neuron.2009.01.002.
  49. Rinnert P, Kirschhock ME, Nieder A. Neuronal Correlates of Spatial Working Memory in the Endbrain of Crows. Current Biology. 2019;29:2616–2624. doi: 10.1016/j.cub.2019.06.060.
  50. Rose J, Colombo M. Neural correlates of executive control in the avian brain. PLOS Biology. 2005;3:e0030190. doi: 10.1371/journal.pbio.0030190.
  51. Rose J, Otto T, Dittrich L. The Biopsychology-Toolbox: a free, open-source Matlab-toolbox for the control of behavioral experiments. Journal of Neuroscience Methods. 2008;175:104–107. doi: 10.1016/j.jneumeth.2008.08.006.
  52. Rossant C, Kadir SN, Goodman DFM, Schulman J, Hunter MLD, Saleem AB, Grosmark A, Belluscio M, Denfield GH, Ecker AS, Tolias AS, Solomon S, Buzsaki G, Carandini M, Harris KD. Spike sorting for large, dense electrode arrays. Nature Neuroscience. 2016;19:634–641. doi: 10.1038/nn.4268.
  53. Schneegans S, Taylor R, Bays PM. Stochastic sampling provides a unifying account of visual working memory limits. PNAS. 2020;117:20959–20968. doi: 10.1073/pnas.2004306117.
  54. Souza AS, Oberauer K. In search of the focus of attention in working memory: 13 years of the retro-cue effect. Attention, Perception & Psychophysics. 2016;78:1839–1860. doi: 10.3758/s13414-016-1108-5.
  55. Sprague TC, Ester EF, Serences JT. Reconstructions of information in visual spatial working memory degrade with memory load. Current Biology. 2014;24:2174–2180. doi: 10.1016/j.cub.2014.07.066.
  56. Stacho M, Herold C, Rook N, Wagner H, Axer M, Amunts K, Güntürkün O. A cortex-like canonical circuit in the avian forebrain. Science. 2020;369:eabc5534. doi: 10.1126/science.abc5534.
  57. Troscianko J, von Bayern AMP, Chappell J, Rutz C, Martin GR. Extreme binocular vision and a straight bill facilitate tool use in New Caledonian crows. Nature Communications. 2012;3:1110. doi: 10.1038/ncomms2111.
  58. van den Berg R, Shin H, Chou WC, George R, Ma WJ. Variability in encoding precision accounts for visual short-term memory limitations. PNAS. 2012;109:8780–8785. doi: 10.1073/pnas.1117465109.
  59. Veit L, Nieder A. Abstract rule neurons in the endbrain support intelligent behaviour in corvid songbirds. Nature Communications. 2013;4:11. doi: 10.1038/ncomms3878.
  60. Veit L, Hartmann K, Nieder A. Neuronal correlates of visual working memory in the corvid endbrain. The Journal of Neuroscience. 2014;34:7778–7786. doi: 10.1523/JNEUROSCI.0612-14.2014.
  61. Veit L, Hartmann K, Nieder A. Spatially Tuned Neurons in Corvid Nidopallium Caudolaterale Signal Target Position During Visual Search. Cerebral Cortex. 2017;27:1103–1112. doi: 10.1093/cercor/bhv299.
  62. Vogel EK, Machizawa MG. Neural activity predicts individual differences in visual working memory capacity. Nature. 2004;428:748–751. doi: 10.1038/nature02447.
  63. Vogel EK, McCollough AW, Machizawa MG. Neural measures reveal individual differences in controlling access to working memory. Nature. 2005;438:500–503. doi: 10.1038/nature04171.
  64. Waldmann C, Güntürkün O. The dopaminergic innervation of the pigeon caudolateral forebrain: immunocytochemical evidence for a “prefrontal cortex” in birds? Brain Research. 1993;600:225–234. doi: 10.1016/0006-8993(93)91377-5.
  65. Wilken P, Ma WJ. A detection theory account of change detection. Journal of Vision. 2004;4:1120–1135. doi: 10.1167/4.12.11.
  66. Zhang W, Luck SJ. Discrete fixed-resolution representations in visual working memory. Nature. 2008;453:233–235. doi: 10.1038/nature06860.

Editor's evaluation

Erin L Rich

In this study, Hahn et al., taught crows to perform a multi-item working memory task designed to mimic traditional monkey tasks. Using a combination of behavior and electrophysiology, the authors convincingly show that the neural mechanisms that limit working memory capacity in primates also limit working memory capacity in crows. Such cross-species comparisons are fundamental to understanding the computational constraints that are placed on cognition and the brain.

Decision letter

Editor: Erin L Rich
Reviewed by: Mike Colombo

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Working memory capacity of crows and monkeys arises from similar neuronal computations" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Michael Frank as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Mike Colombo (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

The reviewers were each positive about the research and the paper. In reviews and discussion, a number of issues were raised which the authors should address in their revised manuscript. Below are the most important points, which are considered essential revisions. The individual reviewer comments with additional detail are appended to this list for the authors' reference.

1. Load-dependent increases in color tuning seem counter to many predictions, and the authors should clarify how the DNR model can explain an increase in information as the number of stimuli increases. The suggestion appears to be that, if neurons carry more information for load 2 trials compared to load 1 trials, then this reflects an interaction between the two items that increases the information signaled, but how exactly this might occur isn't clear. There was also confusion about how non-linear effects on neural responses, which are common in the literature, relate to the DNR model. The reviewers felt that a slope near 0 is not sufficient evidence to argue conclusively that the results can be explained by DNR. It would be helpful to consider other evidence in the data and/or other normalization mechanisms, and provide a specific example neuron that shows an increase in information, along with its response to the sample, reference, and paired stimuli.

2. It should be more clearly stated when working memory loads refer to the number of ipsilateral items on the display, particularly in the section on single unit analyses and in Figure 1.

3. More clarity on the task design, terminology, and figures is needed. Specifically, what it means that color pairs are "fixed" to some of the locations, and how colors are designated "color 1" and "color 2" and whether these designations are constant across loads 1 – 3 (Figure 2) should be clarified. Perhaps an example would help here. In addition, more information on the perceptual similarity (to crows) of the colors used, and how this did or did not impact performance is important. In Figures 1 and 2, the labels "hold gaze" and "sample" appear to be used interchangeably, leading to some confusion, and consistent terms would be helpful.

4. A major strength of the paper is cross-species comparisons, and the reviewers suggest that the discussion of these comparisons be expanded beyond monkey neurophysiology. For instance, how do the current findings relate to sequential activity commonly reported in non-primate mammals (versus the more stable 'attractor' dynamics in primate models of working memory)? How might this work relate to behavior and neuroimaging in humans that is suggestive of DNR?

5. The statistical corrections for multiple comparisons across time and/or groups should be clarified, particularly in the single unit analyses.

6. How do the authors reconcile the fact that the number of single neurons tuned to color tends to increase at higher loads, but there is a loss of color information at the population level?

7. Reviewers requested more detail on whether any neurons in the population sustained information across the delay, since it is relevant to the ongoing debate about sustained vs. transient working memory representations in non-human primates.

8. Neurons in Figure 3 seem to respond during the delay period in time windows of ~250 ms. Is this intrinsic to the neural response or is it related to the 200 ms smoothing window that was used? In other words, would a similar pattern be observed if smaller smoothing windows were used?

9. The authors argue that the existence of information on error trials, even on high load trials, suggests that memory is not an 'all-or-none' phenomenon, and this is inconsistent with the slot models of working memory. However, it seems like this was only true for the sample period, and there was no information during the memory delay (Figure 3B). Could this be interpreted as an inability for a sample stimulus to be sustained during memory – in other words, that it didn't make it into a 'slot'?

10. While the results are conceptually consistent with classic effects in monkey lateral PFC, Lara and Wallis (Nat Neurosci, 2014) reported a near absence of color tuning in a very similar color change detection task, and instead found predominantly spatial attention signals. Can the authors discuss this discrepancy?

11. Interference between memory representations is mentioned as a source of information loss, and a 2-sec ITI likely generates a considerable amount of proactive interference. Thus, is it possible that the neural outcomes are being driven by high levels of proactive interference generated by the current design?

Reviewer #1 (Recommendations for the authors):

1. While the results are conceptually consistent with classic effects in monkey lateral PFC, Lara and Wallis (Nat Neurosci, 2014) reported a near absence of color tuning in a very similar color change detection task, and instead found predominantly spatial attention signals. Can the authors discuss this discrepancy?

2. Is there much known about color vision in crows? Specifically, are there any subsets of colors used in this study that could have appeared more perceptually similar/different to the crows, and does that affect performance?

3. It wasn't immediately obvious whether the single unit analyses that looked at effects of working memory load were based on ipsilateral, or total load. Based on the behavioral results, it seems that it should be ipsilateral only, but can the authors clarify this?

4. How do the authors reconcile the fact that the number of single neurons tuned to color tends to increase at higher loads, but there is a loss of color information at the population level?

5. Load-dependent increases in color tuning also seem counter to many predictions. Are there any similar results in the NHP literature? Also, I wasn't able to follow how divisive normalization results in gain of information with increasing load. Can the authors elaborate on this?

6. Is it predicted that neurons without clear color tuning also exhibited load effects consistent with divisive normalization (Supp Figure 7)? If not, how is this result interpreted?

Reviewer #2 (Recommendations for the authors):

General Comments

1. I think my biggest concern is regarding the argument for divisive normalization like regularization (DNR), particularly for neurons that have increased information for a memory load of 2 items. The authors seem to suggest that if neurons carry more information for load 2 trials compared to load 1 trials, then this reflects an interaction between the two items such that information increases. First, how exactly the authors are envisioning this isn't clear to me. It would be helpful to provide a specific example neuron that shows an increase in information, along with its response to the sample, reference, and pair stimuli.

That being said, if I understand correctly, the authors seem to imply there is a non-linear effect on the neural response of some neurons when two stimuli are presented. This seems consistent with previous work (e.g., non-linear responses seen by Fusi). However, this doesn't seem consistent with the DNR model. The DNR model makes a fairly clear prediction that the response to two stimuli will be a (weighted) average of the response to each stimulus alone. Given this, it seems to me that the DNR model doesn't have a mechanism for explaining an increase in information as the number of stimuli increases. Typically, in the DNR framework, a slope close to zero is taken as evidence that the response to the Pair is equal to the Ref (e.g., when attention is shifted to the reference). That seems unlikely to be the case here, but highlights how a slope near 0 is not sufficient to argue that the results can be explained by a DNR effect.

Perhaps I am missing something, in which case I would suggest the authors make this point more clearly in the current manuscript. Or, if the classic DNR model can't predict the increase in information for two-item preferring neurons, then I think the current results could argue for other normalization mechanisms (or a combination of mechanisms) that could be acting in the brain.

2. It wasn't clear to me what statistics were corrected for multiple comparisons across time and/or groups. To measure sensory/memory information in individual neurons, the authors used a percent explained variance measure to test if the neuron responded differently to two different colors. As far as I can tell, this test was performed for each of six different stimulus locations and at multiple points in time (at least four). It isn't clear from the text whether the authors corrected for these multiple comparisons when determining whether a neuron was significant.

3. As noted above, the current work is consistent with behavior and neuroimaging in humans. In particular, there has been evidence from human neuroimaging work arguing for divisive normalization in working memory (e.g., Sprague et al., 2014) that seems relevant and worth citing.

4. On line 310 the authors state that "most neurons did not sustain information about color (…) throughout the entire sample or memory delay". From Figure 2, it doesn't look like any neurons were consistently active across the entire delay. Did any neurons significantly sustain information over the entire period? This seems relevant to the ongoing debate about sustained vs. transient working memory representations in non-human primates.

5. Related to the above point, the sequential activation of neurons seems consistent with a synfire chain model of working memory. This is generally what is found in other, non-primate, mammals. For example, you see sequential activation during memory time periods in mice (e.g., work from Carl Petersen or David Tank's groups). This is in contrast to the classic 'attractor' based models of working memory in primates. I think this is an important point to discuss, as it could change the interpretation of the results (although I would expect interference to still disrupt sequence generation in synfire chains).

6. Neurons in Figure 3 seem to respond during the delay period in time windows of ~250 ms. Is this intrinsic to the neural response or is it related to the 200 ms smoothing window that was used? In other words, would a similar pattern be observed if smaller smoothing windows were used?

7. In the paragraph starting on line 396 the authors argue that the existence of information on error trials, even on high load trials, suggests that memory is not an 'all-or-none' phenomenon. With this, they argue against slot models of working memory. However, it seems like this was only true for the sample period – there is no information during the memory delay (Figure 3B). So, couldn't one interpret these results as an inability for a sample stimulus to be sustained during memory – in other words, that it didn't make it into a 'slot'? To be clear, I'm not arguing for the slot model, I would just suggest tempering the interpretation of this result.

Specific Comments

1. Figure 3A shows clustering of neurons by response profile. It seems odd that the second and third to last groups are not ordered by when their response occurred in time. In other words, I would have naively expected the second to last cluster to be closer to the fourth to last cluster (as they have more similar time courses). Am I missing something?

2. In Figure 5B the authors measure the DNR effect for neurons that gain information with increasing load. In the legend, they state N = 8 neurons for the delay period. This is much lower than the number of selective neurons shown in Figure 3C. What causes the discrepancy? Also, is N=8 significantly more neurons than expected by chance?

3. Line 171 seems like it has a typo – I assume it is a high-pass filter of 500 Hz and low-pass of 7125 Hz?

4. On line 256, in the legend for Figure 3 – my math may be wrong, but aren't there 36 neurons with selectivity in load 1?

5. On line 535, the authors reference Lebedev et al., 2004 as evidence for attention and WM overlapping in PFC of monkeys. My reading of that manuscript is that they argue for general separation of the two functions (with minimal overlap).

Reviewer #3 (Recommendations for the authors):

The paper is very well written. I am not an expert in the neural computational side of things, but otherwise have only a few queries. Mostly my comments are meant to improve the readability of the paper.

1. As you say, interference between memory representation is a source of information loss. A 2-sec ITI, I would think, generates a considerable amount of proactive interference. Thus is it possible that the neural outcomes are being driven by high levels of proactive interference generated by the current design?

2. Lines 138-140 I think could use a bit of elaboration, maybe in the form of an example. What threw me was the point about pairs of colors being locked to a position for a session. Eventually I came to understand what was meant, but the understanding needs to come earlier for the reader. Perhaps providing an example, using Figure 1, would help. For example, you could say... "Take the case shown in Figure 1. The colors blue and orange are locked to the left center position for that session. From trial to trial either may appear during the hold gaze period. If the left center position is chosen as the location of the target on that trial, then as shown in the Figure, the color will change to the other color of the assigned pair during the indicate change position."

3. It needs to be made more clear that loads 1, 2, and 3 refer to the number of ipsilateral items present on the display. This does become more clear as I read the manuscript but I was initially thrown by the fact that when I average across rows (aka loads) in Figure 1 the number didn't come out quite to what is mentioned on line 277. I assume the reason is that Figure 1 just shows whole-number averages.

4. Figure 2 needs clarification. First, on the x-axis, the label of "sample" is misleading (also appears throughout the text, eg line 305). I believe it represents the "hold gaze" period so it should say that. Alternatively, in Figure 1A, the "hold gaze" label in the second panel could say "sample". In fact, those labels in Figure 1A (center gaze, hold gaze, delay, indicate change) are more description of what happens in those periods rather than names of the periods, which could be: start period, sample period, delay period, response period.

5. Second (continuing on with Figure 2), it took a good deal of effort to realize what was meant by color 1 and color 2. This comment takes me back to my comment #2. regarding lines 138-140. I eventually realized that color 1 and color 2 refer to the pair of colors that appear in a hold-gaze period when that particular position is selected as the target position. Because the label "sample" was unclear, I had thought that color 1 was one of the colors during the hold-gaze period and color 2 was the other color that appeared during the indicate-change period.

6. Third (still on Figure 2). Once comments 4 and 5 were sorted then I realized that load 1, 2, and 3 (blue-yellow-red) all represent the SAME stimulus pair but with different associated loads. I was thrown because the colors (blue-yellow-red) of load 1, 2, and 3 are different, and so it made me think that color 1 and color 2 across load 1, 2, and 3 referred to different colors as well. It's all obvious now, but the issues in comments 4, 5, and 6 created a perfect storm of confusion for a while. I don't mind the three colors anymore, but as always clarification would help.

eLife. 2021 Dec 3;10:e72783. doi: 10.7554/eLife.72783.sa2

Author response


Essential revisions:

The reviewers were each positive about the research and the paper. In reviews and discussion, a number of issues were raised which the authors should address in their revised manuscript. Below are the most important points, which are considered essential revisions. The individual reviewer comments with additional detail are appended to this list for the authors' reference.

1. Load-dependent increases in color tuning seem counter to many predictions, and the authors should clarify how the DNR model can explain an increase in information as the number of stimuli increases. The suggestion appears to be that, if neurons carry more information for load 2 trials compared to load 1 trials, then this reflects an interaction between the two items that increases the information signaled, but how exactly this might occur isn't clear. There was also confusion about how non-linear effects on neural responses, which are common in the literature, relate to the DNR model. The reviewers felt that a slope near 0 is not sufficient evidence to argue conclusively that the results can be explained by DNR. It would be helpful to consider other evidence in the data and/or other normalization mechanisms, and provide a specific example neuron that shows an increase in information, along with its response to the sample, reference, and paired stimuli.

We apologize for the apparent lack of clarity in our discussion of the data. We did not mean to imply that DNR, as such, can explain the information gain at load 2. In fact, the computational principle of DNR (Heeger, 1992; Carandini and Heeger, 2013) is in tune with a loss of information at higher loads. To account for the increase in information some neurons show with increasing load we refer to a variation / application of DNR, the ‘normalization model of attention’ (Reynolds and Heeger, 2009). The model incorporates DNR to explain observations (in visual cortices) under different forms of attention. At its core are attentional fields and gain factors that affect neuronal responses. In this model we propose that DNR can explain the effect of information gain at higher loads (and the flattened linear fit Figure 5B) under the added assumption of an attentional process.

Our interpretation is based on the following: A neuron’s response (firing rate) to color 1 and color 2 (each presented by itself at the ‘favorite location’ / load 1) may be uninformative (similar firing rate for both colors). Presenting an additional color (increasing the load to 2) should affect the neuron’s firing rate based on DNR. As is the case for neurons that lose information, normalizing the firing rate should reduce the firing rate difference. To understand why we instead find an increase in the difference, we lean on attentional modulation (normalization model of attention). If an attentional factor was present, then this might have caused the DNR to become unequal (i.e., attention to the color at the additional location differentially affects the firing rate for color 1 at the favorite location and color 2 at the favorite location).

Take for example the neuron in Figure 1 (Figure 5—figure supplement 2). The firing rate for the brown and blue square at the middle right position is very similar (the load 1 condition). When adding a green square at the bottom right location (the load 2 condition) the firing rate now strongly differentiates between brown and blue.

This effect can be detected on the population level and subsequently be interpreted under the normalization model of attention. If the slope of fit of the SE/SI index deviates from 0.5, this indicates an unequal interaction between colors. Applied to the example above, we’d argue that this fits an attentional effect elicited by the green square that creates the neuronal differentiation between colors at load 2, where there was none at load 1.
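The argument above can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the analysis performed in the study: the function `normalized_pair`, the weights, and the firing rates are hypothetical, and the normalized response is simplified to a weighted average of the response to the target color and the response to the added item. With equal weights, two colors that evoke the same response at load 1 stay indistinguishable at load 2; a color-dependent (attention-like) weight produces a difference.

```python
def normalized_pair(r_target, r_added, w_target):
    """Weighted average of the responses to the target item and the added item."""
    return w_target * r_target + (1.0 - w_target) * r_added


# Hypothetical firing rates (spikes/s); not values from the recorded neurons.
r_color_a = 10.0  # color A alone at the favorite location (load 1)
r_color_b = 10.0  # color B alone at the favorite location (load 1)
r_added = 20.0    # added item alone at the second location

# Load 1: the neuron does not differentiate the two colors.
load1_diff = abs(r_color_a - r_color_b)

# Load 2, equal weighting for both colors: still no difference.
equal_diff = abs(
    normalized_pair(r_color_a, r_added, w_target=0.5)
    - normalized_pair(r_color_b, r_added, w_target=0.5)
)

# Load 2, assumed unequal weighting: the added item interacts differently
# with color A (weight 0.8 on the target) than with color B (weight 0.3).
unequal_diff = abs(
    normalized_pair(r_color_a, r_added, w_target=0.8)
    - normalized_pair(r_color_b, r_added, w_target=0.3)
)
```

The particular weights are arbitrary; the point is only that a difference between colors at load 2 requires the weighting to depend on the color at the favorite location.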

We chose to analyze the effect at this more abstract population level, because it offers a good comparison between the different subpopulations of neurons that lose and gain information (also see remarks to question (6) below). Furthermore, an analysis of individual interactions was not feasible due to the many different possible combinations that would have required many more trials than were available. Thus, while we think that our data fits this interpretation, we also tried to make the point that we cannot explicitly test for an attentional effect, just that our observed data (the neurons that gain information) would, nonetheless, fit an explanation based on an attentional process.

We realize now that the shift from the fundamental divisive normalization computation to the derived normalization model of attention may not have been transparent enough in our manuscript and added more detail and explanations to the result section and discussion.

The new section in the Results section (lines 300 ff.) reads:

“The ‘normalization model of attention’ (Reynolds and Heeger, 2009) incorporates divisive normalization, and can explain how attention can modulate neuronal responses. By attending a preferred (or non-preferred) second colored square in the load 2 condition the neuronal response of a neuron to the target location (i.e., to color A and to color B at the favorite location) might be altered. As a result, a difference between color A and B may arise even though each color by itself elicited a similar response. In other words, the interaction between the additional color and the target color is unequal. Neurons without a color differentiation at load 1 that gained differentiation at load 2 through this process (e.g. if the interaction of probe color A with reference color A is larger than the interaction of probe color A with reference color B, see Figure 5—figure supplement 2 for an example), should have a population regression slope smaller than 0.5.”

The new section in the discussion (lines 401 ff.) reads:

“Finally, the DNR computation may explain the responses of the neurons that gained information at load 2 through attentional processes predicted by the ‘normalization model of attention’ (Reynolds and Heeger, 2009). This may appear counter-intuitive and contradictory, considering that the same process is also responsible for the loss of information. However, when attention is overtly directed to a specific (preferred or non-preferred) item within the receptive field of a neuron, the DNR computation shifts its weighting of the normalized response towards the response of the attended item (Reynolds et al., 1999; Reynolds and Heeger, 2009). This weighted normalization can produce a difference in the neuronal response to both color identities at load 2, even if the neuronal response was non-informative at load 1. At the population level we were able to observe such an effect as the reduced slope of the selectivity/interaction fit. Thus, an attentive process might have enhanced information in WM at higher loads.

It is important to clarify that, as we did not use any form of attentional cueing in our study, we cannot explicitly test for such an attention effect. However, we do know that the animals participating in this study can use attentional cues to enhance their WM (Fongaro and Rose, 2020). The attention cues used by (Fongaro and Rose, 2020) positively affected not only encoding but also the maintenance and retrieval of the information held in WM, comparable to results from monkeys and humans (Brady and Hampton, 2018; Souza and Oberauer, 2016). We, therefore, want to emphasize that our data is in line with the interpretation that the birds possibly attended a load 2 stimulus array differently than a load 1 stimulus array in order to enhance their performance in trials with higher loads.”

2. It should be more clearly stated when working memory loads refer to the number of ipsilateral items on the display, particularly in the section on single unit analyses and in Figure 1.

We apologize for our confusing use of the terms ipsilateral and contralateral. We added clarifications. In all behavioral analyses the terms ipsi- and contralateral refer to the location of the change. Here we aimed to examine the likelihood of detecting a change depending on the load. We found no main effect of the number of items on the opposite side.

In the neural analysis the terms ipsi- and contralateral refer to a neuron’s favored location. Here we were interested in the effect of additional squares on the amount of color information at the favorite location. We found a stronger modulation by additional squares in the same visual hemifield and restricted most analyses to one side (i.e., load 1-3) to reduce the number of trial combinations.

We have clarified this in the manuscript (lines 586 ff.):

“All correct trials were included in the analysis of neural data. Depending on the analysis we refer to different ‘load conditions’ relative to referential sides of the screen which are either ipsilateral (same side) or contralateral (opposite side), each with a possible load between one and three items. For the behavioral analyses the terms ipsilateral and contralateral refer to the location of the change that had to be detected. For the neurophysiological analyses the terms ipsilateral and contralateral refer to the respective neuron’s favorite location (described in the following section).”

We have also added the clarification to the figure captions of Figures 1, 2 and 4:

Figure 1: (B) Boxplot of performance for different ipsilateral loads (i.e., on the side where the change occurred).

Figure 2: Color discrimination in the neuronal response (information, PEV) generally decreases with load, but some neurons show the opposite effect. Shown are the three ipsilateral load conditions (i.e., load increases on the same side as the neuron’s favorite location). Ipsilateral loads are one (blue), two (yellow), and three (red). The labels ‘color A’ and ‘color B’ always refer to the same pair of colors at the neuron’s favorite location, irrespective of the load condition.

Figure 4: Information encoding at the population level. (A) Color information (PEV) decreases with an increasing ipsilateral load (i.e., on the same side as the neuron’s favorite location) but not with an increasing contralateral load (i.e., on the opposite side to the neuron’s favorite location).

3. More clarity on the task design, terminology, and figures is needed. Specifically, what it means that color pairs are "fixed" to some of the locations, and how colors are designated "color 1" and "color 2" and whether these designations are constant across loads 1 – 3 (Figure 2) should be clarified. Perhaps an example would help here. In addition, more information on the perceptual similarity (to crows) of the colors used, and how this did or did not impact performance is important. In Figures 1 and 2, the labels "hold gaze" and "sample" appear to be used interchangeably, leading to some confusion, and consistent terms would be helpful.

We want to apologize for the unclear task description and for the inconsistent labelling of task phases that has created so much confusion.

We added a concrete example to explain the fixed locations and color changes between the two possible colors at a location (lines 531 ff.):

“The stimuli were presented at six fixed locations on the screen (1 – 6, Figure 1A). In each session, one pair of colors was assigned to each of the six locations. Each location had its own distinct pair. These pairs were randomly chosen from a pool of 14 colors (two color-combinations were excluded since the animals did not discriminate them equally well during a pre-training). Let us consider Figure 1A as an example. The color-change occurs in the middle-left where turquoise (T) is presented during the sample and orange (O) during the choice. In this particular session, the middle-left could thus show either of the following colors during the sample and choice: T-O (shown in Figure 1A); O-T; O-O; T-T; None-None. In the next session a new random pair of colors was displayed at this location.”

“For identification and analysis an arbitrary label was assigned to each of the randomly drawn colors at the start of each session (i.e., left middle location: ‘color A’, or ‘color B’; left top location: ‘color A’, or ‘color B’; etc.). These indices do not refer to the order of presentation of the colors at any time but were held constant for neuronal analysis. The order of presentation of colors within a pair, the target location (where the color change occurred), and the number of stimuli in the array (two to five) were randomized and balanced across trials so that each condition had an equal likelihood to appear.”

We have updated figure 1A and replaced the ‘hold gaze’ and ‘indicate change’ labels with the name of the phase that we use throughout the rest of the manuscript (i.e., sample and choice, respectively). We further added an introductory sentence to the figure caption briefly describing the task:

Figure 1: (A) Behavioral paradigm (reproduced from Balakhonov and Rose, 2017). The birds had to center and hold their gaze for the duration of the sample and delay period, and subsequently indicate which colored square had changed.

Color vision in birds is based on four cone types, i.e., three cone types similar to the mammalian cones and one additional type that has peak absorption in the UV spectrum around 360 nm (Kelber, 2019). Thus, birds have excellent color vision that exceeds primate color vision. To ensure that specific color pairs did not influence change detection, the color pairs had been chosen for change detection based on initial training of the task, so that the animals had good change detection rates across all possible color pair combinations (two possible color pairs were excluded from use due to color similarity, see also Balakhonov and Rose, 2017).

4. A major strength of the paper is cross-species comparisons, and the reviewers suggest that the discussion of these comparisons be expanded beyond monkey neurophysiology. For instance, how do the current findings relate to sequential activity commonly reported in non-primate mammals (versus the more stable 'attractor' dynamics in primate models of working memory)? How might this work relate to behavior and neuroimaging in humans that is suggestive of DNR?

We added a short section in the discussion focused on comparisons to synfire chain models in rodents that refer to Rajan et al., 2016 and Harvey et al., 2012 (lines 446 ff.):

“This may be reconcilable with some other contemporary models of WM. One major type of those models implements ‘synfire chains’, where individual neurons fire sequentially (and transiently) to bridge temporal gaps and maintain task relevant contents (Rajan et al., 2016). This has, for example, been reported to be the case in posterior parietal cortex of mice performing a T-maze task that required WM for cued spatial locations to be maintained (Harvey et al., 2012). The transient activity of neurons that we report (Figures 2 and 3A) might fit into such models. However, our results can only be compared very cautiously to this (since even small changes in task design significantly alter neuronal responses, e.g. Lara and Wallis, 2014).”

We have added a short section concerning human neuroimaging studies to the discussion (lines 342 ff.):

“This suggests that WM could be conceptualized as a continuous resource that has to be divided between the two items (Bays and Husain, 2008; Berg et al., 2012; Wilken and Ma, 2004), rather than two ‘simple’ slots that would each have the same amount of information irrespective of the memory load. This is also consistent with results of human neuroimaging that report decreased signal amplitude and precision with increasing memory load (Emrich et al., 2013; Sprague et al., 2014).”

5. The statistical corrections for multiple comparisons across time and/or groups should be clarified, particularly in the single unit analyses.

We have made clarifications in the manuscript concerning the following issues.

We performed all analyses on which we base our interpretations (i.e., inter-load effects, correct vs. error trials, DNR) on p-values corrected for multiple comparisons (this is stated throughout the Results section).

Lines 586-592 and 609-610:

“We did not test for all six possible locations, we restricted analysis to the three locations on the right side of the screen. Of these three locations, we chose the location that had the cumulative highest PEV across the sample period (i.e., the favorite location), which for this purpose was segmented into four 200 ms bins. Significant PEV values for individual neurons were then calculated for the favorite location in 200 ms bins using permutation testing at an α of 0.05.”
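The per-bin significance test described above can be sketched as follows. This is an illustrative reconstruction under assumptions, not the authors' code: PEV is computed here as eta squared (between-group sum of squares over total sum of squares, times 100), which is one common definition, and the permutation scheme, the number of permutations, and the example firing rates are hypothetical.

```python
import random


def pev(rates, labels):
    """Percent explained variance (eta squared x 100) of firing rate by color label."""
    grand_mean = sum(rates) / len(rates)
    ss_total = sum((r - grand_mean) ** 2 for r in rates)
    if ss_total == 0:
        return 0.0
    ss_between = 0.0
    for label in sorted(set(labels)):
        group = [r for r, l in zip(rates, labels) if l == label]
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
    return 100.0 * ss_between / ss_total


def permutation_p(rates, labels, n_perm=1000, seed=0):
    """P-value of the observed PEV against a null built by shuffling color labels."""
    rng = random.Random(seed)
    observed = pev(rates, labels)
    shuffled = list(labels)
    n_at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pev(rates, shuffled) >= observed:
            n_at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (n_at_least + 1) / (n_perm + 1)


# Hypothetical firing rates in one 200 ms bin: six trials per color.
rates = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 19.0, 20.0, 21.0, 20.0, 19.0, 21.0]
labels = ["A"] * 6 + ["B"] * 6

observed_pev = pev(rates, labels)
p_value = permutation_p(rates, labels)
is_significant = p_value < 0.05  # alpha of 0.05, as stated in the text
```

Running this once per 200 ms bin at the favorite location would give one p-value per bin; whether and how those per-bin tests are corrected is the question addressed in the surrounding response.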

Lines 618-626:

“We did not use the number of significant neurons as a measure for information about color, but rather their effect size, which is a much more derived statistical value for which there is no universally accepted critical threshold (e.g., see the small but still very relevant values of Buschman et al., 2011). As we were looking for the effect of load on WM within NCL we aimed to strengthen our analysis by including as many neurons as possible. Therefore, to determine if a single neuron had a significant amount of information about color, we did not apply further statistical corrections beyond the permutation testing.”

We hypothesized that information would decrease as WM load increases, which should be an effect happening at the level of the neuronal population, and not necessarily (as we saw) at the level of individual neurons. If we were to only include those neurons that had the most information at the individual loads (i.e., those with the highest PEV values that are significant even under very stringent statistical criteria) we would have artificially inflated the amount of information present at each load. By including neurons that encoded less information too (at an uncorrected p-value of 0.05) our analysis population was more resistant to such outlier effects.

Nonetheless, we can also report the outcome of applying a more stringent criterion for significance (e.g., two consecutive non-overlapping bins, i.e., a temporal requirement for consistency of the effect). When we do so, all reported effects retain their quality (see Figure 2 (Figure 4—figure supplement 2); compare to Figure 4B in the manuscript). The flipside of this is that the number of ‘significant’ neurons decreases, while the amount of information about color (PEV values) increases, which is to be expected, because the selection process now retains only those neurons showing the strongest effects (highest PEV). We think (for the reasons mentioned above) that this would represent a case of over-selection (‘cherry picking’) from the obtained data, which is why we would like to retain our original population.
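The stricter criterion mentioned above (significance in two consecutive non-overlapping bins) can be expressed as a small helper. This is a sketch with hypothetical names and p-values; it assumes one p-value per non-overlapping bin, ordered in time.

```python
def passes_consecutive_criterion(p_values, alpha=0.05, n_consecutive=2):
    """True if at least n_consecutive adjacent bins all have p < alpha."""
    run_length = 0
    for p in p_values:
        run_length = run_length + 1 if p < alpha else 0
        if run_length >= n_consecutive:
            return True
    return False


# Hypothetical per-bin p-values for two neurons (four 200 ms bins each).
isolated_bins = [0.01, 0.20, 0.03, 0.50]  # two significant bins, not adjacent
adjacent_bins = [0.20, 0.01, 0.03, 0.50]  # two significant bins in a row

keeps_isolated = passes_consecutive_criterion(isolated_bins)
keeps_adjacent = passes_consecutive_criterion(adjacent_bins)
```

A neuron with isolated significant bins passes the single-bin criterion but fails this one, which is why the stricter rule retains fewer neurons.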

6. How do the authors reconcile the fact that the number of single neurons tuned to color tends to increase at higher loads, but there is a loss of color information at the population level?

There seems to be a misunderstanding. The number of single neurons with significant color information in the three load conditions is quite stable, as Author response table 1 shows (values are taken from the pie charts of Figure 3C).

Author response table 1. Proportion of neurons with significant color information (number of neurons) at the different load conditions in the sample and the delay phase.

Phase    Load 1              Load 2              Load 3
Sample   55 % (137 neurons)  55 % (137 neurons)  43 % (107 neurons)
Delay    36 % (34 neurons)   37 % (35 neurons)   39 % (37 neurons)

The overall loss of information at the population level (the central finding of this study) and each neuron’s individual significance are not necessarily related. This is because the absolute amount of information about color (i.e., the PEV value) at load 2 or 3 can be lower than at load 1 while still remaining significant. We purposefully investigated and compared these significant subpopulations to investigate the central question of information loss and capacity. We expand on how neurons lose information and why some neurons have significant information at higher loads when they did not have this color information at load 1.

7. Reviewers requested more detail on whether any neurons in the population sustained information across the delay, since it is relevant to the ongoing debate about sustained vs. transient working memory representations in non-human primates.

There were only a few neurons that sustained an elevated firing rate throughout longer phases of the delay period. Those neurons did not differentiate their firing rate in relation to the different colors, i.e., they did not carry significant amounts of color information throughout their sustained activity. This can be observed in the cluster plot (Figure 3A), where it is evident that information was only ever transiently maintained by individual neurons. This is also reflected in the example neurons shown in Figures 2, S4 and S5. In this respect, our results are consistent with contemporary conceptions of WM (in primates) that suggest that persistent delay activity may not be the central mechanism by which information is maintained in WM (e.g., Miller et al., 2018, Lundqvist et al., 2018). We have added a short section in the discussion (lines 442 ff.) to address this point, together with our remarks about synfire chains (see point 4):

“There is also ongoing debate about the role of sustained activity during delay periods and how it relates to WM (Constantinidis et al., 2018, Lundqvist et al., 2018, Miller et al., 2018). We cannot report any neuron that showed persistent activity comparable to those reported by classical WM studies in PFC (e.g., Fuster and Alexander, 1971, Funahashi et al., 1992), or in NCL (Diekamp et al., 2002, Veit and Nieder, 2013, Veit et al., 2014).”

8. Neurons in Figure 3 seem to respond during the delay period in time windows of ~250 ms. Is this intrinsic to the neural response or is it related to the 200 ms smoothing window that was used? In other words, would a similar pattern be observed if smaller smoothing windows were used?

The neuronal response is not dependent on the window size. Using a window size of 100 ms or 50 ms returns virtually the same response. The raster plots at the top of each neuron’s figure give an indication of that. Author response image 1 shows the example neuron of Figure 2A smoothed with a 100 ms window.
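The window-size check described above can be illustrated with a toy example. This is a sketch under assumptions only: the smoothing kernel used in the study is not specified in this section, so a centered boxcar (moving average) is used here, and the spike train is synthetic (1 ms bins, a hypothetical burst of one spike every 10 ms between 400 and 650 ms).

```python
def boxcar_smooth(counts, window):
    """Centered moving average over roughly `window` bins; edges use available bins."""
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed


def peak_time(trace):
    """Index (ms, given 1 ms bins) of the first maximum of the trace."""
    return trace.index(max(trace))


# Synthetic spike train: 1000 bins of 1 ms with a transient burst.
spikes = [0] * 1000
for t in range(400, 650, 10):
    spikes[t] = 1

# Peak latency of the smoothed trace for three window sizes (ms).
peaks = {w: peak_time(boxcar_smooth(spikes, w)) for w in (50, 100, 200)}
```

In this synthetic case the peak stays inside the burst for all three windows, so the transient is a property of the spike train and not of the smoothing; the raster plots play the same role for the recorded neurons.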

Author response image 1. Example neuron (same as in Figure 2A of the manuscript), smoothed with 100 ms bins.


Top: raster plot, where every dot represents a single spike during the individual trials (rows of dots); middle: peri-stimulus-time histogram (PSTH) of average firing rate (solid line for color ID 1, dashed line for color ID 2) with the standard error of the mean (shaded areas); bottom: percent explained variance of color identity (a measure of information about color) along the trial, the line at the top of the y-axis indicates significant bins.

9. The authors argue that the existence of information on error trials, even on high load trials, suggests that memory is not an 'all-or-none' phenomenon, and this is inconsistent with the slot models of working memory. However, it seems like this was only true for the sample period, and there was no information during the memory delay (Figure 3B). Could this be interpreted as an inability for a sample stimulus to be sustained during memory – in other words, that it didn't make it into a 'slot'?

We have included this point in the Results section (lines 232 ff.):

“A possible alternative ‘slot-model’ explanation would be that, on error trials, the color information was completely lost after the sample phase, because it was not successfully transferred into a slot (or that a slot was not available to take on information). The graded amount of information on correct trials is not compatible with the simple (all or none) slot model but could fit the ‘slots and averaging model’ (Zhang and Luck, 2008).”

10. While the results are conceptually consistent with classic effects in monkey lateral PFC, Lara and Wallis (Nat Neurosci, 2014) reported a near absence of color tuning in a very similar color change detection task, and instead found predominantly spatial attention signals. Can the authors discuss this discrepancy?

We have added a short paragraph discussing the discrepancies to Lara and Wallis (2014) (lines 359 ff.):

“We probed the WM capacity of crows using colored squares, based on the task design of Buschman et al. (2011). Using the identical task allowed us to directly compare our neuronal results of WM capacity from NCL to results from PFC of monkeys. Task similarity is very important for such cross-species comparisons as even small changes in task parameters may introduce substantial differences in neuronal responses, leading to potentially different conclusions. In a task similar to the one used here, Lara and Wallis (2014) found that neurons in the PFC of monkeys encoded nearly no information about color, but instead about location. In their task monkeys had to memorize the color of squares at two locations on a screen, and were again confronted with a colored square at one of the two locations after a delay. The monkeys then had to indicate if the color at that location had changed. Lara and Wallis (2014) discuss the absence of color information in the neurons they recorded in relation to the task of Buschman et al. (2011), who, like us, did find color information. In brief, the exact task design may determine the neuronal encoding of task relevant information (Lara and Wallis, 2014). Similar to the complex contribution of PFC neurons to WM, neurons of NCL can also encode a wide range of very different task relevant aspects, like color (this study), spatial locations (Veit et al., 2015, Rinnert et al., 2019), and more abstract items like rules (Veit and Nieder, 2013), and numerosities (Ditz and Nieder, 2015).”

11. Interference between memory representation is mentioned as a source of information loss, and a 2-sec ITI likely generates a considerable amount of proactive interference. Thus is it possible that the neural outcomes are being driven by high levels of proactive interference generated by the current design?

We agree that proactive interference could be a source of overall difficulty in our task and could affect, for instance, the overall estimate of WM capacity.

However, we think that it is highly unlikely that proactive interference affected the reported results in any meaningful way since trial-types were randomized within each session.

It may also be noteworthy that the 2 s ITI does not reflect the entire separation between trials. Birds were given an additional 2 s to consume their reward after correct choices and received a 10 s timeout after incorrect choices. Furthermore, the birds were free to assume head fixation on their own time, which resulted in a random offset in each trial.

Reviewer #1 (Recommendations for the authors):

1. While the results are conceptually consistent with classic effects in monkey lateral PFC, Lara and Wallis (Nat Neurosci, 2014) reported a near absence of color tuning in a very similar color change detection task, and instead found predominantly spatial attention signals. Can the authors discuss this discrepancy?

Thank you for pointing out the study. Please refer to point (10) of the essential revisions.

2. Is there much known about color vision in crows? Specifically, are there any subsets of colors used in this study that could have appeared more perceptually similar/different to the crows, and does that affect performance?

Color vision in birds is based on four cone types, i.e., three cone types similar to the mammalian cones and one additional type that has peak absorption in the UV spectrum around 360 nm (Kelber, 2019). Thus, birds have excellent color vision that exceeds that of primates. To ensure that specific color pairs did not influence change detection, the color pairs were chosen based on initial training of the task, so that the animals had good change detection rates across all possible color pair combinations (two possible color pairs were excluded from use due to color similarity, see also Balakhonov and Rose, 2017).

3. It wasn't immediately obvious whether the single unit analyses that looked at effects of working memory load were based on ipsilateral, or total load. Based on the behavioral results, it seems that it should be ipsilateral only, but can the authors clarify this?

We apologize for our confusing use of the terms ipsilateral and contralateral. We have addressed the issue under point (2) of the essential revisions, and added clarifications to the manuscript (lines 586 ff. and 609 f.) and to the figure captions of Figures 1, 2, and 4.

4. How do the authors reconcile the fact that the number of single neurons tuned to color tends to increase at higher loads, but there is a loss of color information at the population level?

We are not sure what the reviewer is referring to specifically. There may be a misunderstanding. The number of single neurons with significant color information in the three load conditions seems quite stable, as Author response image 1 shows (values are taken from the pie charts of Figure 3C).

The overall loss of information at the population level (the central finding of this study) and each neuron’s individual significance are not necessarily related. This is because the absolute amount of information about color (i.e., the PEV value) at load 2 or 3 can be lower than at load 1 while still remaining significant. We purposefully investigated and compared these significant subpopulations to investigate the central question of information loss and capacity. We expand on how neurons lose information and why some neurons have significant information at higher loads when they did not have this color information at load 1 (also see below).
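The point that a neuron can carry less information at a higher load while still carrying some can be illustrated with a minimal sketch. It assumes that PEV is computed as the share of total firing-rate variance explained by color (the eta-squared form); the exact estimator used in the manuscript is not stated in this response, and the firing rates below are made-up values. The significance test itself is not shown.

```python
def pev(rates_a, rates_b):
    """Percent explained variance (eta-squared form, an assumption here)
    for firing rates grouped by two colors, A and B."""
    all_rates = rates_a + rates_b
    grand_mean = sum(all_rates) / len(all_rates)
    # Total variability of firing rates around the grand mean.
    ss_total = sum((r - grand_mean) ** 2 for r in all_rates)
    # Variability explained by the difference between the color means.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in (rates_a, rates_b)
    )
    return 100.0 * ss_between / ss_total


# Hypothetical firing rates (spikes/s) of one neuron, not real data.
load1_color_a = [20.0, 22.0, 21.0, 19.0]
load1_color_b = [10.0, 12.0, 11.0, 9.0]
load2_color_a = [17.0, 19.0, 15.0, 18.0]
load2_color_b = [12.0, 14.0, 13.0, 11.0]

pev_load1 = pev(load1_color_a, load1_color_b)
pev_load2 = pev(load2_color_a, load2_color_b)
print(f"PEV at load 1: {pev_load1:.1f} %, PEV at load 2: {pev_load2:.1f} %")
```

In this toy case the color difference shrinks and the trial-to-trial spread grows at load 2, so the PEV drops but remains well above zero.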

5. Load-dependent increases in color tuning also seem counter to many predictions. Are there any similar results in the NHP literature?

We are not aware of any studies reporting such gains of information.

Also, I wasn't able to follow how divisive normalization results in gain of information with increasing load. Can the authors elaborate on this?

We apologize for the lack of clarity in discussing this aspect of our data. Please refer to point (1) of the essential revisions, where we have explained the effect and included an example.

6. Is it predicted that neurons without clear color tuning also exhibited load effects consistent with divisive normalization (Supp Figure 7)? If not, how is this result interpreted?

Yes, DNR should also occur if significant differentiation between colors (i.e., color tuning) is not present. This is because the computation emerges from the activity of neuronal networks (populations), irrespective of the tuning of individual neurons. The point is that those neurons (SFigure 7) were not affected by the conjectured attentional processes. This is relevant insofar as we chose to split the population of neurons into three groups based on their significance for color information (i.e., (1) information at load 1 and subsequent loss of information at higher load; (2) no information at load 1 but gain of information at load 2; (3) no information at load 1 and no gain of information at load 2) and expected to find group-specific results: namely, classic DNR reducing information (due to equal interaction, i.e., slopes ~ 0.5) for groups (1) and (3), and unequal interaction due to attention resulting in slopes different from 0.5 for group (2). This is what we found.
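The relationship between interaction weights and regression slopes can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of the study: it assumes the response to a pair of stimuli is a weighted average of the responses to the sample and reference stimuli alone, and regresses the "interaction" (pair minus reference) on the "selectivity" (sample minus reference). The function names, the weights, and the firing rates are hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def interaction_slope(sample, ref, weight):
    """Slope of (pair - ref) against (sample - ref), assuming
    pair = weight * sample + (1 - weight) * ref for every neuron."""
    pair = [weight * s + (1.0 - weight) * r for s, r in zip(sample, ref)]
    selectivity = [s - r for s, r in zip(sample, ref)]
    interaction = [p - r for p, r in zip(pair, ref)]
    return ols_slope(selectivity, interaction)


# Hypothetical single-stimulus firing rates (spikes/s) of five neurons.
sample = [12.0, 30.0, 8.0, 25.0, 18.0]
ref = [20.0, 10.0, 15.0, 22.0, 5.0]

equal_slope = interaction_slope(sample, ref, weight=0.5)   # equal interaction
biased_slope = interaction_slope(sample, ref, weight=0.8)  # unequal interaction
print(f"equal weighting: {equal_slope:.2f}, biased weighting: {biased_slope:.2f}")
```

Because pair minus reference equals the weight times sample minus reference, the slope recovers the weight in this noise-free case: 0.5 for equal averaging, and a value away from 0.5 when one stimulus dominates the pair response.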

Reviewer #2 (Recommendations for the authors):

General Comments

1. I think my biggest concern is regarding the argument for divisive normalization like regularization (DNR), particularly for neurons that have increased information for a memory load of 2 items. The authors seem to suggest that if neurons carry more information for load 2 trials compared to load 1 trials, then this reflects an interaction between the two items such that information increases. First, how exactly the authors are envisioning this isn't clear to me. It would be helpful to provide a specific example neuron that shows an increase in information, along with its response to the sample, reference, and pair stimuli.

That being said, if I understand correctly, the authors seem to imply there is a non-linear effect on the neural response of some neurons when two stimuli are presented. This seems consistent with previous work (e.g., non-linear responses seen by Fusi). However, this doesn't seem consistent with the DNR model. The DNR model makes a fairly clear prediction that the response to two stimuli will be a (weighted) average of the response to each stimulus alone. Given this, it seems to me that the DNR model doesn't have a mechanism for explaining an increase in information as the number of stimuli increase. Typically, in the DNR framework, a slope close to zero is taken as evidence that the response to the Pair is equal to the Ref (e.g., when attention is shifted to the reference). That seems unlikely to be the case here, but highlights how a slope near 0 is not sufficient to argue that the results can be explained by a DNR effect.

Perhaps I am missing something, in which case I would suggest the authors make this point more clearly in the current manuscript. Or, if the classic DNR model can't predict the increase in information for two-item preferring neurons, then I think the current results could argue for other normalization mechanisms (or a combination of mechanisms) that could be acting in the brain.

We apologize for the lack of clarity in discussing this aspect of our data. Please refer to point (1) of the essential revisions, where we have explained the effect and included an example.

2. It wasn't clear to me what statistics were corrected for multiple comparisons across time and/or groups. To measure sensory/memory information in individual neurons, the authors used a percent explained variance measure to test if the neuron responded differently to two different colors. As far as I can tell, this test was performed for each of six different stimulus locations and at multiple points in time (at least four). It isn't clear from the text whether the authors corrected for these multiple comparisons when determining whether a neuron was significant.

We apologize for the lack of clarity in our methods. We have added clarifications to the manuscript (lines 586-592 and 609-610). Please refer to our answer to point (5) of the essential revisions for a detailed account of our statistical methodology.

3. As noted above, the current work is consistent with behavior and neuroimaging in humans. In particular, there has been evidence from human neuroimaging work arguing for divisive normalization in working memory (e.g., Sprague et al., 2014) that seems relevant and worth citing.

Thank you for pointing this out, we have added this suggestion to the discussion (lines 342 ff.):

“This suggests that WM could be conceptualized as a continuous resource that has to be divided between the two items (Bays and Husain, 2008; Berg et al., 2012; Wilken and Ma, 2004), rather than two ‘simple’ slots that would each have the same amount of information irrespective of the memory load. This is also consistent with results of human neuroimaging that report decreased signal amplitude and precision with increasing memory load (Emrich et al., 2013; Sprague et al., 2014).”

4. On line 310 the authors state that "most neurons did not sustain information about color (…) throughout the entire sample or memory delay". From Figure 2, it doesn't look like any neurons were consistently active across the entire delay. Did any neurons significantly sustain information over the entire period? This seems relevant to the ongoing debate about sustained vs. transient working memory representations in non-human primates.

Thank you for mentioning this topic. Discussing these aspects of the neuronal response, also in relation to other relevant models of mammalian WM, may help to round out our comparative perspective (see also our answer to the next point). We have added a short section in the discussion to address this point (lines 442 ff.). Please refer to our answer to point (7) of the essential revisions.

5. Related to the above point, the sequential activation of neurons seems consistent with a synfire chain model of working memory. This is generally what is found in other, non-primate, mammals. For example, you see sequential activation during memory time periods in mice (e.g., work from Carl Petersen or David Tank's groups). This is in contrast to the classic 'attractor' based models of working memory in primates. I think this is an important point to discuss, as it could change the interpretation of the results (although I would expect interference to still disrupt sequence generation in synfire chains).

We added a short section in the discussion focused on comparisons to synfire chain models in rodents that refer to Rajan et al., 2016 and Harvey et al., 2012 (lines 446 ff.). Please refer to our answer to point (4) of the essential revisions.

6. Neurons in Figure 3 seem to respond during the delay period in time windows of ~250 ms. Is this intrinsic to the neural response or is it related to the 200 ms smoothing window that was used? In other words, would a similar pattern be observed if smaller smoothing windows were used?

The neuronal response is not dependent on the window size. Using a window size of 100 ms or 50 ms returns virtually the same response. The raster plots at the top of each neuron’s figure give an indication of that. We have included an example under point (8) of the essential revisions.

7. In the paragraph starting on line 396 the authors argue that the existence of information on error trials, even on high load trials, suggests that memory is not an 'all-or-none' phenomena. With this, they argue against slot models of working memory. However, it seems like this was only true for the sample period – there is no information during the memory delay (Figure 3B). So, couldn't one interpret these results as an inability for a sample stimulus to be sustained during memory – in other words, that it didn't make it into a 'slot'? To be clear, I'm not arguing for the slot model, I would just suggest tempering the interpretation of this result.

Thank you for raising this point. Please refer to our answer to point (9) of the essential revisions.

Specific Comments

1. Figure 3A shows clustering of neurons by response profile. It seems odd that the second and third to last groups are not ordered by when their response occurred in time. In other words, I would have naively expected the second to last cluster to be closer to the fourth to last cluster (as they have more similar time courses). Am I missing something?

The specific clustering of the latter groups is indeed peculiar. It is, however, based on the algorithm’s classification, which takes the activity across the entire trial into account. The third to last group is a sibling to the fourth to last that seems to have two centers of peak information (one at the beginning of the delay and a smaller one at the end). The second to last group does not have the peak in information at the end of the delay. This likely affected grouping into the observed pattern. This may be resolved by changing the target number of clusters, but the result would then be suboptimal with respect to the optimal number of clusters as determined by the within-cluster sum-of-squares measure.

2. In Figure 5B the authors measure the DNR effect for neurons that gain information with increasing load. In the legend, they state N = 8 neurons for the delay period. This is much lower than the number of selective neurons shown in Figure 3C. What causes the discrepancy? Also, is N=8 significantly more neurons than expected by chance?

That particular subgroup only has 8 neurons (nonsignificant at load 1 and significant in the delay (i.e., the entire delay measured as one bin) at load 2). The numbers of significant neurons differ between Figure 3C and Figure 5 because 3C states neurons with significant information binned in 200 ms bins (see lines 608 ff. of the methods), whereas 5C states neurons with significant information in the sample and/or delay as one bin (800 ms and 1000 ms, respectively; see lines 676 ff. of the methods).

The reason for pooling neurons across a larger time window for the analyses concerning Figure 5 was to facilitate interpretation of the results. By considering the entire sample or delay phase we were able to interpret the effect at the level of the population response irrespective of individual neurons’ specific response curves.

We have added clarifications to the figure caption of Figure 5:

(A) Information carrying neurons in the sample phase (as one bin; n = 105; left) and delay phase (as one bin; n = 43; right) population. (B) Information gaining neurons in the sample phase (as one bin; n = 56; left) and delay phase (as one bin; n = 8; right) population. The red line indicates the regression fit.

And have added our reason for changing the criterion for significance in the methods section (lines 679 ff.):

“We considered the entire sample and delay phase because we wanted to analyze the population response as a whole, irrespective of highly diverse response profiles of individual neurons.”

3. Line 171 seems like it has a typo – I assume it is a high-pass filter of 500 Hz and low-pass of 7125 Hz?

Yes, this has been corrected.

4. On line 256, in the legend for Figure 3 – my math may be wrong, but aren't there 36 neurons with selectivity in load 1?

The number of neurons seems to be correct. Significant delay neurons, at load 1 are: 28 % (only load 1) + 3 % (load 1 and load 2) + 5 % (load 1 and load 3) = 36 % of 94 delay neurons in total: 0.36 * 94 = 33.84, adjusted for rounding errors (as depicted numbers are rounded) = 34 neurons, as stated.
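The arithmetic above can be checked with a short sketch. The percentages and the total of 94 delay neurons are the values quoted in the answer; the variable names are illustrative, and rounding to the nearest integer is assumed to be the adjustment the answer refers to.

```python
# Rounded percentages quoted from the pie chart of Figure 3C (delay neurons).
percent_only_load1 = 28
percent_load1_and_load2 = 3
percent_load1_and_load3 = 5
n_delay_neurons = 94

# Share of delay neurons with significant information at load 1.
percent_load1_total = (
    percent_only_load1 + percent_load1_and_load2 + percent_load1_and_load3
)

# Convert the rounded percentage back to a neuron count.
estimated_count = percent_load1_total * n_delay_neurons / 100
rounded_count = round(estimated_count)

print(percent_load1_total, estimated_count, rounded_count)
```

The summed percentage is 36 %, which corresponds to 33.84 of 94 neurons and rounds to the stated 34 neurons, not to 36.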

5. On line 535, the authors reference Lebedev et al., 2004 as evidence for attention and WM overlapping in PFC of monkeys. My reading of that manuscript is that they argue for general separation of the two functions (with minimal overlap).

Lebedev and colleagues find three types of neurons: memory neurons, attention neurons, and hybrid neurons (encoding both memory and attention). They also explicitly state that there is likely an overlap between the different functions and that a purely mnemonic function of PFC is too simple to explain the data. We therefore think it is a good example of the literature (backed up by more recent studies, like Panichello and Buschman, 2021) to illustrate the co-occurrence of WM and attention in PFC.

We have extended the discussion to more clearly incorporate the relationship of PFC neurons to both memory and attention (lines 387 ff.):

“Beyond the domain of sensory signals, attention and WM may be directly linked. Neuronal correlates of WM and attention overlap in PFC neurons, for example Lebedev et al., (2004) found that a substantial amount of PFC neurons encode either an attentional signal, or a memory signal, and some (hybrid) neurons do both. A purely mnemonic function of PFC thereby seems unlikely. Indeed, very recently Panichello and Buschman (2021) have reported that at the population level neurons of PFC encode ‘both the selection of items from working memory and attention to sensory inputs’ (p.2), rather than just memory content.”

Reviewer #3 (Recommendations for the authors):

The paper is very well written. I am not an expert in the neural computational side of things, but otherwise have only a few queries. Mostly my comments are meant to improve the readability of the paper.

1. As you say, interference between memory representation is a source of information loss. A 2-sec ITI, I would think, generates a considerable amount of proactive interference. Thus is it possible that the neural outcomes are being driven by high levels of proactive interference generated by the current design?

Thank you for raising this point. Please refer to our answer to point (11) of the essential revisions.

2. Lines 138-140 I think could use a bit of elaboration, maybe in the form of an example. What threw me was the point about pairs of colors being locked to a position for a session. Eventually I came to understand what was meant, but the understanding needs to come earlier for the reader. Perhaps providing an example, using Figure 1, would help. For example, you could say... "Take the case shown in Figure 1. The colors blue and orange are locked to the left center position for that session. From trial to trial either may appear during the hold gaze period. If the left center position is chosen as the location of the target on that trial, then as shown in the Figure, the color will change to the other color of the assigned pair during the indicate change position."

Thank you for pointing this out. We added a concrete example to explain the fixed locations and color changes between the two possible colors at a location (lines 531 ff.). Please refer to our answer to point (3) of the essential revisions.

3. It needs to be made more clear that loads 1, 2, and 3 refer to the number of ipsilateral items present on the display. This does become more clear as I read the manuscript but I was initially thrown by the fact that when I average across rows (aka loads) in Figure 1 the number didn't come out quite to what is mentioned on line 277. I assume the reason is that Figure 1 just shows a whole number averages.

We apologize that our use of the different load conditions was not clear enough. We have added a section to clarify the terminology around the loads, and the terms ‘ipsilateral’ and ‘contralateral’ (lines 586 ff.). Please refer to our answer to point (2) of the essential revisions.

We further added a remark to the caption of figure 1 that indicates that the displayed numerical values were rounded to the nearest integer.

4. Figure 2 needs clarification. First, on the x-axis, the label of "sample" is misleading (also appears throughout the text, eg line 305). I believe it represents the "hold gaze" period so it should say that. Alternatively, in Figure 1A, the "hold gaze" label in the second panel could say "sample". In fact, those labels in Figure 1A (center gaze, hold gaze, delay, indicate change) are more description of what happens in those periods rather than names of the periods, which could be: start period, sample period, delay period, response period.

We have updated Figure 1A and replaced the ‘hold gaze’ and ‘indicate change’ labels with the names of the phases that we use throughout the rest of the manuscript (i.e., sample and choice, respectively). We further added an introductory sentence to the figure caption briefly describing the task:

“Figure 3: (A) Behavioral paradigm (reproduced from Balakhonov & Rose, 2017). The birds had to center and hold their gaze for the duration of the sample and delay period, and subsequently indicate which colored square had changed.”

5. Second (continuing on with Figure 2), it took a good deal of effort to realize what was meant by color 1 and color 2. This comment takes me back to my comment #2. regarding lines 138-140. I eventually realized that color 1 and color 2 refer to the pair of colors that appear in a hold-gaze period when that particular position is selected as the target position. Because the label "sample" was unclear, I had thought that color 1 was one of the colors during the hold-gaze period and color 2 was the other color that appeared during the indicate-change period.

We have decided to exchange the labels ‘1’ and ‘2’ for ‘A’ and ‘B’, respectively, to avoid the implication of sequential presentation, and added a sentence to the figure caption to clarify the meaning of colors A and B.

Figure 4: Color discrimination in the neuronal response (information, PEV) generally decreases with load, but some neurons show the opposite effect. Shown are the three ipsilateral load conditions (i.e., load increases on the same side as the neuron’s favorite location). Ipsilateral loads are one (blue), two (yellow), and three (red). The labels ‘color A’ and ‘color B’ always refer to the same pair of colors at the neuron’s favorite location, irrespective of the load condition.

Further, refer to changes made in accordance with questions (2) and (4).

6. Third (still on Figure 2). Once comments 4 and 5 were sorted then I realized that load 1, 2, and 3 (blue-yellow-red) all represent the SAME stimulus pair but with different associated loads. I was thrown because the colors (blue-yellow-red) of load 1, 2, and 3 are different, and so it made me think that color 1 and color 2 across load 1, 2, and 3 referred to different colors as well. It's all obvious now, but the issues in comments 4, 5, and 6 created a perfect storm of confusion for a while. I don't mind the three colors anymore, but as always clarification would help.

We want to apologize for the lack of clarity and the confusion that ensued because of it. We have added the clarification concerning the colors in the figure caption (refer to the answer to question (4) above). We hope all the individual changes made in the different parts of the manuscript will help readers to more easily understand the design of our task and analysis.

Associated Data

    Data Citations

    1. Hahn LA, Balakhonov D, Rose J. 2021. Working memory capacity of crows and monkeys arises from similar neuronal computations. Dryad Digital Repository.

    Supplementary Materials

    Transparent reporting form
    Source data 1. Reported statistical results and numerical values.
    elife-72783-supp1.zip (204.8KB, zip)

    Data Availability Statement

    All details of the statistics reported in the manuscript are provided as a supporting file. Source data files of all figures are publicly available via Dryad: https://doi.org/10.5061/dryad.0k6djhb1q.

    The following dataset was generated:

    Hahn LA, Balakhonov D, Rose J. 2021. Working memory capacity of crows and monkeys arises from similar neuronal computations. Dryad Digital Repository.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
