eLife. 2020 Dec 15;9:e58906. doi: 10.7554/eLife.58906

Comprehension of computer code relies primarily on domain-general executive brain regions

Anna A Ivanova 1,2, Shashank Srikant 3, Yotaro Sueoka 1,2, Hope H Kean 1,2, Riva Dhamala 4, Una-May O'Reilly 3, Marina U Bers 4, Evelina Fedorenko 1,2
Editors: Andrea E Martin 5, Timothy E Behrens 6
PMCID: PMC7738192  PMID: 33319744

Abstract

Computer programming is a novel cognitive tool that has transformed modern society. What cognitive and neural mechanisms support this skill? Here, we used functional magnetic resonance imaging to investigate two candidate brain systems: the multiple demand (MD) system, typically recruited during math, logic, problem solving, and executive tasks, and the language system, typically recruited during linguistic processing. We examined MD and language system responses to code written in Python, a text-based programming language (Experiment 1), and in ScratchJr, a graphical programming language (Experiment 2); for both, we contrasted responses to code problems with responses to content-matched sentence problems. We found that the MD system exhibited strong bilateral responses to code in both experiments, whereas the language system responded strongly to sentence problems, but weakly or not at all to code problems. Thus, the MD system supports the use of novel cognitive tools even when the input is structurally similar to natural language.

Research organism: Human

Introduction

The human mind is endowed with a remarkable ability to support novel cognitive skills, such as reading, writing, map-based navigation, mathematical reasoning, and scientific logic. Recently, humanity has invented another powerful cognitive tool: computer programming. The ability to flexibly instruct programmable machines has led to a rapid technological transformation of communities across the world (Ensmenger, 2012); however, little is known about the cognitive and neural systems that underlie computer programming skills.

Here, we investigate which neural systems support one critical aspect of computer programming: computer code comprehension. By code comprehension, we refer to a set of cognitive processes that allow programmers to interpret individual program tokens (such as keywords, variables, and function names), combine them to extract the meaning of program statements, and, finally, combine the statements into a mental representation of the entire program. It is important to note that code comprehension may be cognitively and neurally separable from cognitive operations required to process program content, that is, the actual operations described by code. For instance, to predict the output of the program that sums the first three elements of an array, the programmer should identify the relevant elements and then mentally perform the summation. Most of the time, processing program content recruits a range of cognitive processes known as computational thinking (Wing, 2006; Wing, 2011), which include algorithm identification, pattern generalization/abstraction, and recursive reasoning (e.g., Kao, 2010). These cognitive operations are notably different from code comprehension per se and may not require programming knowledge at all (Guzdial, 2008). Thus, research studies where people read computer programs should account for the fact that interpreting a computer program involves two separate cognitive phenomena: processing computer code that comprises the program (i.e., code comprehension) and mentally simulating the procedures described in the program (i.e., processing problem content).
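
To make this distinction concrete, consider a simple problem of the kind described above (a hypothetical illustration in Python, not an actual stimulus from this study):

arr = [3, 1, 4, 1, 5]
print(sum(arr[:3]))

Code comprehension involves interpreting the individual tokens (the variable arr, the slice arr[:3], the call to sum) and combining them into statements; processing program content involves mentally carrying out the operations they describe (adding 3 + 1 + 4 to predict the output, 8).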

Given that code comprehension is a novel cognitive tool, typically acquired in late childhood or in adulthood, we expect it to draw on preexisting cognitive systems. However, the question of which cognitive processes support code comprehension is nontrivial. Unlike some cognitive inventions that are primarily linked to a single cognitive domain (e.g., reading/writing building on spoken language), code comprehension plausibly bears parallels to multiple distinct cognitive systems. First, it may rely on domain-general executive resources, including working memory and cognitive control (Bergersen and Gustafsson, 2011; Nakagawa et al., 2014; Nakamura et al., 2003). In addition, it may draw on the cognitive systems associated with math and logic (McNamara, 1967; Papert, 1972), in line with the traditional construal of coding as problem-solving (Dalbey and Linn, 1985; Ormerod, 1990; Pea and Kurland, 1984; Pennington and Grabowski, 1990). Finally, code comprehension may rely on the system that supports comprehension of natural languages (Fedorenko et al., 2019; Murnane, 1993; Papert, 1993). Like natural language, computer code makes heavy use of hierarchical structures (e.g., loops, conditionals, and recursive statements), and, like language, it can convey an unlimited amount of meaningful information (e.g., describing objects or action sequences). These similarities could, in principle, make the language circuits well suited for processing computer code.

Neuroimaging research is well positioned to disentangle the relationship between code comprehension and other cognitive domains. Many cognitive processes are known to evoke activity in specific brain regions/networks: thus, observing activity for the task of interest in a particular region or network with a known function can indicate which cognitive processes are likely engaged in that task (Mather et al., 2013). Prior research (Assem et al., 2020; Duncan, 2010; Duncan, 2013; Duncan and Owen, 2000) has shown that executive processes – such as attention, working memory, and cognitive control – recruit a set of bilateral frontal and parietal brain regions collectively known as the multiple demand (MD) system. If code comprehension primarily relies on domain-general executive processes, we expect to observe code-evoked responses within the MD system, distributed across both hemispheres. Math and logic also evoke responses within the MD system (Fedorenko et al., 2013), although this activity tends to be left-lateralized (Amalric and Dehaene, 2016; Amalric and Dehaene, 2019; Goel and Dolan, 2001; Micheloyannis et al., 2005; Monti et al., 2007; Monti et al., 2009; Pinel and Dehaene, 2010; Prabhakaran et al., 1997; Reverberi et al., 2009). If code comprehension draws on the same mechanisms as math and logic, we expect to observe left-lateralized activity within the MD system. Finally, comprehension of natural language recruits a set of left frontal and temporal brain regions known as the language system (e.g., Fedorenko and Thompson-Schill, 2014). These regions respond robustly to linguistic input, both visual and auditory (Deniz et al., 2019; Fedorenko et al., 2010; Nakai et al., 2020; Regev et al., 2013; Scott et al., 2017). However, they show little or no response to tasks in non-linguistic domains, such as executive functions, math, logic, music, action observation, and non-linguistic communicative signals, like gestures (Fedorenko et al., 2011; Jouravlev et al., 2019; Monti et al., 2009; Monti et al., 2012; Pritchett et al., 2018; see Fedorenko and Blank, 2020, for a review). If code comprehension relies on the same circuits that map form to meaning in natural language, we expect to see activity within the language system.

Evidence from prior neuroimaging investigations of code comprehension is inconclusive. Existing studies have provided some evidence for left-lateralized activity in regions that roughly correspond to the language system (Siegmund et al., 2014; Siegmund et al., 2017), as well as some evidence for the engagement of frontal and parietal regions resembling the MD system (Floyd et al., 2017; Huang et al., 2019; Siegmund et al., 2014; Siegmund et al., 2017). However, none of these prior studies sought to explicitly distinguish code comprehension from other programming-related processes, and none of them provide quantitative evaluations of putative shared responses to code and other tasks, such as working memory, math, or language (cf. Liu et al., 2020; see Discussion).

Here, we use functional magnetic resonance imaging (fMRI) to evaluate the role of the MD system and the language system in computer code comprehension. Three design features that were lacking in earlier neuroimaging studies of programming allow us to evaluate the relative contributions of these two candidate systems. First, we contrast neural responses evoked by code problems with those evoked by content-matched sentence problems (Figure 1A); this comparison allows us to disentangle activity evoked by code comprehension from activity evoked by the underlying program content (which is matched across code and sentence problems).

Figure 1. Experimental paradigms.

(A) Main task. During code problem trials, participants were presented with snippets of code in Python (Experiment 1) or ScratchJr (Experiment 2); during sentence problem trials, they were presented with text problems that were matched in content with the code stimuli. Each participant saw either the code or the sentence version of any given problem. (B) Localizer tasks. The MD localizer (top) included a hard condition (memorizing positions of eight squares appearing two at a time) and an easy condition (memorizing positions of four squares appearing one at a time). The language localizer (bottom) included a sentence reading and a nonword reading condition, with the words/nonwords appearing one at a time.


Figure 1—figure supplement 1. Trial structure of the critical task.


(A) Experiment 1 – Python. (B) Experiment 2 – ScratchJr. All analyses use functional magnetic resonance imaging responses to the ‘problem’ step.

Second, we use independent ‘localizer’ tasks (Brett et al., 2002; Fedorenko et al., 2010; Saxe et al., 2006) to identify our networks of interest: a working memory task to localize the MD system and a passive reading task to localize the language system (Figure 1B). The functional localization approach obviates the reliance on the much-criticized ‘reverse inference’ reasoning (Poldrack, 2006; Poldrack, 2011), whereby functions are inferred from coarse macro-anatomical landmarks. Instead, we can directly interpret code-evoked activity within functionally defined regions of interest (Mather et al., 2013). In addition, localization of the MD and language networks is performed in individual participants, which is important given substantial variability in their precise locations across individuals (Fedorenko and Blank, 2020; Shashidhara et al., 2019b) and leads to higher sensitivity and functional resolution (Nieto-Castañón and Fedorenko, 2012).

Third, to draw general conclusions about code comprehension, we investigate two very different programming languages: Python, a popular general-purpose programming language, and ScratchJr, an introductory visual programming language for creating animations, designed for young children (Bers and Resnick, 2015). In the Python experiment, we further examine two problem types (math problems and string manipulation) and three basic types of program structure (sequential statements, for loops, and if statements). Comprehension of both Python and ScratchJr code requires retrieving the meaning of program tokens and combining them into statements, despite the fact that the visual features of the tokens in the two languages are very different (text vs. images). If a brain system is involved in code comprehension, we expect its response to generalize across programming languages and problem types, similar to how distinct natural languages in bilinguals and multilinguals draw on the same language regions (Kroll et al., 2015).

Taken together, these design features of our study allow us to draw precise and generalizable conclusions about the neural basis of code comprehension.

Results

Participants performed a program comprehension task inside an MRI scanner. In each trial, participants, all proficient in the target programming language, read either a code problem or a content-matched sentence problem (Figure 1A) and were asked to predict the output. In Experiment 1 (24 participants, 15 women), code problems were written in Python, a general-purpose text-based programming language (Sanner, 1999). In Experiment 2 (19 participants, 12 women), code problems were written in ScratchJr, an introductory graphical programming language developed for children aged 5–7 (Bers, 2018). Both experiments were conducted with adults to facilitate result comparison. Good behavioral performance confirmed that participants were proficient in the relevant programming language and engaged with the task (Python: 99.6% response rate, 85% accuracy on code problems; ScratchJr: 98.6% response rate, 79% accuracy on code problems; see Figure 2—figure supplement 1 for detailed behavioral results). Participants additionally performed two functional localizer tasks: a hard vs. easy spatial working memory task, used to define the MD system, and a sentence vs. nonword reading task, used to define the language system (Figure 1B; see Materials and methods for details).

We then contrasted neural activity in the MD and language systems during code problem comprehension with activity during (a) sentence problem comprehension and (b) the nonword reading condition from the language localizer task. Sentence problem comprehension requires simulating the same operations as code problem comprehension (mathematical operations or string manipulation for Python, video simulation for ScratchJr), so contrasting code problems with sentence problems allows us to isolate neural responses evoked by code comprehension from responses evoked by processing problem content. Nonword reading elicits weak responses in both the language system and the MD system (in the language system, this response likely reflects low-level perceptual and/or phonological processing; in the MD system, it likely reflects the basic task demands associated with maintaining attention or reading pronounceable letter strings). Because the nonword response is much weaker than responses to the localizer conditions of interest (Fedorenko et al., 2010; Mineroff et al., 2018), nonword reading can serve as a control condition for both the MD and language systems, providing a more stringent baseline than simple fixation. Given the abundant evidence that the MD system and the language system are each strongly functionally interconnected (Blank et al., 2014; Mineroff et al., 2018; Paunov et al., 2019), we perform the key analyses at the system level.

MD system exhibits robust and generalizable bilateral responses during code comprehension

We found strong bilateral responses to code problems within the MD system in both Experiments 1 and 2 (Figures 2 and 3). These responses were stronger than responses to both the sentence problem condition (Python: β = 1.03, p<0.001, ScratchJr: β = 1.38, p<0.001) and the control nonword reading condition (Python: β = 2.17, p<0.001; ScratchJr: β = 1.23, p<0.001). The fact that code problems drove the MD system more strongly than content-matched sentence problems (despite the fact that sentence problems generally took longer to respond to; see Figure 2—figure supplement 1) demonstrates that the MD system responds to code comprehension specifically rather than simply being activated by the underlying problem content.

Figure 2. Main experimental results.

(A) Candidate brain systems of interest. The areas shown represent the ‘parcels’ used to define the MD and language systems in individual participants (see Materials and methods and Figure 3—figure supplement 1). (B, C) Mean responses to the language localizer conditions (SR – sentence reading and NR – nonwords reading) and to the critical task (SP – sentence problems and CP – code problems) in systems of interest across programming languages (B – Python, C – ScratchJr). In the MD system, we see strong responses to code problems in both hemispheres and in both programming languages; the fact that this response is stronger than the response to content-matched sentence problems suggests that it reflects activity evoked by code comprehension per se rather than just activity evoked by problem content. In the language system, code problems elicit a response that is substantially weaker than that elicited by sentence problems; further, only in Experiment 1 do we observe responses to code problems that are reliably stronger than responses to the language localizer control condition (nonword reading). Here and elsewhere, error bars show standard error of the mean across participants, and the dots show responses of individual participants.


Figure 2—figure supplement 1. Behavioral results.


(A) Python code problems had mean accuracies of 85.1% and 86.2% for the English-identifier (CP_en) and Japanese-identifier (CP_jap) conditions, respectively, and sentence problems (SP) had a mean accuracy of 81.5%. There was no main effect of condition (CP_en, CP_jap, SP), problem structure (seq – sequential, for – for loops, if – if statements), or problem content (math vs. string); however, there was a three-way interaction among Condition (sentence problems > code with English identifiers), Problem Type (string > math), and Problem Structure (for loop > sequential; p=0.02). Accuracy data from one participant had to be excluded due to a bug in the script. (B) ScratchJr code problems had a mean accuracy of 78.0%, and sentence problems had a mean accuracy of 87.8% (the difference was significant: p=0.006). (C) Python problems with English identifiers had a mean response time (RT) of 17.56 s (SD = 9.05), Python problems with Japanese identifiers had a mean RT of 19.39 s (SD = 10.1), and sentence problems had a mean RT of 21.32 s (SD = 11.6). Problems with Japanese identifiers took longer to answer than problems with English identifiers (β = 3.10, p=0.002), and so did sentence problems (β = 6.12, p<0.001). There was also an interaction between Condition (sentence problems > code with English identifiers) and Program Structure (for > seq; β = −5.25, p<0.001), as well as between Condition (CP_jap > CP_en) and Program Structure (if > seq; β = −2.83, p=0.04). There was no significant difference in RTs between math and string manipulation problems. (D) ScratchJr code problems had a mean RT of 1.14 s (SD = 0.86), and sentence problems had a mean RT of 1.03 s (SD = 0.78); the difference was not significant. The RTs are reported with respect to video offset. Items where >50% of participants chose the incorrect answer for the (easy) verbal condition were excluded from accuracy calculations. (E) Mean accuracies for all Python participants were above chance. (F) Mean accuracies for all ScratchJr participants were above chance.
Figure 2—figure supplement 2. Random-effects group-level analysis of Experiment 1 data (Python, code problems > sentence problems contrast).


Similar to analyses reported in the main text, code-evoked activity is bilateral and recruits fronto-parietal but not temporal regions. Cluster threshold p<0.05, cluster-size FDR-corrected; voxel threshold: p<0.001, uncorrected.
Figure 2—figure supplement 3. Random-effects group-level analysis of Experiment 2 data (ScratchJr, code problems > sentence problems contrast).


Similar to analyses reported in the main text, ScratchJr-evoked activity has a small right hemisphere bias. Cluster threshold p<0.05, cluster-size FDR-corrected; voxel threshold: p<0.001, uncorrected.

Figure 3. Responses to sentence problems (red) and code problems (purple) during Experiment 1 (Python; A) and Experiment 2 (ScratchJr; B) broken down by region within each system.

Abbreviations: mid – middle, ant – anterior, post – posterior, orb – orbital, MFG – middle frontal gyrus, IFG – inferior frontal gyrus, temp – temporal lobe, AngG – angular gyrus, precentral_A – the dorsal portion of precentral gyrus, precentral_B – the ventral portion of precentral gyrus. A solid line through the bars in each subplot indicates the mean response across the fROIs in that plot.


Figure 3—figure supplement 1. The parcels in the two candidate brain systems of interest, multiple demand (MD) and language.


The parcels are derived from group-level representations of MD and language activity and are used to define the functional regions of interest (fROIs) in individual participants (NB: we show the left hemisphere parcels for the MD system, but the system is bilateral). For each participant, the network of interest comprises the 10% of voxels within each parcel with the highest t-values for the relevant contrast (MD – hard vs. easy spatial working memory task; language – sentence reading vs. nonword reading; see Materials and methods). Abbreviations: mid – middle, ant – anterior, post – posterior, orb – orbital, MFG – middle frontal gyrus, IFG – inferior frontal gyrus, temp – temporal lobe, AngG – angular gyrus.
Figure 3—figure supplement 2. ROI-level responses in the multiple demand system to the critical task (CP – code problems, SP – sentence problems) and the spatial working memory task (HardWM – hard working memory task, EasyWM – easy working memory task).


(A) Experiment 1, Python; left hemisphere fROIs; (B) Experiment 1, Python; right hemisphere fROIs; (C) Experiment 2, ScratchJr; left hemisphere fROIs; (D) Experiment 2, ScratchJr; right hemisphere fROIs. No fROIs prefer both Python and ScratchJr code problems over the spatial working memory task.
Figure 3—figure supplement 3. Whole-brain group-constrained subject-specific (GSS) analysis (Fedorenko et al., 2010) based on data from Experiment 1 shows the absence of code-only brain regions.


(a) Parcels defined at the group level using the code problems > sentence problems contrast, p threshold 0.001, inter-subject overlap ≥70%. (b) Activation profile for the top 10% of voxels within each parcel in (a) across conditions. All code-sensitive regions exhibit high activity during the spatial working-memory task, suggesting that they belong to the MD system. (c) Parcels defined using the contrast above plus the ‘not hard working-memory task > easy working-memory task’ contrast, p=0.5. Only one parcel was significant (right hemisphere). (d) Even that parcel’s response profile shows high activity in response to the working-memory task, modulated by difficulty, rather than a code-specific response. Abbreviations: CP – code problems; SP – sentence problems; HardWM – hard working memory task; EasyWM – easy working memory task; SR – sentence reading; NR – nonword reading.
Figure 3—figure supplement 4. Whole-brain group-constrained subject-specific (GSS) analysis (Fedorenko et al., 2010) based on data from Experiment 2.


(a) Parcels defined at the group level using the code problems > sentence problems contrast, p threshold 0.001, inter-subject overlap ≥70%. Parcels where the responses to ScratchJr code were stronger than responses to all other tasks are labeled and marked in orange; they include parts of early visual cortex and parts of the ventral visual stream. (b) Activation profile for the top 10% of voxels within each parcel in (a) marked in yellow. All regions exhibit high activity during the spatial working-memory task, suggesting that they belong to the MD system. (c) Activation profile for the top 10% of voxels within each parcel in (a) marked in orange. These fROIs exhibit higher responses to ScratchJr problems compared to a working memory task; given that they are located in the visual cortex, we can infer that they respond to low-level visual properties of ScratchJr code. A follow-up conjunction analysis using the contrast in (a) plus the ‘not hard working-memory task > easy working-memory task’ contrast, p=0.5, revealed no significant parcels, indicating the lack of code-selective response. Abbreviations: CP – code problems; SP – sentence problems; HardWM – hard working memory task; EasyWM – easy working memory task; SR – sentence reading; NR – nonword reading.

To further test the generalizability of MD responses, we capitalized on the fact that our Python stimuli systematically varied along two dimensions: (1) problem type (math problems vs. string manipulation) and (2) problem structure (sequential statements, for loops, if statements). Strong responses were observed in the MD system (Figure 4A and B) regardless of problem type (β = 3.02, p<0.001; no difference between problem types) and problem structure (β = 3.14, p<0.001; sequential problems evoked a slightly weaker response, β = −0.20, p=0.002). This analysis demonstrates that the responses were not driven by one particular type of problem or by mental operations related to the processing of a particular code structure.

Figure 4. Follow-up analyses of responses to Python code problems.

(A) MD system responses to math problems vs. string manipulation problems. (B) MD system responses to code with different structure (sequential vs. for loops vs. if statements). (C) Language system responses to code problems with English identifiers (codeE) and code problems with Japanese identifiers (codeJ) in participants with no knowledge of Japanese (non-speakers) and some knowledge of Japanese (speakers) (see the ‘Language system responses...' section for details of this manipulation). (D) Spatial correlation analysis of voxel-wise responses within the language system during the main task (SP – sentence problems and CP – code problems) with the language localizer conditions (SR – sentence reading and NR – nonwords reading). Each cell shows a correlation between the activation patterns for each pair of conditions. Within-condition similarity is estimated by correlating activation patterns across independent runs.


Figure 4—figure supplement 1. Spatial correlation analysis of voxel responses within the MD system during the Python experiment (CP – code problems and SP – sentence problems) with the language localizer conditions for the same participants (SR – sentence reading and NR – nonword reading).


Each cell shows a correlation between voxel-level activation patterns for each pair of conditions. Within-condition similarity is estimated by correlating activation patterns across independent runs. Code problems correlate with sentence problems much more strongly than with sentence reading (β = −0.59, p<0.001) and with nonword reading (β = −0.55, p<0.001), but less strongly than with other code problems (β = 0.11, p<0.001). There was no main effect of hemisphere, but there was an interaction between some of the conditions and hemisphere (sentence reading: β = 0.17, p<0.001, nonword reading: β = 0.13, p=0.002), indicating that the correlation patterns of code/sentence problems were somewhat less robust in the right hemisphere.
Figure 4—figure supplement 2. The effect of programming expertise on code-specific response strength within the MD and language system in Experiment 1, Python (A, B) and Experiment 2, ScratchJr (C, D).


Python expertise was evaluated with a separate 1-hr-long Python assessment (see the paper’s website, https://github.com/ALFA-group/neural-program-comprehension); ScratchJr expertise was estimated with in-scanner response accuracies. No correlations were significant.

We also tested whether MD responses to code showed a hemispheric bias similar to what is typically seen for math and logic problems (Goel and Dolan, 2001; Micheloyannis et al., 2005; Monti et al., 2007; Monti et al., 2009; Pinel and Dehaene, 2010; Prabhakaran et al., 1997; Reverberi et al., 2009). Neither Python nor ScratchJr problems showed a left-hemisphere bias for code comprehension. For Python, the size of the code problems > sentence problems effect did not interact with hemisphere (β = 0.11, p=0.46), even though the code problems > nonword reading effect was stronger in the left hemisphere (β = 0.63, p<0.001). These results show that neural activity evoked by Python code comprehension was bilaterally distributed but that activity evoked by the underlying problem content was left-lateralized. For ScratchJr, the size of the code problems > sentence problems effect interacted with hemisphere, with stronger responses in the right hemisphere (β = 0.57, p=0.001), perhaps reflecting the bias of the right hemisphere toward visuo-spatial processing (Corballis, 2003; Hugdahl, 2011; Sheremata et al., 2010).

Follow-up analyses of activity in individual regions within the MD system demonstrated that 17 of the 20 MD fROIs (all except the fROIs located in left medial frontal cortex and in the left and right insula) responded significantly more strongly to Python code problems than to sentence problems (see Supplementary file 1 for all fROI statistics). Responses to ScratchJr were significantly stronger than responses to sentence problems in 6 of the 10 left hemisphere MD fROIs (the effect was not significant in the fROIs in superior frontal gyrus, the dorsal part of the precentral gyrus, the medial frontal cortex, and the insula) and in 8 of the 10 right hemisphere MD fROIs (the effect was not significant in the fROIs in the medial frontal cortex and the insula; see Supplementary file 1 for all fROI statistics). These analyses demonstrate that code processing is broadly distributed across the MD system rather than being localized to a particular region or to a small subset of regions.

Overall, we show that MD responses to code are strong, do not exclusively reflect responses to problem content, generalize across programming languages and problem types, and are observed across most MD fROIs.

No MD fROIs are selective for code comprehension

To determine whether any fROIs were driven selectively (preferentially) by code problems relative to other cognitively demanding tasks, we contrasted individual fROI responses to code problems with responses to a hard working memory task from the MD localizer experiment. Three fROIs located in the left frontal lobe (‘precentral_A’, ‘precentral_B’, and ‘midFrontal’) exhibited stronger responses to Python code problems than to the hard working memory task (β = 1.21, p<0.001; β = 1.89, p<0.001; and β = 0.79, p=0.011, respectively; Figure 3—figure supplement 2). However, the magnitude of the code problems > sentence problems contrast in these regions (β = 1.03, 0.95, 0.97) was comparable to the average response magnitude across all MD fROIs (average β = 1.03), suggesting that the high response was caused by processing the underlying problem content rather than by code comprehension per se. Furthermore, neither these nor any other MD fROIs exhibited higher responses to ScratchJr code compared to the hard working memory task (in fact, the ‘precentral_A’ fROI did not even show a significant code problems > sentence problems effect). We conclude that code comprehension is broadly supported by the MD system (similar to, e.g., intuitive physical inference; Fischer et al., 2016), but no MD regions are functionally specialized to process computer code.

Language system responses during code comprehension are weak and inconsistent

The responses to code problems within the language system (Figures 2 and 3) were weaker than responses to sentence problems in both experiments (Python: β = 0.98, p<0.001; ScratchJr: β = 0.99, p<0.001). Furthermore, although the responses to code problems were stronger than the responses to nonword reading for Python (β = 0.78, p<0.001), this was not the case for ScratchJr (β = 0.15, p=0.29), suggesting that the language system is not consistently engaged during computer code comprehension.

We further tested whether responses to Python code problems within the language system may be driven by the presence of English words. Our stimuli were constructed such that half of the Python problems contained meaningful identifier names, and in the other half, the English identifiers were replaced with their Japanese translations, making them semantically meaningless for non-speakers of Japanese. For this analysis, we divided our participants into two groups – those with no reported knowledge of Japanese (N = 18) and those with some knowledge of Japanese (N = 6) – and compared responses within their language regions to code problems with English vs. Japanese identifiers (Figure 4C). We found no effect of identifier language (β = 0.03, p=0.84), knowledge of Japanese (β = 0.03, p=0.93), or interaction between them (β = 0.09, p=0.71), indicating that the language system’s response to Python code was not driven by the presence of semantically transparent identifiers. This result is somewhat surprising given the language system’s strong sensitivity to word meanings (e.g., Anderson et al., 2019; Binder et al., 2009; Fedorenko et al., 2010; Fedorenko et al., 2020; Pereira et al., 2018). One possible explanation is that participants do not deeply engage with the words’ meanings in these problems because these meanings are irrelevant to finding the correct solution.
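
For illustration, a problem pair of this kind might look as follows (a hypothetical example, not an actual stimulus; the real items are available on the paper's website, and we assume romanized Japanese identifiers here):

English identifiers:

total = 0
for number in [3, 5, 7]:
    total = total + number
print(total)

Japanese identifiers (goukei – 'total'; kazu – 'number'):

goukei = 0
for kazu in [3, 5, 7]:
    goukei = goukei + kazu
print(goukei)

Both versions print 15; the two conditions differ only in the semantic transparency of the identifiers.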

Finally, we investigated whether the responses to Python code problems within the language system were driven by code comprehension specifically or rather by the underlying problem content. When examining responses in the MD system, we could easily disentangle the neural correlates of code comprehension vs. the processing of problem content using univariate analyses: the code problems > sentence problems contrast isolated code-comprehension-related processes, and the sentence problems > nonword reading contrast isolated responses to problem content. In the language system, however, the sentence problems > nonword reading response is additionally driven by language comprehension (unlike the MD system, which does not respond to linguistic input in the absence of task demands, as evidenced by its low responses during sentence reading; see also Blank and Fedorenko, 2017; Diachek et al., 2020). Thus, responses to Python code might be evoked both by problem content and by the language-like features of Python code. To determine the relative contributions of these two factors, we computed voxel-wise spatial correlations within and between the code problem and sentence problem conditions, as well as correlations between these conditions and the sentence/nonword reading conditions from the language localizer task (Figure 4D). We reasoned that if a system is driven by problem content, the activation patterns for code and sentence problems should be similar; in contrast, if a system is driven by code comprehension per se, the activation patterns for code and sentence problems should differ. We found that the activation patterns were highly correlated between the code and sentence problems (r = 0.69, p<0.001). These correlation values were higher than the correlations between code problems and sentence reading (0.69 vs. 0.65; p<0.001), although lower than the correlations within the code problem condition (0.69 vs. 0.73; p<0.001). The fact that code and sentence problem responses are correlated over and above code problem and sentence reading responses indicates that the language system is sensitive to the content of the stimulus rather than just the stimulus type (code vs. words). Moreover, similar to the MD system, problem content can account for a substantial portion of the response in the language regions (Δr = 0.04). Note that a similar spatial correlation analysis in the MD system mirrored the result of univariate analyses (Figure 4—figure supplement 1). Thus, in both MD and language systems, response to Python code is driven both by problem content and by code-specific responses.
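
The logic of this analysis can be sketched as follows (a minimal illustration with simulated data and assumed variable names, not the study's actual analysis code):

import numpy as np

rng = np.random.default_rng(0)
n_voxels = 500  # stand-in for the number of voxels in the language fROIs

# Simulated per-voxel responses for each condition (CP/SP - code/sentence
# problems; SR/NR - sentence/nonword reading) in each of two runs
beta = {c: [rng.normal(size=n_voxels) for _ in range(2)]
        for c in ('CP', 'SP', 'SR', 'NR')}

def spatial_corr(a, b):
    # Pearson correlation between two voxel-wise activation patterns
    return np.corrcoef(a, b)[0, 1]

r_between = spatial_corr(beta['CP'][0], beta['SP'][1])  # code vs. sentence problems
r_within = spatial_corr(beta['CP'][0], beta['CP'][1])   # code problems, across runs

Correlating within-condition patterns across independent runs prevents run-specific noise from inflating the similarity estimates; a within-condition correlation that exceeds the between-condition correlation (here, 0.73 vs. 0.69) indicates a pattern component specific to code comprehension.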

Overall, we found that the language system responded to code problems written in Python but not in ScratchJr. Furthermore, Python responses were driven not only by code comprehension, but also by the processing of problem content. We conclude that successful comprehension of computer code can proceed without engaging the language network.

No consistent evidence of code-responsive regions outside the MD/language systems

To search for code-responsive regions that might fall outside the MD and language systems, we performed a whole-brain GSS analysis (Fedorenko et al., 2010). GSS analysis serves the same goal as the traditional random-effects voxel-wise analysis (Holmes and Friston, 1998) but accommodates inter-individual variability in the precise locations of functional regions, thus maximizing the likelihood of finding responsive regions (Nieto-Castañón and Fedorenko, 2012). We searched for areas of activation for the code problems > sentence problems contrast (separately for Python and ScratchJr) that were spatially similar across participants. We then examined the response of such regions to code and sentence problems (using an across-runs cross-validation procedure; e.g., Nieto-Castañón and Fedorenko, 2012), as well as to conditions from the two localizer experiments. In both experiments, the discovered regions spatially resembled the MD system (Figure 3—figure supplements 3 and 4). For Python, any region that responded to code also responded to the spatial working memory task (the MD localizer). In the case of ScratchJr, some fROIs responded more strongly to code problems than to the spatial working memory task; these fROIs were located in early visual areas/ventral visual stream and therefore likely responded to low-level visual properties of ScratchJr code (which includes colorful icons, objects, etc.). The traditional random-effects group analyses revealed a similar activation pattern (Figure 2—figure supplements 2 and 3). These whole-brain analyses demonstrate that the MD system responds robustly and consistently to computer code, recapitulating the results of the fROI-based analyses (Figures 2–4), and show that fROI-based analyses did not miss any non-visual code-responsive or code-selective regions outside the boundaries of the MD system.

Effect of proficiency on MD and language responses

We conducted an exploratory analysis to check whether engagement of the MD and/or language system in code comprehension varies with the level of programming expertise. We correlated responses within each system with independently obtained proficiency scores for Experiment 1 participants (see the paper’s website for details: https://github.com/ALFA-group/neural-program-comprehension; Ivanova and Srikant, 2020; copy archived at swh:1:rev:616e893d05038da620bdf9f2964bd3befba75dc5) and with in-scanner accuracy scores for Experiment 2 participants. No correlations were significant (see Figure 4—figure supplement 2). However, given the relatively small number of participants (N = 24 and N = 19, respectively), these results should be interpreted with caution.

Discussion

The ability to interpret computer code is a remarkable cognitive skill that bears parallels to diverse cognitive domains, including general executive functions, math, logic, and language. The fact that coding can be learned in adulthood suggests that it may rely on existing cognitive systems. Here, we tested the role of two candidate neural systems in computer code comprehension: the domain-general MD system (Duncan, 2010), which has been linked to diverse executive demands and implicated in math and logic (e.g., Amalric and Dehaene, 2019; Goel, 2007; Monti et al., 2007; Monti et al., 2009), and the language-selective system (Fedorenko et al., 2011), which has been linked to lexical and combinatorial linguistic processes (e.g., Bautista and Wilson, 2016; Fedorenko et al., 2010; Fedorenko et al., 2012; Fedorenko et al., 2020; Keller et al., 2001; Mollica et al., 2020). We found robust bilateral responses to code problems within the MD system, a pattern that held across two very different programming languages (Python and ScratchJr), types of problems (math and string manipulation), and problem structure (sequential statements, for loops, and if statements). In contrast, responses in the language system were substantially lower than those elicited by the content-matched sentence problems and exceeded responses to the control condition (nonword reading) only for one of the two programming languages tested.

Our work uniquely contributes to the study of computer programming in the mind and brain by addressing two core issues that made it difficult to interpret results from prior studies. First, we disentangle responses evoked by code comprehension from responses to problem content (which is often not code-specific) by contrasting code problems with content-matched sentence problems. Our findings suggest that earlier reports of left-lateralized code-evoked activity (Siegmund et al., 2014) may reflect processing program content rather than code comprehension per se. This distinction should also be considered when interpreting results of other studies of programming effects on brain activity, such as debugging (Castelhano et al., 2019), variable tracking (Ikutani and Uwano, 2014; Nakagawa et al., 2014), use of semantic cues or program layout (Fakhoury et al., 2018; Siegmund et al., 2017), program generation (Krueger et al., 2020), and programming expertise (Ikutani et al., 2020).

Second, we analyze responses in brain areas that are functionally localized in individual participants, allowing for straightforward interpretation of the observed responses (Mather et al., 2013; Saxe et al., 2006). This approach stands in contrast to the traditional approach, whereby neural responses are averaged across participants on a voxel-by-voxel basis, and the resulting activation clusters are interpreted via ‘reverse inference’ from anatomy (e.g., Poldrack, 2006; Poldrack, 2011). Functional localization is particularly important when analyzing responses in frontal, temporal, and parietal association cortex, which is known to be functionally heterogeneous and variable across individuals (Blank et al., 2017; Braga et al., 2019; Fedorenko and Kanwisher, 2009; Frost and Goebel, 2012; Shashidhara et al., 2019b; Tahmasebi et al., 2012; Vázquez-Rodríguez et al., 2019).

The results of our work align well with the results of another recent study on program comprehension (Liu et al., 2020). Liu et al. investigated the neural correlates of program comprehension by contrasting Python code problems with fake code. The code problem condition was similar to ours, whereas the fake code condition involved viewing scrambled code, followed by a visual recognition task. The code problems > fake code contrast is broader than ours: it includes both code comprehension (interpreting Python code) and the processing of problem content (manipulating characters in a string). Our results show that the MD system is involved in both processes, but Python code comprehension is bilateral, whereas the processing of problem content is left-lateralized. We would therefore expect the code problems > fake code contrast to activate the MD system, engaging the left hemisphere more strongly than the right due to the demands of problem content processing. This is precisely what Liu et al. found. Further, similar to us, Liu et al. conclude that it is the MD regions, not the language regions, that are primarily involved in program comprehension.

MD system’s engagement reflects the use of domain-general resources

The fact that the MD system responds to code problems over and above content-matched sentence problems underscores the role of domain-general executive processes in code comprehension. Although cognitive processes underlying code interpretation bear parallels to logic and math tasks (Papert, 1972; Pennington and Grabowski, 1990; Perkins and Simmons, 1988) and to natural language comprehension/generation (Fedorenko et al., 2019; Hermans and Aldewereld, 2017), the neural activity we observe primarily resembles activity observed in response to domain-general executive tasks (Assem et al., 2020; Duncan, 2010; Fedorenko et al., 2013). In particular, code comprehension elicits bilateral responses within the MD system, in contrast to math and logic tasks that tend to elicit left-lateralized responses within the MD system, and in contrast to language tasks that elicit responses in the spatially and functionally distinct language system.

We found that responses in the MD system were driven both by the processing of problem content (e.g., summing the contents of an array) and by code comprehension (e.g., identifying variables referring to an array and its elements, interpreting a for loop, realizing that the output of the program is the variable being updated inside the for loop). Both of these processes plausibly require attention, working memory, inhibitory control, planning, and general flexible relational reasoning – cognitive processes long linked to the MD system (Duncan, 2010; Duncan, 2013; Duncan and Owen, 2000; Miller and Cohen, 2001) in both humans (Assem et al., 2020; Shashidhara et al., 2019a; Woolgar et al., 2018) and non-human primates (Freedman et al., 2001; Miller et al., 1996; Mitchell et al., 2016). A recent study (Huang et al., 2019) reported neural overlap between operations on programming data structures (which require both code comprehension and the processing of problem content) and a mental rotation task (which requires spatial reasoning). The overlap was observed within brain regions whose topography grossly resembles that of the MD system. In our study, all code-responsive brain regions outside the visual cortex also responded robustly during a spatial memory task (Figure 3—figure supplements 3 and 4), similar to the results reported in Huang et al., 2019. However, the MD system is not specifically tuned to spatial reasoning (Duncan, 2010; Fedorenko et al., 2013; Michalka et al., 2015), so the overlap between code comprehension and spatial reasoning likely reflects the engagement of domain-general cognitive processes, like working memory and cognitive control, as opposed to processes specific to spatial reasoning.

Furthermore, given that no regions outside of the MD system showed code-specific responses, it must be the case that code-specific knowledge representations are also stored within this system (see Hasson et al., 2015, for a general discussion of the lack of distinction between storage and computing resources in the brain). Such code-specific representations would likely include both knowledge specific to a programming language (e.g., the syntax marking an array in Java vs. Python) and knowledge of programming concepts that are shared across languages (e.g., for loops). Much evidence suggests that the MD system can flexibly store task-relevant information in the short term (e.g., Fedorenko et al., 2013; Freedman et al., 2001; Shashidhara et al., 2019a; Wen et al., 2019; Woolgar et al., 2011). However, evidence from studies on processing mathematics (e.g., Amalric and Dehaene, 2019) and physics (e.g., Cetron et al., 2019; Fischer et al., 2016) further suggests that the MD system can store some domain-specific representations in the long term, perhaps for evolutionarily late-emerging and ontogenetically late-acquired domains of knowledge. Our data add to this body of evidence by showing that the MD system stores and uses information required for code comprehension.

We also show that, instead of being concentrated in one region or a subset of the MD system, code-evoked responses are distributed throughout the MD system. This result seems to violate general metabolic and computational efficiency principles that govern much of the brain’s architecture (Chklovskii and Koulakov, 2004; Kanwisher, 2010): if some MD neurons are, at least in part, functionally specialized to process computer code, we would expect them to be located next to each other. Three possibilities are worth considering. First, selectivity for code comprehension in a subset of the MD network may only emerge with years of experience (e.g., in professional programmers). Participants in our experiments were all proficient in the target programming language but most had only a few years of experience with it. Second, code-selective subsets of the MD network may be detectable at higher spatial resolution, using invasive methods like electrocorticography (Parvizi and Kastner, 2018) or single-cell recordings (Mukamel and Fried, 2012). And third, perhaps the need to flexibly solve novel problems throughout one’s life prevents the ‘crystallization’ of specialized subnetworks within the MD cortex. All that said, it may also be the case that some subset of the MD network is causally important for code comprehension even though it does not show strong selectivity for it, similar to how damage to some MD areas (mostly, in the left parietal cortex) appears to lead to deficits in numerical cognition (Ardila and Rosselli, 2002; Kahn and Whitaker, 1991; Lemer et al., 2003; Rosselli and Ardila, 1989; Takayama et al., 1994), even though these regions do not show selectivity for numerical tasks in fMRI (Pinel et al., 2004; Shuman and Kanwisher, 2004).

The language system is functionally conservative

We found that the language system does not respond consistently during code comprehension in spite of numerous similarities between code and natural languages (Fedorenko et al., 2019). Perhaps the most salient similarity between these input types is their syntactic/combinatorial structure. Some accounts of language processing claim that syntactic operations that support language processing are highly abstract and insensitive to the nature of the to-be-combined units (e.g., Berwick et al., 2013; Fitch et al., 2005; Fitch and Martins, 2014; Hauser et al., 2002). Such accounts predict that the mechanisms supporting structure processing in language should also be engaged when we process structure in other domains, including computer code. Prior work has already called this idea into question in its broadest form: processing music, whose hierarchical structure has long been noted to have parallels with linguistic syntax (e.g., Lerdahl and Jackendoff, 1996; cf. Jackendoff, 2009), does not engage the language system (e.g., Fedorenko et al., 2011; Rogalsky et al., 2011; Chen et al., 2020). Our finding builds upon the results from the music domain to show that compositional input (here, variables and keywords combining into statements) and hierarchical structure (here, conditional statements and loops) do not necessarily engage language-specific regions.

Another similarity shared by computer programming and natural language is the use of symbols – units referring to concepts ‘out in the world’. Studies of math and logic, domains that also make extensive use of symbols, show that those domains do not rely on the language system (Amalric and Dehaene, 2019; Cohen et al., 2000; Fedorenko et al., 2011; Monti et al., 2009; Monti et al., 2012; Pinel and Dehaene, 2010; Varley et al., 2005), a conclusion consistent with our findings. However, these prior results might be explained by the hypothesis that mathematics makes use of a different conceptual space altogether (Cappelletti et al., 2001), in which case the symbol-referent analogy would be weakened. Our work provides an even stronger test of the symbolic reference hypothesis: the computer code problems we designed are not only symbolic, but also refer to the same conceptual representations as the corresponding verbal problems (Figure 1A). This parallel is particularly striking in the case of ScratchJr: each code problem refers to a sequence of actions performed by a cartoon character – a clear case of reference to concepts in the physical world. And yet, the language regions do not respond to ScratchJr, showing a clear preference for language over other types of meaningful structured input (see also Ivanova et al., 2019).

The third similarity between code and natural language is the communicative use of those systems (Allamanis et al., 2018). The programming languages we chose are very high-level, meaning that they emphasize human readability (Buse and Weimer, 2010; Klare, 1963) over computational efficiency. ScratchJr is further optimized to be accessible and engaging for young children (Sullivan and Bers, 2019). Thus, code written in these languages is meant to be read and understood by humans, not just executed by machines. In this respect, computer code comprehension is similar to reading in natural language: the goal is to extract a meaningful message produced by another human at some point in the past. And yet the communicative nature of this activity is not sufficient to recruit the language system, consistent with previous reports showing a neural dissociation between language and other communication-related activities, such as gesture processing (Jouravlev et al., 2019), intentional actions (Pritchett et al., 2018), or theory of mind tasks (Apperly et al., 2006; Dronkers et al., 1998; Jacoby et al., 2016; Paunov et al., 2019; Varley and Siegal, 2000).

Of course, the lack of consistent language system engagement in code comprehension does not mean that the mechanisms underlying language and code processing are completely different. It is possible that both language and MD regions have similarly organized neural circuits that allow them to process combinatorial input or map between a symbol and the concept it refers to. However, the fact that we observed code-evoked activity primarily in the MD regions indicates that code comprehension does not load on the same neural circuits as language and needs to use domain-general MD circuits instead.

More work is required to determine why the language system showed some activity in response to Python code. The most intuitive explanation posits that the language system responds to meaningful words embedded within the code; however, this explanation seems unlikely given that the responses were equally strong when reading problems with semantically meaningful identifiers (English) and semantically meaningless identifiers (Japanese; Figure 4C). Another possibility is that participants internally verbalized the symbols they were reading (where ‘verbalize’ means to retrieve the word associated with a certain symbol rather than a simple reading response, since the latter would be shared with nonwords). However, this account does not explain why such verbalization would be observed for Python and not for ScratchJr, where many blocks have easily verbalized labels, such as ‘jump’. It is also inconsistent with observations that even behaviors that ostensibly require subvocal rehearsal (e.g., mathematical operations) do not engage the language system (see e.g., Amalric and Dehaene, 2019; Fedorenko et al., 2011). Finally, the account that we consider most likely is that the responses were mainly driven by the processing of underlying problem content and thus associated with some aspect(s) of computational thinking that were more robustly present in Python compared to ScratchJr problems. Further investigations of the role of the language system in computational thinking have the potential to shed light on the exact computations supported by these regions.

Finally, it is possible that the language system may play a role in learning to program (Prat et al., 2020), even if it is not required to support code comprehension once the skill is learned. Studies advocating the ‘coding as another language’ approach (Bers, 2019; Bers, 2018; Sullivan and Bers, 2019) have found that treating coding as a meaning-making activity rather than merely a problem-solving skill had a positive impact on both teaching and learning to program in the classroom (Hassenfeld et al., 2020; Hassenfeld and Bers, 2020). Such results indicate that the language system and/or the general semantic system might play a role in learning to process computer code, especially in children, when the language system is still developing. This idea remains to be empirically evaluated in future studies.

Limitations of scope

The stimuli used in our study were short and only included a few basic elements of control flow (such as for loops and if statements). Furthermore, we focused on code comprehension, which is a necessary but not sufficient component of many other programming activities, such as code generation, editing, and debugging. Future work should investigate changes in brain activity during the processing and generation of more complex code structures, such as functions, objects, and large multi-component programs. Just like narrative processing recruits systems outside the regions that support single sentence processing (Baldassano et al., 2018; Blank and Fedorenko, 2020; Ferstl et al., 2008; Jacoby and Fedorenko, 2020; Lerner et al., 2011; Simony et al., 2016), reading more complex pieces of code might recruit an extended, or a different, set of brain regions. Furthermore, as noted above, investigations of expert programmers may reveal changes in how programming knowledge and use are instantiated in the mind and brain as a function of increasing amount of domain-relevant experience.

Overall, we provide evidence that code comprehension consistently recruits the MD system – which subserves processing across multiple cognitive domains – but does not consistently engage the language system, in spite of numerous similarities between natural and programming languages. By isolating neural activity specific to code comprehension, we pave the way for future studies examining the cognitive and neural correlates of programming and contribute to the broader literature on the neural systems that support novel cognitive tools.

Materials and methods

Participants

For Experiment 1, we recruited 25 participants (15 women, mean age = 23.0 years, SD = 3.0). The average age at which participants started to program was 16 years (SD = 2.6); the average number of years spent programming was 6.3 (SD = 3.8). In addition to Python, 20 people also reported some knowledge of Java, 18 reported knowledge of C/C++, four of functional languages, and 20 of numerical languages such as MATLAB and R. Twenty-three participants were right-handed, one was ambidextrous, and one was left-handed (as assessed by Oldfield’s [1971] handedness questionnaire); the left-handed participant had a right-lateralized language system and was excluded from the analyses, leaving 24 participants (all of whom had left-lateralized language regions, as evaluated with the language localizer task; see below). Participants also reported their knowledge of foreign languages and completed a 1-hr-long Python proficiency test (available on the paper’s website, https://github.com/ALFA-group/neural-program-comprehension).

For Experiment 2, we recruited 21 participants (13 women, mean age = 22.5 years, SD = 2.8). In addition to ScratchJr, eight people also reported some knowledge of Python, six reported knowledge of Java, nine reported knowledge of C/C++, one of functional languages, and 14 of numerical languages such as MATLAB and R (one participant did not complete the programming questionnaire). Twenty participants were right-handed and one was ambidextrous; all had left-lateralized language regions, as evaluated with the language localizer task (see below). Two participants had to be excluded due to excessive motion during the MRI scan, leaving 19 participants.

All participants were recruited from MIT, Tufts University, and the surrounding community and were paid for participation. All were native speakers of English, had normal or corrected-to-normal vision, and reported working knowledge of Python or ScratchJr, respectively. The sample size for both experiments was determined based on previous experiments from our group (e.g., Blank and Fedorenko, 2020; Fedorenko et al., 2020; Ivanova et al., 2019) and others (e.g., Crittenden et al., 2015; Hugdahl et al., 2015; Shashidhara et al., 2019a). The protocol for the study was approved by MIT’s Committee on the Use of Humans as Experimental Subjects (COUHES). All participants gave written informed consent in accordance with protocol requirements.

Design, materials, and procedure

All participants completed the main program comprehension task, a spatial working memory localizer task aimed at identifying the MD brain regions (Fedorenko et al., 2011), and a language localizer task aimed at identifying language-responsive brain regions (Fedorenko et al., 2010).

The program comprehension task in Experiment 1 included three conditions: programs in Python with English identifiers, programs in Python with Japanese identifiers, and sentence versions of those programs (visually presented). The full list of problems can be found on the paper’s website, https://github.com/ALFA-group/neural-program-comprehension. Each participant saw 72 problems, and any given participant saw only one version of a problem. Half of the problems required performing mathematical operations, and the other half required string manipulations. In addition, both math and string-manipulation problems varied in program structure: 1/3 of the problems of each type included only sequential statements, 1/3 included a for loop, and 1/3 included an if statement.
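To make the materials concrete, a made-up problem of the kind described above might look as follows (an illustration only; the actual stimuli are available on the paper’s website). The first version uses English identifiers and the second uses romanized Japanese identifiers; the code structure is identical across versions.

# Hypothetical math problem with a for loop (English identifiers);
# participants read the code and predict its output.
total = 0
for number in [3, 5, 2]:
    total = total + number
print(total)    # correct response: 10

# The same problem with romanized Japanese identifiers, which are
# semantically meaningless to readers who do not know Japanese.
goukei = 0
for suuji in [3, 5, 2]:
    goukei = goukei + suuji
print(goukei)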

During each trial, participants were instructed to read the problem statement and press a button when they were ready to respond (the minimum processing time was restricted to 5 s and the maximum to 50 s; mean reading time was 19 s). Once they pressed the button, four response options were revealed, and participants had to indicate their response by pressing one of four buttons on a button box. The response screen was presented for 5 s (see Figure 1—figure supplement 1A for a schematic of trial structure). Each run consisted of six trials (two per condition) and three fixation blocks (at the beginning and end of the run, and after the third trial), each lasting 10 s. A run lasted, on average, 176 s (SD = 34 s), and each participant completed 12 runs. Condition order was counterbalanced across runs and participants.

The program comprehension task in Experiment 2 included two conditions: short programs in ScratchJr and the sentence versions of those programs (visually presented). ScratchJr is a language designed to teach programming concepts to young children (Bers, 2018): users can create events and sequences of events (stories) with a set of characters and actions. The full list of problems used in the study can be found on the paper’s website. Each participant saw 24 problems, and any given participant saw only one version of a problem. Furthermore, problems varied in the complexity of the code snippet (three levels of difficulty; eight problems at each level).

During each trial, participants were presented with a fixation cross for 4 s, followed by a description (either a code snippet or a sentence) to read for 8 s. The presentation of the description was followed by 5–9 s of fixation, and then by a video (average duration: 4.13 s, SD: 1.70 s) that either did or did not match the description. Participants had to indicate whether the video matched the description by pressing one of two buttons on a button box in the scanner. The response window started with the onset of the video and included a 4 s period after the video offset. A trial lasted, on average, 27.46 s (SD = 2.54 s; see Figure 1—figure supplement 1B, for a schematic of trial structure). Each run consisted of six trials (three per condition), and a 10 s fixation at the beginning and end of the run. A run lasted, on average, 184.75 s (SD = 3.86 s); each participant completed four runs. Condition order was counterbalanced across runs and participants.

The spatial working memory task was conducted in order to identify the MD system within individual participants. Participants had to keep track of four (easy condition) or eight (hard condition) sequentially presented locations in a 3 × 4 grid (Figure 1B; Fedorenko et al., 2011). In both conditions, they performed a two-alternative forced-choice task at the end of each trial to indicate the set of locations they just saw. The hard > easy contrast has been previously shown to reliably activate bilateral frontal and parietal MD regions (Assem et al., 2020; Blank et al., 2014; Fedorenko et al., 2013). Numerous studies have shown that the same brain regions are activated by diverse executively demanding tasks (Duncan and Owen, 2000; Fedorenko et al., 2013; Hugdahl et al., 2015; Shashidhara et al., 2019a; Woolgar et al., 2011). Stimuli were presented in the center of the screen across four steps. Each step lasted 1 s and revealed one location on the grid in the easy condition, and two locations in the hard condition. Each stimulus was followed by a choice-selection step, which showed two grids side by side. One grid contained the locations shown across the previous four steps, while the other contained an incorrect set of locations. Participants were asked to press one of two buttons to choose the grid that showed the correct locations. Condition order was counterbalanced across runs. Experimental blocks lasted 32 s (with four trials per block), and fixation blocks lasted 16 s. Each run (consisting of four fixation blocks and 12 experimental blocks) lasted 448 s. Each participant completed two runs.

The language localizer task was conducted in order to identify the language system within individual participants. Participants read sentences (e.g., NOBODY COULD HAVE PREDICTED THE EARTHQUAKE IN THIS PART OF THE COUNTRY) and lists of unconnected, pronounceable nonwords (e.g., U BIZBY ACWORRILY MIDARAL MAPE LAS POME U TRINT WEPS WIBRON PUZ) in a blocked design. Each stimulus consisted of twelve words/nonwords. For details of how the language materials were constructed, see Fedorenko et al., 2010. The materials are available at http://web.mit.edu/evelina9/www/funcloc/funcloc_localizers.html. The sentences > nonword lists contrast isolates processes related to language comprehension (responses evoked by, e.g., visual perception and reading are subtracted out) and has been previously shown to reliably activate left-lateralized fronto-temporal language processing regions, be robust to changes in task and materials, and activate the same regions regardless of whether the materials were presented visually or auditorily (Fedorenko et al., 2010; Mahowald and Fedorenko, 2016; Scott et al., 2017). Further, a similar network emerges from task-free resting-state data (Braga et al., 2020). Stimuli were presented in the center of the screen, one word/nonword at a time, at the rate of 450 ms per word/nonword. Each stimulus was preceded by a 100 ms blank screen and followed by a 400 ms screen showing a picture of a finger pressing a button, and a blank screen for another 100 ms, for a total trial duration of 6 s. Participants were asked to press a button whenever they saw the picture of a finger pressing a button. This task was included to help participants stay alert. Condition order was counterbalanced across runs. Experimental blocks lasted 18 s (with three trials per block), and fixation blocks lasted 14 s. Each run (consisting of 5 fixation blocks and 16 experimental blocks) lasted 358 s. Each participant completed two runs.

fMRI data acquisition

Structural and functional data were collected on a whole-body 3 Tesla Siemens Trio scanner with a 32-channel head coil at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT. T1-weighted structural images were collected in 176 sagittal slices with 1 mm isotropic voxels (TR = 2,530 ms, TE = 3.48 ms). Functional, blood oxygenation level dependent (BOLD) data were acquired using an EPI sequence (with a 90° flip angle and using GRAPPA with an acceleration factor of 2), with the following acquisition parameters: thirty-one 4-mm-thick near-axial slices acquired in interleaved order (with 10% distance factor), 2.1 mm × 2.1 mm in-plane resolution, FoV in the phase encoding (A >> P) direction of 200 mm, matrix size 96 × 96, TR = 2,000 ms, and TE = 30 ms. The first 10 s of each run were excluded to allow for steady-state magnetization.

fMRI data preprocessing

MRI data were analyzed using SPM12 and custom MATLAB scripts (available in the form of an SPM toolbox from http://www.nitrc.org/projects/spm_ss). Each participant’s data were motion-corrected and then normalized into a common brain space (the Montreal Neurological Institute [MNI] template) and resampled into 2 mm isotropic voxels. The data were then smoothed with a 4 mm FWHM Gaussian filter and high-pass filtered (at 128 s). Effects were estimated using a General Linear Model (GLM) in which each experimental condition was modeled with a boxcar function convolved with the canonical hemodynamic response function (HRF). For the localizer experiments, we modeled entire blocks. For the Python program comprehension experiment, we modeled the period from the onset of the code/sentence problem until the button press (the responses were modeled as a separate condition; see Figure 1—figure supplement 1A); for the ScratchJr program comprehension experiment, we modeled the period of the code/sentence presentation (the video and the response were modeled as a separate condition; see Figure 1—figure supplement 1B).
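To illustrate the modeling step, the following sketch shows in Python how a single condition regressor can be constructed: a boxcar marking the condition’s on-periods is convolved with a canonical double-gamma HRF. This is a minimal illustration with made-up onsets and SPM-like HRF parameters, not the SPM12 code used in the actual analysis.

import numpy as np
from scipy.stats import gamma

TR = 2.0         # repetition time in seconds (matches the acquisition)
N_SCANS = 180    # illustrative run length in scans

def canonical_hrf(tr, duration=32.0):
    # Double-gamma HRF sampled at the TR: a positive gamma peaking at
    # ~5-6 s minus a smaller, delayed undershoot (SPM-like parameters).
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.sum()

# Boxcar for one condition; onsets and durations are made up here
# (e.g., a problem presented from scan 10 through scan 17).
boxcar = np.zeros(N_SCANS)
boxcar[10:18] = 1.0

# The condition regressor that enters the GLM design matrix:
regressor = np.convolve(boxcar, canonical_hrf(TR))[:N_SCANS]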

Defining MD and language functional regions of interest (fROIs)

The fROI analyses examined responses in individually defined MD and language fROIs. These fROIs were defined using the group-constrained subject-specific (GSS) approach (Fedorenko et al., 2010; Julian et al., 2012), where a set of spatial masks, or parcels, is combined with each individual subject’s localizer activation map to constrain the definition of individual fROIs. The parcels delineate the expected gross locations of activations for a given contrast based on prior work and large numbers of participants and are sufficiently large to encompass the variability in the locations of individual activations. For the MD system, we used a set of 20 parcels (10 in each hemisphere) derived from a group-level probabilistic activation overlap map for the hard > easy spatial working memory contrast in 197 participants. The parcels included regions in frontal and parietal lobes, as well as a region in the anterior cingulate cortex. For the language system, we used a set of six parcels derived from a group-level probabilistic activation overlap map for the sentences > nonwords contrast in 220 participants. The parcels included two regions in the left inferior frontal gyrus (LIFG, LIFGorb), one in the left middle frontal gyrus (LMFG), two in the left temporal lobe (LAntTemp and LPostTemp), and one extending into the angular gyrus (LAngG). Both sets of parcels are available on the paper’s website; see Figure 3—figure supplement 1 for labeled images of MD and language parcels. Within each parcel, we selected the top 10% most localizer-responsive voxels, based on the t-values (see, e.g., Figure 1 in Blank et al., 2014, or Shain et al., 2020 for sample MD and language fROIs). Individual fROIs defined this way were then used for subsequent analyses that examined responses to code comprehension.
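The voxel-selection step itself is straightforward; the sketch below illustrates it in Python under assumed inputs (a 3-D t-map for the localizer contrast and a boolean parcel mask), with all names hypothetical.

import numpy as np

def define_froi(t_map, parcel_mask, top_fraction=0.10):
    # t_map: 3-D array of localizer t-values; parcel_mask: boolean 3-D
    # array marking the parcel. Returns a boolean mask selecting the
    # top 10% most localizer-responsive voxels within the parcel.
    in_parcel = np.flatnonzero(parcel_mask)
    t_in_parcel = t_map.ravel()[in_parcel]
    n_top = max(1, int(round(top_fraction * in_parcel.size)))
    top_voxels = in_parcel[np.argsort(t_in_parcel)[-n_top:]]
    froi = np.zeros(t_map.size, dtype=bool)
    froi[top_voxels] = True
    return froi.reshape(t_map.shape)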

Examining the functional response profiles of the MD and language fROIs

Univariate analyses

We evaluated MD and language system responses by estimating their response magnitudes to the conditions of interest using individually defined fROIs (see above). For each fROI in each participant, we averaged the responses across voxels to get a single value for each condition (the responses to the localizer conditions were estimated using an across-runs cross-validation procedure: one run was used to define the fROI and the other to estimate the response magnitudes; the procedure was then repeated with the roles of the runs switched, and the two estimates were averaged to derive a single value per condition per fROI per participant). We then ran a linear mixed-effects regression model to compare the responses to the critical code problem condition with (a) the responses to the sentence problem condition from the critical task, and (b) the responses to the nonword reading condition from the language localizer task. We included condition as a fixed effect and participant and fROI as random intercepts. For the MD system, we additionally tested the main (fixed) effect of hemisphere and the interaction between hemisphere and condition. We used dummy coding for condition, with code problems as the reference category, and sum coding for hemisphere. For follow-up analyses, we used the variable of interest (problem type/structure/identifier language) as a fixed effect and participant and fROI as random intercepts; dummy coding was used for all variables of interest. For individual-fROI analyses, we used condition as a fixed effect and participant as a random intercept. The analyses were run using the lmer function from the lme4 R package (Bates et al., 2015); statistical significance of the effects was evaluated using the lmerTest package (Kuznetsova et al., 2017).
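In lme4 syntax, the core model corresponds roughly to lmer(response ~ condition + (1 | participant) + (1 | fROI), data). For readers who prefer Python, an approximate analogue can be fit with statsmodels by expressing the crossed random intercepts as variance components over a single all-encompassing group; the file and column names below are assumptions for illustration, not the authors’ analysis code.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table with columns:
# response, condition ('code' / 'sentence' / 'nonwords'),
# participant, froi.
df = pd.read_csv("froi_responses.csv")

df["unit"] = 1  # a single group, so both random effects are crossed
vc = {"participant": "0 + C(participant)", "froi": "0 + C(froi)"}

# Dummy coding with code problems as the reference category:
model = smf.mixedlm(
    "response ~ C(condition, Treatment(reference='code'))",
    df, groups="unit", vc_formula=vc, re_formula="0")
result = model.fit()
print(result.summary())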

Spatial correlation analyses

To further examine the similarity of the fine-grained patterns of activation between conditions in the language system, we calculated voxel-wise spatial correlations in activation magnitudes within the code problem condition (between odd and even runs), within the sentence problem condition (between odd and even runs), between these two conditions (we used odd and even run splits here, too, to match the amount of data for the within- vs. between-condition comparisons, and averaged the correlation values across the different splits), and between these two critical conditions and each of the sentence and nonword reading conditions from the language localizer. The correlation values were calculated over the voxels in each participant’s language fROIs and then averaged across participants and fROIs for plotting (the values were weighted by fROI size). We also used the lme4 R package to calculate statistical differences between spatial correlation values for code vs. other conditions (with participant and fROI as random intercepts); for this analysis, the correlation values were Fisher-transformed.
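Schematically, each of these comparisons reduces to a Pearson correlation between two voxel-wise activation patterns over an fROI’s voxels, followed by a Fisher transform before the statistical comparison. The Python sketch below illustrates the computation with made-up data; array names are hypothetical.

import numpy as np

def spatial_corr(pattern_a, pattern_b):
    # Pearson correlation between two voxel-wise activation patterns
    # (1-D arrays over the voxels of a single fROI).
    return np.corrcoef(pattern_a, pattern_b)[0, 1]

# Hypothetical per-fROI patterns (one activation value per voxel):
rng = np.random.default_rng(0)
code_odd, code_even = rng.normal(size=(2, 500))
sent_even = rng.normal(size=500)

r_within = spatial_corr(code_odd, code_even)    # code reliability
r_between = spatial_corr(code_odd, sent_even)   # code vs. sentences

# Fisher transform (arctanh) before the mixed-effects comparison:
z_within, z_between = np.arctanh([r_within, r_between])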

Whole-brain analyses

For each of the critical experiments (Python and ScratchJr), we conducted (a) the GSS analysis (Fedorenko et al., 2010; Julian et al., 2012), and (b) the traditional random effects group analysis (Holmes and Friston, 1998) using the code problems > sentence problems contrast. The analyses were performed using the spm_ss toolbox (http://www.nitrc.org/projects/spm_ss), which interfaces with SPM and the CONN toolbox (https://www.nitrc.org/projects/conn).

Acknowledgements

We would like to acknowledge the Athinoula A Martinos Imaging Center at the McGovern Institute for Brain Research at MIT and its support team (Steve Shannon, Atsushi Takahashi, and Dima Ayyash), Rachel Ryskin for advice on statistics, Alfonso Nieto-Castañón for help with analyses, ALFA group at CSAIL for helpful discussions on the experiment design, and Josef Affourtit, Yev Diachek, and Matt Siegelman (EvLab), and Ruthi Aladjem, Claudia Mihm, and Kaitlyn Leidl (DevTech research group at Tufts University) for technical support during experiment design and data collection.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Anna A Ivanova, Email: annaiv@mit.edu.

Evelina Fedorenko, Email: evelina9@mit.edu.

Andrea E Martin, Max Planck Institute for Psycholinguistics, Netherlands.

Timothy E Behrens, University of Oxford, United Kingdom.

Funding Information

This paper was supported by the following grants:

  • National Science Foundation #1744809 to Marina U Bers, Evelina Fedorenko.

  • Department of Brain and Cognitive Science, MIT to Evelina Fedorenko.

  • McGovern Institute for Brain Research to Evelina Fedorenko.

Additional information

Competing interests

No competing interests declared.

Author contributions

Anna A Ivanova, Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration.

Shashank Srikant, Conceptualization, Software, Investigation, Methodology, Writing - review and editing.

Yotaro Sueoka, Data curation, Software, Investigation, Methodology.

Hope H Kean, Data curation, Software, Formal analysis, Investigation, Methodology.

Riva Dhamala, Software, Investigation, Methodology.

Una-May O'Reilly, Conceptualization, Supervision, Project administration, Writing - review and editing.

Marina U Bers, Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Project administration, Writing - review and editing.

Evelina Fedorenko, Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Project administration, Writing - review and editing.

Ethics

Human subjects: MIT's Committee on the Use of Humans as Experimental Subjects (COUHES) approved the protocol for the current study (protocol #0907003336R010, "fMRI Investigations of Language and its Relationship to Other Cognitive Abilities"). All participants gave written informed consent in accordance with protocol requirements.

Additional files

Supplementary file 1. Statistical analysis of functional ROIs in the multiple demand system.

Table 1 – Experiment 1 (Python); Table 2 – Experiment 2 (ScratchJr).

elife-58906-supp1.docx (50.1KB, docx)
Transparent reporting form

Data availability

Materials used for the programming tasks, fROI responses in individual participants (used for generating Figures 2-4), behavioral data, and analysis code files are available on the paper's website https://github.com/ALFA-group/neural-program-comprehension (copy archived at https://archive.softwareheritage.org/swh:1:rev:616e893d05038da620bdf9f2964bd3befba75dc5/). Whole brain activation maps are available at https://osf.io/9jfn5/.

The following dataset was generated:

Ivanova AA, Srikant S, Sueoka Y, Kean HH, Dhamala R, O'Reilly U-M, Bers MU, Fedorenko E. 2020. Comprehension of computer code relies primarily on domain-general executive resources. Open Science Framework. 10.17605/OSF.IO/9JFN5

References

  1. Allamanis M, Barr ET, Devanbu P, Sutton C. A survey of machine learning for big code and naturalness. arXiv. 2018 http://arxiv.org/abs/1709.06182
  2. Amalric M, Dehaene S. Origins of the brain networks for advanced mathematics in expert mathematicians. PNAS. 2016;113:4909–4917. doi: 10.1073/pnas.1603205113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Amalric M, Dehaene S. A distinct cortical network for mathematical knowledge in the human brain. NeuroImage. 2019;189:19–31. doi: 10.1016/j.neuroimage.2019.01.001. [DOI] [PubMed] [Google Scholar]
  4. Anderson AJ, Lalor EC, Lin F, Binder JR, Fernandino L, Humphries CJ, Conant LL, Raizada RDS, Grimm S, Wang X. Multiple regions of a cortical network commonly encode the meaning of words in multiple grammatical positions of read sentences. Cerebral Cortex. 2019;29:2396–2411. doi: 10.1093/cercor/bhy110. [DOI] [PubMed] [Google Scholar]
  5. Apperly IA, Samson D, Carroll N, Hussain S, Humphreys G. Intact first- and second-order false belief reasoning in a patient with severely impaired grammar. Social Neuroscience. 2006;1:334–348. doi: 10.1080/17470910601038693. [DOI] [PubMed] [Google Scholar]
  6. Ardila A, Rosselli M. Acalculia and dyscalculia. Neuropsychology Review. 2002;12:179–231. doi: 10.1023/a:1021343508573. [DOI] [PubMed] [Google Scholar]
  7. Assem M, Glasser MF, Van Essen DC, Duncan J. A Domain-General cognitive core defined in multimodally parcellated human cortex. Cerebral Cortex. 2020;30:4361–4380. doi: 10.1093/cercor/bhaa023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Baldassano C, Hasson U, Norman KA. Representation of Real-World event schemas during narrative perception. The Journal of Neuroscience. 2018;38:9689–9699. doi: 10.1523/JNEUROSCI.0251-18.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Bates D, Mächler M, Bolker B, Walker S. Fitting linear Mixed-Effects models using lme4. Journal of Statistical Software. 2015;67:1–48. doi: 10.18637/jss.v067.i01. [DOI] [Google Scholar]
  10. Bautista A, Wilson SM. Neural responses to grammatically and lexically degraded speech. Language, Cognition and Neuroscience. 2016;31:567–574. doi: 10.1080/23273798.2015.1123281. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Bergersen GR, Gustafsson J-E. Programming skill, knowledge, and working memory among professional software developers from an investment theory perspective. Journal of Individual Differences. 2011;32:201–209. doi: 10.1027/1614-0001/a000052. [DOI] [Google Scholar]
  12. Bers MU. Coding, playgrounds and literacy in early childhood education: the development of KIBO robotics and ScratchJr. IEEE Global Engineering Education Conference (EDUCON); 2018. pp. 2094–2102. [DOI] [Google Scholar]
  13. Bers MU. Coding as another language: a pedagogical approach for teaching computer science in early childhood. Journal of Computers in Education. 2019;6:499–528. doi: 10.1007/s40692-019-00147-3. [DOI] [Google Scholar]
  14. Bers MU, Resnick M. The Official ScratchJr Book: Help Your Kids Learn to Code. 1. No Starch Press; 2015. [Google Scholar]
  15. Berwick RC, Friederici AD, Chomsky N, Bolhuis JJ. Evolution, brain, and the nature of language. Trends in Cognitive Sciences. 2013;17:89–98. doi: 10.1016/j.tics.2012.12.002. [DOI] [PubMed] [Google Scholar]
  16. Binder JR, Desai RH, Graves WW, Conant LL. Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies. Cerebral Cortex. 2009;19:2767–2796. doi: 10.1093/cercor/bhp055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Blank I, Kanwisher N, Fedorenko E. A functional dissociation between language and multiple-demand systems revealed in patterns of BOLD signal fluctuations. Journal of Neurophysiology. 2014;112:1105–1118. doi: 10.1152/jn.00884.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Blank IA, Kiran S, Fedorenko E. Can neuroimaging help aphasia researchers? addressing generalizability, variability, and interpretability. Cognitive Neuropsychology. 2017;34:377–393. doi: 10.1080/02643294.2017.1402756. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Blank IA, Fedorenko E. Domain-General brain regions do not track linguistic input as closely as Language-Selective regions. The Journal of Neuroscience. 2017;37:9999–10011. doi: 10.1523/JNEUROSCI.3642-16.2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Blank IA, Fedorenko E. No evidence for differences among language regions in their temporal receptive windows. NeuroImage. 2020;219:116925. doi: 10.1016/j.neuroimage.2020.116925. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Braga RM, Van Dijk KRA, Polimeni JR, Eldaief MC, Buckner RL. Parallel distributed networks resolved at high resolution reveal close juxtaposition of distinct regions. Journal of Neurophysiology. 2019;121:1513–1534. doi: 10.1152/jn.00808.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Braga RM, DiNicola LM, Becker HC, Buckner RL. Situating the left-lateralized language network in the broader organization of multiple specialized large-scale distributed networks. Journal of Neurophysiology. 2020;124:1415–1448. doi: 10.1152/jn.00753.2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Brett M, Johnsrude IS, Owen AM. The problem of functional localization in the human brain. Nature Reviews Neuroscience. 2002;3:243–249. doi: 10.1038/nrn756. [DOI] [PubMed] [Google Scholar]
  24. Buse RPL, Weimer WR. Learning a metric for code readability. IEEE Transactions on Software Engineering. 2010;36:546–558. doi: 10.1109/TSE.2009.70. [DOI] [Google Scholar]
  25. Cappelletti M, Butterworth B, Kopelman M. Spared numerical abilities in a case of semantic dementia. Neuropsychologia. 2001;39:1224–1239. doi: 10.1016/S0028-3932(01)00035-5. [DOI] [PubMed] [Google Scholar]
  26. Castelhano J, Duarte IC, Ferreira C, Duraes J, Madeira H, Castelo-Branco M. The role of the insula in intuitive expert bug detection in computer code: an fMRI study. Brain Imaging and Behavior. 2019;13:623–637. doi: 10.1007/s11682-018-9885-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Cetron JS, Connolly AC, Diamond SG, May VV, Haxby JV, Kraemer DJM. Decoding individual differences in STEM learning from functional MRI data. Nature Communications. 2019;10:1–10. doi: 10.1038/s41467-019-10053-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Chen X, Affourtit J, Norman-Haignere S, Jouravlev O, Malik-Moraleda S, Kean HH, Regev T, McDermott J, Fedorenko E. The fronto-temporal language system does not support the processing of music. Society for Neurobiology of Language. 2020. [Google Scholar]
  29. Chklovskii DB, Koulakov AA. Maps in the brain: what can we learn from them? Annual Review of Neuroscience. 2004;27:369–392. doi: 10.1146/annurev.neuro.27.070203.144226. [DOI] [PubMed] [Google Scholar]
  30. Cohen L, Dehaene S, Chochon F, Lehéricy S, Naccache L. Language and calculation within the parietal lobe: a combined cognitive, anatomical and fMRI study. Neuropsychologia. 2000;38:1426–1440. doi: 10.1016/S0028-3932(00)00038-5. [DOI] [PubMed] [Google Scholar]
  31. Corballis PM. Visuospatial processing and the right-hemisphere interpreter. Brain and Cognition. 2003;53:171–176. doi: 10.1016/S0278-2626(03)00103-9. [DOI] [PubMed] [Google Scholar]
  32. Crittenden BM, Mitchell DJ, Duncan J. Recruitment of the default mode network during a demanding act of executive control. eLife. 2015;4:e06481. doi: 10.7554/eLife.06481. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Dalbey J, Linn MC. The demands and requirements of computer programming: a literature review. Journal of Educational Computing Research. 1985;1:253–274. doi: 10.2190/BC76-8479-YM0X-7FUA. [DOI] [Google Scholar]
  34. Deniz F, Nunez-Elizalde AO, Huth AG, Gallant JL. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. The Journal of Neuroscience. 2019;39:7722–7736. doi: 10.1523/JNEUROSCI.0675-19.2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Diachek E, Blank I, Siegelman M, Affourtit J, Fedorenko E. The Domain-General multiple demand (MD) Network does not support core aspects of language comprehension: a Large-Scale fMRI investigation. The Journal of Neuroscience. 2020;40:4536–4550. doi: 10.1523/JNEUROSCI.2036-19.2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Dronkers NF, Ludy CA, Redfern BB. Pragmatics in the absence of verbal language: descriptions of a severe aphasic and a language-deprived adult. Journal of Neurolinguistics. 1998;11:179–190. doi: 10.1016/S0911-6044(98)00012-8. [DOI] [Google Scholar]
  37. Duncan J. The multiple-demand (MD) system of the primate brain: mental programs for intelligent behaviour. Trends in Cognitive Sciences. 2010;14:172–179. doi: 10.1016/j.tics.2010.01.004. [DOI] [PubMed] [Google Scholar]
  38. Duncan J. The structure of cognition: attentional episodes in mind and brain. Neuron. 2013;80:35–50. doi: 10.1016/j.neuron.2013.09.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Duncan J, Owen AM. Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends in Neurosciences. 2000;23:475–483. doi: 10.1016/S0166-2236(00)01633-7. [DOI] [PubMed] [Google Scholar]
  40. Ensmenger N. The digital construction of technology: rethinking the history of computers in society. Technology and Culture. 2012;53:753–776. doi: 10.1353/tech.2012.0126. [DOI] [Google Scholar]
  41. Fakhoury S, Ma Y, Arnaoudova V, Adesope O. The effect of poor source code lexicon and readability on developers’ Cognitive Load. Proceedings of the 26th Conference on Program Comprehension; 2018. pp. 286–296. [DOI] [Google Scholar]
  42. Fedorenko E, Hsieh PJ, Nieto-Castañón A, Whitfield-Gabrieli S, Kanwisher N. New method for fMRI investigations of language: defining ROIs functionally in individual subjects. Journal of Neurophysiology. 2010;104:1177–1194. doi: 10.1152/jn.00032.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Fedorenko E, Behr MK, Kanwisher N. Functional specificity for high-level linguistic processing in the human brain. PNAS. 2011;108:16428–16433. doi: 10.1073/pnas.1112937108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Fedorenko E, Nieto-Castañon A, Kanwisher N. Lexical and syntactic representations in the brain: an fMRI investigation with multi-voxel pattern analyses. Neuropsychologia. 2012;50:499–513. doi: 10.1016/j.neuropsychologia.2011.09.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Fedorenko E, Duncan J, Kanwisher N. Broad domain generality in focal regions of frontal and parietal cortex. PNAS. 2013;110:16616–16621. doi: 10.1073/pnas.1315235110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Fedorenko E, Ivanova A, Dhamala R, Bers MU. The language of programming: a cognitive perspective. Trends in Cognitive Sciences. 2019;23:525–528. doi: 10.1016/j.tics.2019.04.010. [DOI] [PubMed] [Google Scholar]
  47. Fedorenko E, Blank I, Siegelman M, Mineroff Z. Lack of selectivity for syntax relative to word meanings throughout the language network. bioRxiv. 2020 doi: 10.1101/477851. [DOI] [PMC free article] [PubMed]
  48. Fedorenko E, Blank IA. Broca's Area Is Not a Natural Kind. Trends in Cognitive Sciences. 2020;24:270–284. doi: 10.1016/j.tics.2020.01.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Fedorenko E, Kanwisher N. Neuroimaging of language: why Hasn't a clearer picture emerged? Language and Linguistics Compass. 2009;3:839–865. doi: 10.1111/j.1749-818X.2009.00143.x. [DOI] [Google Scholar]
  50. Fedorenko E, Thompson-Schill SL. Reworking the language network. Trends in Cognitive Sciences. 2014;18:120–126. doi: 10.1016/j.tics.2013.12.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Ferstl EC, Neumann J, Bogler C, von Cramon DY. The extended language network: a meta-analysis of neuroimaging studies on text comprehension. Human Brain Mapping. 2008;29:581–593. doi: 10.1002/hbm.20422. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Fischer J, Mikhael JG, Tenenbaum JB, Kanwisher N. Functional neuroanatomy of intuitive physical inference. PNAS. 2016;113:E5072–E5081. doi: 10.1073/pnas.1610344113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Fitch WT, Hauser MD, Chomsky N. The evolution of the language faculty: clarifications and implications. Cognition. 2005;97:179–210. doi: 10.1016/j.cognition.2005.02.005. [DOI] [PubMed] [Google Scholar]
  54. Fitch WT, Martins MD. Hierarchical processing in music, language, and action: lashley revisited. Annals of the New York Academy of Sciences. 2014;1316:87–104. doi: 10.1111/nyas.12406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Floyd B, Santander T, Weimer W. Decoding the representation of code in the brain: an fMRI study of code review and expertise. 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE); 2017. pp. 175–186. [DOI] [Google Scholar]
  56. Freedman DJ, Riesenhuber M, Poggio T, Miller EK. Categorical representation of visual stimuli in the primate prefrontal cortex. Science. 2001;291:312–316. doi: 10.1126/science.291.5502.312. [DOI] [PubMed] [Google Scholar]
  57. Frost MA, Goebel R. Measuring structural-functional correspondence: spatial variability of specialised brain regions after macro-anatomical alignment. NeuroImage. 2012;59:1369–1381. doi: 10.1016/j.neuroimage.2011.08.035. [DOI] [PubMed] [Google Scholar]
  58. Goel V. Anatomy of deductive reasoning. Trends in Cognitive Sciences. 2007;11:435–441. doi: 10.1016/j.tics.2007.09.003. [DOI] [PubMed] [Google Scholar]
  59. Goel V, Dolan RJ. Functional neuroanatomy of three-term relational reasoning. Neuropsychologia. 2001;39:901–909. doi: 10.1016/S0028-3932(01)00024-0. [DOI] [PubMed] [Google Scholar]
  60. Guzdial M. Education: Paving the way for computational thinking. Communications of the ACM. 2008;51:25–27. doi: 10.1145/1378704.1378713. [DOI] [Google Scholar]
  61. Hassenfeld Z, Govind M, E De Ruiter L, Umashi Bers M. If you can program, you can write: learning introductory programming across literacy levels. Journal of Information Technology Education: Research. 2020;19:065–085. doi: 10.28945/4509. [DOI] [Google Scholar]
  62. Hassenfeld ZR, Bers MU. Debugging the writing process: lessons from a comparison of students’ Coding and Writing Practices. The Reading Teacher. 2020;73:735–746. doi: 10.1002/trtr.1885. [DOI] [Google Scholar]
  63. Hasson U, Chen J, Honey CJ. Hierarchical process memory: memory as an integral component of information processing. Trends in Cognitive Sciences. 2015;19:304–313. doi: 10.1016/j.tics.2015.04.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Hauser MD, Chomsky N, Fitch WT. The faculty of language: what is it, who has it, and how did it evolve? Science. 2002;298:1569–1579. doi: 10.1126/science.298.5598.1569. [DOI] [PubMed] [Google Scholar]
  65. Hermans F, Aldewereld M. Programming is writing is programming. Companion to the First International Conference on the Art, Science and Engineering of Programming; 2017. pp. 1–8. [Google Scholar]
  66. Holmes AP, Friston KJ. Generalisability, random effects & population inference. NeuroImage. 1998;7:S754. doi: 10.1016/S1053-8119(18)31587-8. [DOI] [Google Scholar]
  67. Huang Y, Liu X, Krueger R, Santander T, Hu X, Leach K, Weimer W. Distilling neural representations of data structure manipulation using fMRI and fNIRS. 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE); 2019. pp. 396–407. [DOI] [Google Scholar]
  68. Hugdahl K. Hemispheric asymmetry: contributions from brain imaging. Wiley Interdisciplinary Reviews: Cognitive Science. 2011;2:461–478. doi: 10.1002/wcs.122. [DOI] [PubMed] [Google Scholar]
  69. Hugdahl K, Raichle ME, Mitra A, Specht K. On the existence of a generalized non-specific task-dependent network. Frontiers in Human Neuroscience. 2015;9:430. doi: 10.3389/fnhum.2015.00430. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Ikutani Y, Kubo T, Nishida S, Hata H, Matsumoto K, Ikeda K, Nishimoto S. Expert programmers have fine-tuned cortical representations of source code. bioRxiv. 2020 doi: 10.1101/2020.01.28.923953. [DOI] [PMC free article] [PubMed]
  71. Ikutani Y, Uwano H. Brain activity measurement during program comprehension with NIRS. 15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD); 2014. pp. 1–6. [DOI] [Google Scholar]
  72. Ivanova AA, Mineroff Z, Zimmerer V, Kanwisher N, Varley R, Fedorenko E. The language network is recruited but not required for non-verbal semantic processing. bioRxiv. 2019 doi: 10.1101/696484. [DOI] [PMC free article] [PubMed]
  73. Ivanova A, Srikant S. The Neuroscience of Program Comprehension. Software Heritage. 2020. swh:1:rev:616e893d05038da620bdf9f2964bd3befba75dc5. https://archive.softwareheritage.org/swh:1:dir:a7cde799c41db00358ac86bba057cf6d39a38a34;origin=https://github.com/ALFA-group/neural-program-comprehension;visit=swh:1:snp:ad67e98e649825b3b845a2050da8d86d000134cc;anchor=swh:1:rev:616e893d05038da620bdf9f2964bd3befba75dc5/
  74. Jackendoff R. Parallels and nonparallels between language and music. Music Perception. 2009;26:195–204. doi: 10.1525/mp.2009.26.3.195. [DOI] [Google Scholar]
  75. Jacoby N, Bruneau E, Koster-Hale J, Saxe R. Localizing pain matrix and theory of mind networks with both verbal and non-verbal stimuli. NeuroImage. 2016;126:39–48. doi: 10.1016/j.neuroimage.2015.11.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Jacoby N, Fedorenko E. Discourse-level comprehension engages medial frontal theory of mind brain regions even for expository texts. Language, Cognition and Neuroscience. 2020;35:780–796. doi: 10.1080/23273798.2018.1525494. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Jouravlev O, Zheng D, Balewski Z, Le Arnz Pongos A, Levan Z, Goldin-Meadow S, Fedorenko E. Speech-accompanying gestures are not processed by the language-processing mechanisms. Neuropsychologia. 2019;132:107132. doi: 10.1016/j.neuropsychologia.2019.107132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Julian JB, Fedorenko E, Webster J, Kanwisher N. An algorithmic method for functionally defining regions of interest in the ventral visual pathway. NeuroImage. 2012;60:2357–2364. doi: 10.1016/j.neuroimage.2012.02.055. [DOI] [PubMed] [Google Scholar]
  79. Kahn HJ, Whitaker HA. Acalculia: an historical review of localization. Brain and Cognition. 1991;17:102–115. doi: 10.1016/0278-2626(91)90071-F. [DOI] [PubMed] [Google Scholar]
  80. Kanwisher N. Functional specificity in the human brain: a window into the functional architecture of the mind. PNAS. 2010;107:11163–11170. doi: 10.1073/pnas.1005062107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Kao E. Exploring computational thinking. Google Research Blog. 2010 [accessed April 13, 2020]. https://ai.googleblog.com/2010/10/exploring-computational-thinking.html
  82. Keller TA, Carpenter PA, Just MA. The neural bases of sentence comprehension: a fMRI examination of syntactic and lexical processing. Cerebral Cortex. 2001;11:223–237. doi: 10.1093/cercor/11.3.223. [DOI] [PubMed] [Google Scholar]
  83. Klare GR. Measurement of Readability. Iowa State University Press; 1963. [Google Scholar]
  84. Kroll JF, Dussias PE, Bice K, Perrotti L. Bilingualism, mind, and brain. Annual Review of Linguistics. 2015;1:377–394. doi: 10.1146/annurev-linguist-030514-124937. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Krueger R, Huang Y, Liu X, Santander T, Weimer W, Leach K. Neurological divide: an fMRI study of prose and code writing. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE); 2020. pp. 678–690. [DOI] [Google Scholar]
  86. Kuznetsova A, Brockhoff PB, Christensen RHB. lmerTest Package: Tests in linear mixed effects models. Journal of Statistical Software. 2017;82:1–26. doi: 10.18637/jss.v082.i13. [DOI] [Google Scholar]
  87. Lemer C, Dehaene S, Spelke E, Cohen L. Approximate quantities and exact number words: dissociable systems. Neuropsychologia. 2003;41:1942–1958. doi: 10.1016/S0028-3932(03)00123-4. [DOI] [PubMed] [Google Scholar]
  88. Lerdahl F, Jackendoff RS. A Generative Theory of Tonal Music. MIT Press; 1996. [Google Scholar]
  89. Lerner Y, Honey CJ, Silbert LJ, Hasson U. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience. 2011;31:2906–2915. doi: 10.1523/JNEUROSCI.3684-10.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Liu Y, Kim J, Wilson C, Bedny M. Computer code comprehension shares neural resources with formal logical inference in the fronto-parietal network. eLife. 2020;9:e59340. doi: 10.7554/eLife.59340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. Mahowald K, Fedorenko E. Reliable individual-level neural markers of high-level language processing: a necessary precursor for relating neural variability to behavioral and genetic variability. NeuroImage. 2016;139:74–93. doi: 10.1016/j.neuroimage.2016.05.073. [DOI] [PubMed] [Google Scholar]
  92. Mather M, Cacioppo JT, Kanwisher N. How fMRI can inform cognitive theories. Perspectives on Psychological Science. 2013;8:108–113. doi: 10.1177/1745691612469037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. McNamara WJ. The selection of computer personnel: past, present, future. Proceedings of the Fifth SIGCPR Conference on Computer Personnel Research; 1967. pp. 52–56. [Google Scholar]
  94. Michalka SW, Kong L, Rosen ML, Shinn-Cunningham BG, Somers DC. Short-Term memory for space and time flexibly recruit complementary Sensory-Biased frontal lobe attention networks. Neuron. 2015;87:882–892. doi: 10.1016/j.neuron.2015.07.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  95. Micheloyannis S, Sakkalis V, Vourkas M, Stam CJ, Simos PG. Neural networks involved in mathematical thinking: evidence from linear and non-linear analysis of electroencephalographic activity. Neuroscience Letters. 2005;373:212–217. doi: 10.1016/j.neulet.2004.10.005. [DOI] [PubMed] [Google Scholar]
  96. Miller EK, Erickson CA, Desimone R. Neural mechanisms of visual working memory in prefrontal cortex of the macaque. The Journal of Neuroscience. 1996;16:5154–5167. doi: 10.1523/JNEUROSCI.16-16-05154.1996. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Miller EK, Cohen JD. An integrative theory of prefrontal cortex function. Annual Review of Neuroscience. 2001;24:167–202. doi: 10.1146/annurev.neuro.24.1.167. [DOI] [PubMed] [Google Scholar]
  98. Mineroff Z, Blank IA, Mahowald K, Fedorenko E. A robust dissociation among the language, multiple demand, and default mode networks: evidence from inter-region correlations in effect size. Neuropsychologia. 2018;119:501–511. doi: 10.1016/j.neuropsychologia.2018.09.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Mitchell DJ, Bell AH, Buckley MJ, Mitchell AS, Sallet J, Duncan J. A putative Multiple-Demand system in the macaque brain. The Journal of Neuroscience. 2016;36:8574–8585. doi: 10.1523/JNEUROSCI.0810-16.2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  100. Mollica F, Siegelman M, Diachek E, Piantadosi ST, Mineroff Z, Futrell R, Kean H, Qian P, Fedorenko E. Composition is the core driver of the Language-selective network. Neurobiology of Language. 2020;1:104–134. doi: 10.1162/nol_a_00005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Monti MM, Osherson DN, Martinez MJ, Parsons LM. Functional neuroanatomy of deductive inference: a language-independent distributed network. NeuroImage. 2007;37:1005–1016. doi: 10.1016/j.neuroimage.2007.04.069. [DOI] [PubMed] [Google Scholar]
  102. Monti MM, Parsons LM, Osherson DN. The boundaries of language and thought in deductive inference. PNAS. 2009;106:12554–12559. doi: 10.1073/pnas.0902422106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Monti MM, Parsons LM, Osherson DN. Thought beyond language: neural dissociation of algebra and natural language. Psychological Science. 2012;23:914–922. doi: 10.1177/0956797612437427. [DOI] [PubMed] [Google Scholar]
  104. Mukamel R, Fried I. Human intracranial recordings and cognitive neuroscience. Annual Review of Psychology. 2012;63:511–537. doi: 10.1146/annurev-psych-120709-145401. [DOI] [PubMed] [Google Scholar]
  105. Murnane JS. The psychology of computer languages for introductory programming courses. New Ideas in Psychology. 1993;11:213–228. doi: 10.1016/0732-118X(93)90035-C. [DOI] [Google Scholar]
  106. Nakagawa T, Kamei Y, Uwano H, Monden A, Matsumoto K, German DM. Quantifying programmers' mental workload during program comprehension based on cerebral blood flow measurement: a controlled experiment. Companion Proceedings of the 36th International Conference on Software Engineering; 2014. pp. 448–451. [DOI] [Google Scholar]
  107. Nakai T, Yamaguchi HQ, Nishimoto S. Convergence of modality invariance and attention selectivity in the cortical semantic circuit. bioRxiv. 2020 doi: 10.1101/2020.06.19.160960. [DOI] [PMC free article] [PubMed]
  108. Nakamura M, Monden A, Itoh T, Matsumoto K, Kanzaki Y, Satoh H. Queue-based cost evaluation of mental simulation process in program comprehension. 5th International Workshop on Enterprise Networking and Computing in Healthcare Industry (IEEE Cat. No.03EX717); 2003. pp. 351–360. [DOI] [Google Scholar]
  109. Nieto-Castañón A, Fedorenko E. Subject-specific functional localizers increase sensitivity and functional resolution of multi-subject analyses. NeuroImage. 2012;63:1646–1669. doi: 10.1016/j.neuroimage.2012.06.065. [DOI] [PMC free article] [PubMed] [Google Scholar]
  110. Ormerod T. Human Cognition and Programming. In: Ormerod T, editor. Psychology of Programming. Academic Press; 1990. pp. 63–82. [Google Scholar]
  111. Papert S. Teaching children to be mathematicians versus teaching about mathematics. International Journal of Mathematical Education in Science and Technology. 1972;3:249–262. doi: 10.1080/0020739700030306. [DOI] [Google Scholar]
  112. Papert SA. Mindstorms: Children, Computers, and Powerful Ideas. 2. New York: Basic Books, Inc; 1993. [Google Scholar]
  113. Parvizi J, Kastner S. Promises and limitations of human intracranial electroencephalography. Nature Neuroscience. 2018;21:474–483. doi: 10.1038/s41593-018-0108-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Paunov AM, Blank IA, Fedorenko E. Functionally distinct language and theory of mind networks are synchronized at rest and during language comprehension. Journal of Neurophysiology. 2019;121:1244–1265. doi: 10.1152/jn.00619.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Pea RD, Kurland DM. On the cognitive effects of learning computer programming. New Ideas in Psychology. 1984;2:137–168. doi: 10.1016/0732-118X(84)90018-7. [DOI] [Google Scholar]
  116. Pennington N, Grabowski B. The Tasks of Programming. Psychology of Programming; 1990. [Google Scholar]
  117. Pereira F, Lou B, Pritchett B, Ritter S, Gershman SJ, Kanwisher N, Botvinick M, Fedorenko E. Toward a universal decoder of linguistic meaning from brain activation. Nature Communications. 2018;9:1–13. doi: 10.1038/s41467-018-03068-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  118. Perkins DN, Simmons R. Patterns of misunderstanding: an integrative model for science, math, and programming. Review of Educational Research. 1988;58:303–326. doi: 10.3102/00346543058003303. [DOI] [Google Scholar]
  119. Pinel P, Piazza M, Le Bihan D, Dehaene S. Distributed and overlapping cerebral representations of number, size, and luminance during comparative judgments. Neuron. 2004;41:983–993. doi: 10.1016/S0896-6273(04)00107-2. [DOI] [PubMed] [Google Scholar]
  120. Pinel P, Dehaene S. Beyond hemispheric dominance: brain regions underlying the joint lateralization of language and arithmetic to the left hemisphere. Journal of Cognitive Neuroscience. 2010;22:48–66. doi: 10.1162/jocn.2009.21184. [DOI] [PubMed] [Google Scholar]
  121. Poldrack RA. Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences. 2006;10:59–63. doi: 10.1016/j.tics.2005.12.004. [DOI] [PubMed] [Google Scholar]
  122. Poldrack RA. Inferring mental states from neuroimaging data: from reverse inference to large-scale decoding. Neuron. 2011;72:692–697. doi: 10.1016/j.neuron.2011.11.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  123. Prabhakaran V, Smith JA, Desmond JE, Glover GH, Gabrieli JD. Neural substrates of fluid reasoning: an fMRI study of neocortical activation during performance of the raven's Progressive Matrices Test. Cognitive Psychology. 1997;33:43–63. doi: 10.1006/cogp.1997.0659. [DOI] [PubMed] [Google Scholar]
  124. Prat CS, Madhyastha TM, Mottarella MJ, Kuo CH. Relating natural language aptitude to individual differences in learning programming languages. Scientific Reports. 2020;10:1–10. doi: 10.1038/s41598-020-60661-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  125. Pritchett BL, Hoeflin C, Koldewyn K, Dechter E, Fedorenko E. High-level language processing regions are not engaged in action observation or imitation. Journal of Neurophysiology. 2018;120:2555–2570. doi: 10.1152/jn.00222.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  126. Regev M, Honey CJ, Simony E, Hasson U. Selective and invariant neural responses to spoken and written narratives. Journal of Neuroscience. 2013;33:15978–15988. doi: 10.1523/JNEUROSCI.1580-13.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  127. Reverberi C, Shallice T, D'Agostini S, Skrap M, Bonatti LL. Cortical bases of elementary deductive reasoning: inference, memory, and metadeduction. Neuropsychologia. 2009;47:1107–1116. doi: 10.1016/j.neuropsychologia.2009.01.004. [DOI] [PubMed] [Google Scholar]
  128. Rogalsky C, Rong F, Saberi K, Hickok G. Functional anatomy of language and music perception: temporal and structural factors investigated using functional magnetic resonance imaging. Journal of Neuroscience. 2011;31:3843–3852. doi: 10.1523/JNEUROSCI.4515-10.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Rosselli M, Ardila A. Calculation deficits in patients with right and left hemisphere damage. Neuropsychologia. 1989;27:607–617. doi: 10.1016/0028-3932(89)90107-3. [DOI] [PubMed] [Google Scholar]
  130. Sanner MF. Python: a programming language for software integration and development. Journal of Molecular Graphics & Modelling. 1999;17:57–61. [PubMed] [Google Scholar]
  131. Saxe R, Brett M, Kanwisher N. Divide and conquer: a defense of functional localizers. NeuroImage. 2006;30:1088–1096. doi: 10.1016/j.neuroimage.2005.12.062. [DOI] [PubMed] [Google Scholar]
  132. Scott TL, Gallée J, Fedorenko E. A new fun and robust version of an fMRI localizer for the frontotemporal language system. Cognitive Neuroscience. 2017;8:167–176. doi: 10.1080/17588928.2016.1201466. [DOI] [PubMed] [Google Scholar]
  133. Shain C, Blank IA, van Schijndel M, Schuler W, Fedorenko E. fMRI reveals language-specific predictive coding during naturalistic sentence comprehension. Neuropsychologia. 2020;138:107307. doi: 10.1016/j.neuropsychologia.2019.107307. [DOI] [PMC free article] [PubMed] [Google Scholar]
  134. Shashidhara S, Mitchell DJ, Erez Y, Duncan J. Progressive recruitment of the frontoparietal Multiple-demand system with increased task complexity, time pressure, and reward. Journal of Cognitive Neuroscience. 2019a;31:1617–1630. doi: 10.1162/jocn_a_01440. [DOI] [PMC free article] [PubMed] [Google Scholar]
  135. Shashidhara S, Spronkers FS, Erez Y. Individual-subject functional localization increases univariate activation but not multivariate pattern discriminability in the ‘multiple-demand’ frontoparietal network. bioRxiv. 2019b doi: 10.1101/661934. [DOI] [PMC free article] [PubMed]
  136. Sheremata SL, Bettencourt KC, Somers DC. Hemispheric asymmetry in visuotopic posterior parietal cortex emerges with visual short-term memory load. Journal of Neuroscience. 2010;30:12581–12588. doi: 10.1523/JNEUROSCI.2689-10.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  137. Shuman M, Kanwisher N. Numerical magnitude in the human parietal lobe; tests of representational generality and domain specificity. Neuron. 2004;44:557–569. doi: 10.1016/j.neuron.2004.10.008. [DOI] [PubMed] [Google Scholar]
  138. Siegmund J, Kästner C, Apel S, Parnin C, Bethmann A, Leich T, Saake G, Brechmann A. Understanding understanding source code with functional magnetic resonance imaging. Proceedings of the 36th International Conference on Software Engineering; 2014. pp. 378–389. [DOI] [Google Scholar]
  139. Siegmund J, Peitek N, Parnin C, Apel S, Hofmeister J, Kästner C, Begel A, Bethmann A, Brechmann A. Measuring neural efficiency of program comprehension. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering; 2017. pp. 140–150. [DOI] [Google Scholar]
  140. Simony E, Honey CJ, Chen J, Lositsky O, Yeshurun Y, Wiesel A, Hasson U. Dynamic reconfiguration of the default mode network during narrative comprehension. Nature Communications. 2016;7:1–13. doi: 10.1038/ncomms12141. [DOI] [PMC free article] [PubMed] [Google Scholar]
  141. Sullivan A, Bers MU. Computer science education in early childhood: the case of ScratchJr. Journal of Information Technology Education. 2019;18:113–138. doi: 10.28945/4437. [DOI] [Google Scholar]
  142. Tahmasebi AM, Davis MH, Wild CJ, Rodd JM, Hakyemez H, Abolmaesumi P, Johnsrude IS. Is the link between anatomical structure and function equally strong at all cognitive levels of processing? Cerebral Cortex. 2012;22:1593–1603. doi: 10.1093/cercor/bhr205. [DOI] [PubMed] [Google Scholar]
  143. Takayama Y, Sugishita M, Akiguchi I, Kimura J. Isolated acalculia due to left parietal lesion. Archives of Neurology. 1994;51:286–291. doi: 10.1001/archneur.1994.00540150084021. [DOI] [PubMed] [Google Scholar]
  144. Varley RA, Klessinger NJ, Romanowski CA, Siegal M. Agrammatic but numerate. PNAS. 2005;102:3519–3524. doi: 10.1073/pnas.0407470102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  145. Varley R, Siegal M. Evidence for cognition without grammar from causal reasoning and 'theory of mind' in an agrammatic aphasic patient. Current Biology. 2000;10:723–726. doi: 10.1016/S0960-9822(00)00538-8. [DOI] [PubMed] [Google Scholar]
  146. Vázquez-Rodríguez B, Suárez LE, Markello RD, Shafiei G, Paquola C, Hagmann P, van den Heuvel MP, Bernhardt BC, Spreng RN, Misic B. Gradients of structure-function tethering across neocortex. PNAS. 2019;116:21219–21227. doi: 10.1073/pnas.1903403116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  147. Wen T, Duncan J, Mitchell DJ. Representation of task episodes in human cortical networks. bioRxiv. 2019 doi: 10.1101/582858. [DOI]
  148. Wing JM. Computational thinking. Communications of the ACM. 2006;49:33–35. doi: 10.1145/1118178.1118215. [DOI] [Google Scholar]
  149. Wing J. Research notebook: Computational thinking—What and why. The Link Magazine; 2011. [Google Scholar]
  150. Woolgar A, Thompson R, Bor D, Duncan J. Multi-voxel coding of stimuli, rules, and responses in human frontoparietal cortex. NeuroImage. 2011;56:744–752. doi: 10.1016/j.neuroimage.2010.04.035. [DOI] [PubMed] [Google Scholar]
  151. Woolgar A, Duncan J, Manes F, Fedorenko E. Fluid intelligence is supported by the multiple-demand system not the language system. Nature Human Behaviour. 2018;2:200–204. doi: 10.1038/s41562-017-0282-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision letter

Editor: Andrea E Martin1
Reviewed by: William Matchin2, Ina Bornkessel-Schlesewsky3

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "Comprehension of computer code relies primarily on domain-general executive resources" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor. The following individuals involved in the review of your submission have agreed to reveal their identity: William Matchin (Reviewer #1); Ina Bornkessel-Schlesewsky (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

First, thank you for taking part in the review process.

As you know, eLife is invested in changing scientific publishing and experimenting to embody that change, even if that involves a degree of risk in order to find workable changes. In this spirit, the remit of the co-submission format is to ask whether the scientific community is enriched more by the data presented in the co-submitted manuscripts together than it would be by the papers apart, or by only one paper presented to the community. In other words, are the conclusions that can be made stronger or clearer when the manuscripts are considered together rather than separately? We felt that, despite significant concerns with each paper individually, especially regarding the theoretical structures in which the experimental results could be interpreted, this was the case.

We want to be very clear that in a non-co-submission case we would have substantial and serious concerns about the interpretability and robustness of the Liu et al. submission given its small sample size. Furthermore, the reviewers' concerns about the suitability of the control task differed substantially between the manuscripts. We share these concerns. However, despite these differences in control task and sample size, the Liu et al. and Ivanova et al. submissions nonetheless replicated each other – the language network was not implicated in processing programming code. The replication substantially mitigates the concerns shared by us and the reviewers about sample size and control tasks. The fact that different control tasks and sample sizes did not change the overall pattern of results, in our view, is affirmation of the robustness of the findings, and the value that both submissions presented together can offer the literature.

In sum, there were concerns that both submissions were exploratory in nature, lacked a strong theoretical focus, and relied on functional localizers paired with novel tasks. However, these concerns were mitigated by the following strengths. Both papers ask a clear and interesting question, and the results replicate each other despite task differences. In this way, the two papers strengthen each other: the major concerns for each paper individually are ameliorated when they are considered as a whole.

In your revisions, please address the concerns of the reviewers, including, specifically, the limits of interpretation of your results with regard to control task choice, the discussion of relevant literature mentioned by the reviewers, and most crucially, please contextualize your results with regard to the other submission's results.

Reviewer #1:

The manuscript is well-written and the methods are clear and rigorous, representing a clear advance on previous research comparing computer code programming to language. The conclusions with respect to which brain networks computer programming activates are compelling and well conveyed. This paper is useful to the extent that the conclusions are focused on the empirical findings: whether or not code activates language-related brain regions (answer: no). However, the authors appear to be also testing whether or not any of the mechanisms involved in language are recruited for computer programming. The problem with this goal is that the authors do not present or review a theory of the representations and mechanisms involved in computer programming, as has been developed for language (e.g. Adger, 2013; Bresnan, 2001; Chomsky, 1965, 1981, 1995; Goldberg, 1995; Hornstein, 2009; Jackendoff, 2002; Levelt, 1989; Lewis and Vasishth, 2005; Vosse and Kempen, 2000).

1) "The fact that coding can be learned in adulthood suggests that it may rely on existing cognitive systems." "Finally, code comprehension may rely on the system that supports comprehension of natural languages: to successfully process both natural and computer languages, we need to access stored meanings of words/tokens and combine them using hierarchical syntactic rules (Fedorenko et al., 2019; Murnane, 1993; Papert, 1993) – a similarity that, in theory, should make the language circuits well-suited for processing computer code." If we understand stored elements and computational structure in the broadest way possible without breaking this down more, many domains of cognition would be shared in this way. The authors should illustrate in more detail how the psychological structure of computer programming parallels language. Is there an example of hierarchical structure in computer code? What is the meaning of a variable/function in code, and how does this compare to meaning in language?

2) "Our findings, along with prior findings from math and logic (Amalric and Dehaene, 2019; Monti et al., 2009, 2012), argue against this possibility: the language system does not respond to meaningful structured input that is non-linguistic." This is an overly simple characterization of the word "meaningful". The meaning of math and logic are not the same as in language. Both mathematics and computer programming have logical structure to them, but the nature of this structure and the elements that are combined in language are different. Linguistic computations take as input complex atoms of computation that have phonological and conceptual properties. These atoms are commonly used to refer to entities "in the world" with complex semantic properties and often have rich associated imagery. Linguistic computations output complex, monotonically enhanced forms. So cute + dogs = cute dogs, chased + cute dogs = chased cute dogs, etc. This is very much unlike mathematics and computer programming, where we typically do not make reference to the "real world" using these expressions to interlocuters, and outputs of an expression are not monotonic, structure-preserving combinations of the input elements, and there is no semantic enhancement that occurs through increased computation. This bears much more discussion in the paper, if the authors intend to make claims regarding shared/distinct computations between computer programming and language.

3) More importantly, even if there were shared mechanisms between computer code programming and language, I'm not sure we can use reverse inference to strongly test this hypothesis. As Poldrack, 2006, pointed out, reverse inference is sharply limited by the extent to which we know how cognition maps onto the brain. This is a similar point to Poeppel and Embick, 2005, who pointed out that different mechanisms of language could be implemented in the brain in a large variety of ways, only one of which is big pieces of cortical tissue. In this sense, there could in fact be shared mechanisms between language and code (e.g. oscillatory dynamics, connectivity patterns, subcortical structures), but these mechanisms might not be aligned with the cortical territory associated with language-related brain regions. The authors should spend much additional time discussing these alternative possibilities.

Reviewer #2:

This carefully designed fMRI study examines an interesting question, namely how computer code – as a "cognitive/cultural invention" – is processed by the human brain. The study has a number of strengths, including: use of two very different programming languages (Python and Scratch Jr.) in two experiments; direct comparison between code problems and "content-matched sentence problems" to disentangle code comprehension from problem content; control for the impact of lexical information in code passages by replacing variable names with Japanese translations; and consideration of inter-individual differences in programming proficiency. I do, however, have some questions regarding the interpretation of the results in mechanistic terms, as detailed below.

1) Code comprehension versus underlying problem content

I am generally somewhat sceptical in regard to the use of functional localisers in view of the assumptions that necessarily enter into the definition of a localiser task. In addition, an overlap between the networks supporting two different tasks doesn't imply comparable neural processing mechanisms. With the present study, however, I was impressed by the authors' overall methodological approach. In particular, I found the supplementation of the localiser-based approach with the comparison between code problems and analogous sentence problems rather convincing.

However, while I agree that computational thinking does not require coding / code comprehension, it is less clear to me what code comprehension involves when it is stripped of the computational thinking aspect. Knowing how to approach a problem algorithmically strikes me as a central aspect of coding. What, then, is being measured by the code problem versus sentence problem comparison? Knowledge of how to implement a certain computational solution within a particular programming language? The authors touch upon this briefly in the Discussion section of the paper, but I am not fully convinced by their arguments. Specifically, they state:

"The process of code comprehension includes retrieving code-related knowledge from memory and applying it to the problems at hand. This application of task-relevant knowledge plausibly requires attention, working memory, inhibitory control, planning, and general flexible reasoning-cognitive processes long linked to the MD system […].”

Shouldn't all of this also apply (or even apply more strongly) to processing of the underlying problem content rather than to code comprehension per se?

According to the authors, the extent to which code-comprehension-related activity reflects problem content varies between different systems. They conclude that "MD responses to code […] do not exclusively reflect responses to problem content", while they argue on the basis of their voxel-wise correlation analysis that "the language system's response to code is largely (although not completely) driven by problem content". However, unless I have missed something, the latter analysis was only undertaken for the language system but not for the other systems under examination. Was there a particular reason for this? Also, what are the implications of observing problem content-driven responses within the language system for the authors' conclusion that this system is "functionally conservative"?

Overall, the paper would be strengthened by more clarity in regard to these issues – and specifically a more detailed discussion of what code comprehension may amount to in mechanistic terms when it is stripped of computational thinking.

2) Implications of using reading for the language localiser task

Given that reading is also a cultural invention, is it really fair to say that coding is being compared to the "language system" here rather than to the "reading system" (in view of the visual presentation of the language task)? The possible implications of this for the interpretation of the data should be considered.

3) Possible effects of verbalisation?

It appears possible that participants may have internally verbalised code problems – at least to a certain extent (and likely with a considerable degree of inter-individual variability). How might this have affected the results of the present study? Could verbalisation be related to the highly correlated response between code problems and language problems within the language system?

eLife. 2020 Dec 15;9:e58906. doi: 10.7554/eLife.58906.sa2

Author response


[…] In your revisions, please address the concerns of the reviewers, including, specifically, the limits of interpretation of your results with regard to control task choice, the discussion of relevant literature mentioned by the reviewers, and most crucially, please contextualize your results with regard to the other submission's results.

We thank the editor and the reviewers for their thoughtful evaluation. We have now addressed the reviewers’ comments (see below) and added a paragraph comparing our results with those of Liu et al.:

“The results of our work align well with the results of another recent study on program comprehension (Liu et al., 2020). […] This is precisely what Liu et al. find. Further, similar to us, Liu et al. conclude that it is the MD regions, not the language regions, that are primarily involved in program comprehension.”

We also added a reference to Liu et al.’s work in the Introduction:

“However, none of these prior studies sought to explicitly distinguish code comprehension from other programming-related processes, and none of them provide quantitative evaluations of putative shared responses to code and other tasks, such as working memory, math, or language (cf. Liu et al., 2020; see Discussion).”

Additionally, we have adjusted the statistical results to fix two minor bugs in our code (we had accidentally excluded one participant's results and used dummy coding instead of sum coding for the Hemisphere factor when analyzing MD system activity). The resulting changes were numerically small and did not affect any of our conclusions. We also formatted supplementary figure/file references in accordance with the journal's requirements.
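
For readers unfamiliar with the contrast-coding distinction, here is a minimal sketch (constructed for this response; the data and variable names are hypothetical) of how the two schemes differ for a two-level Hemisphere factor:

    import numpy as np

    hemisphere = ["LH", "RH", "LH", "RH"]  # hypothetical factor levels

    # Dummy (treatment) coding: LH -> 0, RH -> 1.
    # The intercept of a model fit with this predictor estimates the LH mean.
    dummy = np.array([0 if h == "LH" else 1 for h in hemisphere])

    # Sum (deviation) coding: LH -> -1, RH -> 1.
    # The intercept now estimates the grand mean across hemispheres, which is
    # the appropriate reference point when testing overall MD responses
    # collapsed across hemispheres.
    summed = np.array([-1 if h == "LH" else 1 for h in hemisphere])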

Reviewer #1:

The manuscript is well-written and the methods are clear and rigorous, representing a clear advance on previous research comparing computer code programming to language. The conclusions with respect to which brain networks computer programming activates are compelling and well conveyed. This paper is useful to the extent that the conclusions are focused on the empirical findings: whether or not code activates language-related brain regions (answer: no). However, the authors appear to be also testing whether or not any of the mechanisms involved in language are recruited for computer programming. The problem with this goal is that the authors do not present or review a theory of the representations and mechanisms involved in computer programming, as has been developed for language (e.g. Adger, 2013; Bresnan, 2001; Chomsky, 1965, 1981, 1995; Goldberg, 1995; Hornstein, 2009; Jackendoff, 2002; Levelt, 1989; Lewis and Vasishth, 2005; Vosse and Kempen, 2000).

Thank you for the positive evaluation! We agree that the main value of our paper is examining the contributions of the two brain networks to computer programming. Unfortunately, the theories of representations/computations involved in computer code comprehension are quite underdeveloped and have very little experimental support. Thus, the main mechanistic distinction we investigate in this work is a high-level distinction between code comprehension and the processing of problem content. We have adjusted the framing and the title of our manuscript to tone down and/or clarify our theoretical stance (see responses to specific points below).

1) "The fact that coding can be learned in adulthood suggests that it may rely on existing cognitive systems." "Finally, code comprehension may rely on the system that supports comprehension of natural languages: to successfully process both natural and computer languages, we need to access stored meanings of words/tokens and combine them using hierarchical syntactic rules (Fedorenko et al., 2019; Murnane, 1993; Papert, 1993) – a similarity that, in theory, should make the language circuits well-suited for processing computer code." If we understand stored elements and computational structure in the broadest way possible without breaking this down more, many domains of cognition would be shared in this way. The authors should illustrate in more detail how the psychological structure of computer programming parallels language. Is there an example of hierarchical structure in computer code? What is the meaning of a variable/function in code, and how does this compare to meaning in language?

We have modified the sections to further illustrate potential structural similarities between language and code:

“Finally, code comprehension may rely on the system that supports comprehension of natural languages (Fedorenko et al., 2019; Murnane, 1993; Papert, 1993). Like language, computer code makes heavy use of hierarchical structures (e.g., loops, conditionals, and recursive statements), and, like language, it can convey an unlimited amount of meaningful information (e.g., describing objects or action sequences). These similarities could, in principle, make the language circuits well-suited for processing computer code.”

“Such accounts predict that the mechanisms supporting structure processing in language should also get engaged when we process structure in other domains, including computer code. […] Our finding builds upon these results to show that compositional input (here, variables and keywords combining into statements) and hierarchical structure (here, conditional statements and loops) do not necessarily engage language-specific regions.”
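
To make the structural parallel concrete, consider a toy Python fragment (constructed for this response; not one of the actual experiment stimuli) in which a conditional is nested inside a for-loop, yielding the kind of hierarchical, compositional structure described above:

    numbers = [3, -1, 4, -1, 5]

    positive_total = 0
    for n in numbers:            # outer structure: iterate over the list
        if n > 0:                # inner structure: conditional nested in the loop
            positive_total += n  # innermost statement, two levels deep

    print(positive_total)        # prints 12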

We have also added more details about meaning in language vs. code (see point 2).

2) "Our findings, along with prior findings from math and logic (Amalric and Dehaene, 2019; Monti et al., 2009, 2012), argue against this possibility: the language system does not respond to meaningful structured input that is non-linguistic." This is an overly simple characterization of the word "meaningful". The meaning of math and logic are not the same as in language. Both mathematics and computer programming have logical structure to them, but the nature of this structure and the elements that are combined in language are different. Linguistic computations take as input complex atoms of computation that have phonological and conceptual properties. These atoms are commonly used to refer to entities "in the world" with complex semantic properties and often have rich associated imagery. Linguistic computations output complex, monotonically enhanced forms. So cute + dogs = cute dogs, chased + cute dogs = chased cute dogs, etc. This is very much unlike mathematics and computer programming, where we typically do not make reference to the "real world" using these expressions to interlocuters, and outputs of an expression are not monotonic, structure-preserving combinations of the input elements, and there is no semantic enhancement that occurs through increased computation. This bears much more discussion in the paper, if the authors intend to make claims regarding shared/distinct computations between computer programming and language.

We thank the reviewer for raising this interesting and important point. We agree that it is important to clarify what we mean by “meaning”. We therefore significantly expanded the section in question to clarify our statement:

“Another similarity shared by computer programming and natural language is the use of symbols – units referring to concepts “out in the world”. […] And yet the communicative nature of this activity is not sufficient to recruit the language system, consistent with previous reports showing a neural dissociation between language and gesture processing (Jouravlev et al., 2019), intentional actions (Pritchett et al., 2018) or theory of mind tasks (Apperly et al., 2006; Dronkers et al., 1998; Jacoby et al., 2016; Paunov et al., 2019; Varley and Siegal, 2000).”

Note that this section addresses the comments about reference to external entities and the communicative intent (which, as we state above, we think are partially shared between language and code). The paragraph preceding these (see point 1) also addresses the point about compositionality, at least in its most basic definition. We agree that language has both semantic and compositional properties that make it “special” (rich associations, imagery, semantic enhancement), all of which might account for the high functional selectivity of the language regions. However, we think that the discussion of such possibly unique properties of language is outside the scope of this paper and therefore limit ourselves to listing the putative shared properties of language and code.

3) More importantly, even if there were shared mechanisms between computer code programming and language, I'm not sure we can use reverse inference to strongly test this hypothesis. As Poldrack, 2006, pointed out, reverse inference is sharply limited by the extent to which we know how cognition maps onto the brain. This is a similar point to Poeppel and Embick, 2005, who pointed out that different mechanisms of language could be implemented in the brain in a large variety of ways, only one of which is big pieces of cortical tissue. In this sense, there could in fact be shared mechanisms between language and code (e.g. oscillatory dynamics, connectivity patterns, subcortical structures), but these mechanisms might not be aligned with the cortical territory associated with language-related brain regions. The authors should spend much additional time discussing these alternative possibilities.

The reviewer is right that we focus here on one way in which two cognitive functions might share resources (through cortical overlap; though note that our whole-brain analyses rule out overlap at the subcortical level, too). It is absolutely true that functionally distinct cortical areas could still (a) implement similar computations, and/or (b) interact, e.g., via oscillatory dynamics. We have addressed this point in a separate paragraph.

“Of course, the lack of consistent language system engagement in code comprehension does not mean that the mechanisms underlying language and code processing are completely different. […] However, the fact that we observed code-evoked activity primarily in the MD regions indicates that code comprehension does not load on the same neural circuits as language and needs to use domain-general MD circuits instead.”

Reviewer #2:

[…] I do, however, have some questions regarding the interpretation of the results in mechanistic terms, as detailed below.

1) Code comprehension versus underlying problem content

I am generally somewhat sceptical in regard to the use of functional localisers in view of the assumptions that necessarily enter into the definition of a localiser task. In addition, an overlap between the networks supporting two different tasks doesn't imply comparable neural processing mechanisms. With the present study, however, I was impressed by the authors' overall methodological approach. In particular, I found the supplementation of the localiser-based approach with the comparison between code problems and analogous sentence problems rather convincing.

However, while I agree that computational thinking does not require coding / code comprehension, it is less clear to me what code comprehension involves when it is stripped of the computational thinking aspect. Knowing how to approach a problem algorithmically strikes me as a central aspect of coding. What, then, is being measured by the code problem versus sentence problem comparison? Knowledge of how to implement a certain computational solution within a particular programming language?

The reviewer is right – it will be helpful to provide further clarifications of what it means to process problem content vs. process code in the Discussion section (in addition to the definition we provide in the Introduction). We address the specific points below. Additionally, prompted by the comments from reviewer 1, we added some more information about the putative cognitive processes underlying code comprehension and their possible parallels with language processing (see reviewer 1, points 1 and 2).

The authors touch upon this briefly in the Discussion section of the paper, but I am not fully convinced by their arguments. Specifically, they state:

"The process of code comprehension includes retrieving code-related knowledge from memory and applying it to the problems at hand. This application of task-relevant knowledge plausibly requires attention, working memory, inhibitory control, planning, and general flexible reasoning-cognitive processes long linked to the MD system […].

Shouldn't all of this also apply (or even apply more strongly) to processing of the underlying problem content rather than to code comprehension per se?

This paragraph indeed applies to both processes. We have adjusted it below, including examples of processing code vs. problem content:

“We found that responses in the MD system were driven both by the processing of problem content (e.g., summing the contents of an array) and by code comprehension (e.g., identifying variables referring to an array and its elements, interpreting a for-loop, realizing that the output of the program is the variable being updated inside the for-loop). […] The overlap was observed within brain regions whose topography grossly resembles that of the MD system.”
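
For concreteness, the toy program below (written in the spirit of our Python stimuli, but not a verbatim stimulus) annotates which aspects of interpretation fall under code comprehension and which fall under problem content:

    prices = [2, 8, 5]       # code comprehension: recognize a variable bound to an array

    total = 0                # code comprehension: recognize the accumulator variable
    for price in prices:     # code comprehension: interpret the for-loop construct
        total += price       # problem content: mentally perform the summation

    print(total)             # code comprehension: realize that 'total', the variable
                             # updated inside the loop, is the output; prints 15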

We also added a sentence about processes specific to code comprehension to the next paragraph:

“Furthermore, given that no regions outside of the MD system showed code-specific responses, it must be the case that code-specific knowledge representations are also stored within this system (see Hasson et al., 2015, for a general discussion of the lack of distinction between storage and computing resources in the brain). Such code-specific representations would likely include both knowledge specific to a programming language (e.g. the syntax marking an array in Java vs. Python) and knowledge of programming concepts that are shared across languages (e.g. for loops).”
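
As an illustration of this distinction (sketched in Python, with the corresponding Java line shown only as a comment so that all code in this response stays in one language), array syntax is language-specific, whereas the for-loop concept transfers across languages:

    # Language-specific knowledge: the syntax marking an array differs.
    xs = [1, 2, 3]            # Python list literal
    # int[] xs = {1, 2, 3};   # the corresponding Java declaration

    # Cross-language knowledge: the for-loop concept is shared, even though
    # its surface syntax differs between the two languages.
    for x in xs:
        print(x)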

According to the authors, the extent to which code-comprehension-related activity reflects problem content varies between different systems. They conclude that "MD responses to code […] do not exclusively reflect responses to problem content", while they argue on the basis of their voxel-wise correlation analysis that "the language system's response to code is largely (although not completely) driven by problem content". However, unless I have missed something, the latter analysis was only undertaken for the language system but not for the other systems under examination. Was there a particular reason for this?

The reason is that, for the MD system, we can make this inference based on the univariate analyses alone, but for the language system we cannot. To clarify this point, we have added an explanation to the section motivating the spatial correlation analysis and clarified our conclusion to highlight that both MD and language systems are sensitive to problem content. Finally, for completeness, we have also added a supplementary figure showing the spatial correlation plot for the MD system.

“Finally, we investigated whether the responses to Python code problems within the language system were driven by code comprehension specifically or rather by the underlying problem content. […] Thus, in both MD and language systems, response to Python code is driven both by problem content and by code-specific responses.

“Overall, we found that the language system responded to code problems written in Python but not in ScratchJr. Furthermore, Python responses were driven not only by code comprehension, but also by the processing of problem content. We conclude that successful comprehension of computer code can proceed without engaging the language network.”
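
For readers who want the gist of the voxel-wise analysis in code form, here is a minimal sketch (the file names and variable names are hypothetical placeholders, not our actual analysis pipeline):

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical inputs: one response estimate (beta) per voxel within a
    # participant's language fROIs, for the two critical conditions.
    code_betas = np.load("python_code_betas.npy")           # assumed file name
    sentence_betas = np.load("sentence_problem_betas.npy")  # assumed file name

    # If the language system's response to code is driven largely by problem
    # content, the spatial pattern evoked by code problems should resemble the
    # pattern evoked by content-matched sentence problems.
    r, p = pearsonr(code_betas, sentence_betas)
    print(f"voxel-wise spatial correlation: r = {r:.2f}, p = {p:.3g}")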

Also, what are the implications of observing problem content-driven responses within the language system for the authors' conclusion that this system is "functionally conservative"?

We thank the reviewer for drawing our attention to the fact that we do not explicitly discuss the possible explanations of the language system’s responses to Python code. We have added a paragraph to the Discussion section to fill this gap. It mentions both the content-driven responses and possible verbalization confounds (addressing point 3):

“More work is required to determine why the language system showed some activity in response to Python code. […] Further investigations of the role of the language system in computational thinking have the potential to shed light on the exact computations supported by these regions.”

Overall, the paper would be strengthened by more clarity in regard to these issues – and specifically a more detailed discussion of what code comprehension may amount to in mechanistic terms when it is stripped of computational thinking.

2) Implications of using reading for the language localiser task

Given that reading is also a cultural invention, is it really fair to say that coding is being compared to the "language system" here rather than to the "reading system" (in view of the visual presentation of the language task)? The possible implications of this for the interpretation of the data should be considered.

We believe that we are examining the language system rather than the reading system because (a) we are restricting our analysis to left hemisphere fronto-temporal parcels that encompass language-related activity (cf. the ventral visual areas that specifically support reading; Baker et al., 2007), (b) we are defining the fROIs based on the sentences vs. pronounceable nonwords contrast (both of these conditions involve reading, but only sentences have linguistic structure + meaning). Moreover, these regions have been previously shown to respond to both spoken and written language (Deniz et al., 2019; Fedorenko et al., 2010; Nakai et al., 2020; Regev et al., 2013; Scott et al., 2017), and damage to these regions leads to linguistic difficulties in both reading and listening (for comprehension), and writing and speaking (for production).

To clarify, we have added more information about the language system in the main text:

“These regions respond robustly to linguistic input, both visual and auditory (Deniz et al., 2019; Fedorenko et al., 2010; Nakai et al., 2020; Regev et al., 2013; Scott et al., 2017). However, they show little or no response to tasks in non-linguistic domains.”

“The sentences > nonword-lists contrast isolates processes related to language comprehension (responses evoked by, e.g., visual perception and reading are subtracted out) and has been previously shown to reliably activate left-lateralized fronto-temporal language processing regions, be robust to changes in task and materials, and activate the same regions regardless of whether the materials were presented visually or auditorily (Fedorenko et al., 2010; Mahowald and Fedorenko, 2016; Scott et al., 2017).”

3) Possible effects of verbalisation?

It appears possible that participants may have internally verbalised code problems – at least to a certain extent (and likely with a considerable degree of inter-individual variability). How might this have affected the results of the present study? Could verbalisation be related to the highly correlated response between code problems and language problems within the language system?

Internal verbalization is indeed an important confound to keep in mind when trying to dissociate the neural correlates of language and other cognitive functions. It is not a major threat to claims that the language system is not engaged in code comprehension, so our main conclusion is not affected by this potential confound. However, the reviewer is right in pointing out that verbalization may underlie the language system’s responses to Python code. While this is possible, we think such an explanation is unlikely, since ScratchJr problems would have been even easier to verbalize, and yet they do not evoke language responses. We have included a section addressing verbalization in the Discussion; this text is also included in the response to point (1).

“More work is required to determine why the language system showed some activity in response to Python code. […] It is also inconsistent with observations that even behaviors that ostensibly require subvocal rehearsal (e.g., mathematical operations) do not engage the language system (see e.g., Amalric and Dehaene, 2019; Fedorenko et al., 2011).”

Associated Data


    Data Citations

    1. Ivanova AA, Srikant S, Sueoka Y, Kean HH, Dhamala R, O'Reilly U-M, Bers MU, Fedorenko E. 2020. Comprehension of computer code relies primarily on domain-general executive resources. Open Science Framework. 10.17605/OSF.IO/9JFN5

    Supplementary Materials

    Supplementary file 1. Statistical analysis of functional ROIs in the multiple demand system.

    Table 1 – Experiment 1 (Python); Table 2 – Experiment 2 (ScratchJr).

    elife-58906-supp1.docx (50.1KB, docx)
    Transparent reporting form

    Data Availability Statement

    Materials used for the programming tasks, fROI responses in individual participants (used for generating Figures 2-4), behavioral data, and analysis code files are available on the paper's website https://github.com/ALFA-group/neural-program-comprehension (copy archived at https://archive.softwareheritage.org/swh:1:rev:616e893d05038da620bdf9f2964bd3befba75dc5/). Whole brain activation maps are available at https://osf.io/9jfn5/.

    The following dataset was generated:

    Ivanova AA, Srikant S, Sueoka Y, Kean HH, Dhamala R, O'Reilly U-M, Bers MU, Fedorenko E. 2020. Comprehension of computer code relies primarily on domain-general executive resources. Open Science Framework. 10.17605/OSF.IO/9JFN5

