Abstract
Systems neuroscience seeks explanations for how the brain implements a wide variety of perceptual, cognitive and motor tasks. Conversely, artificial intelligence attempts to design computational systems based on the tasks they will have to solve. In the case of artificial neural networks, the three components specified by design are the objective functions, the learning rules and the architectures. With the growing success of deep learning, which utilizes brain-inspired architectures, these three designed components have increasingly become central to how we model, engineer and optimize complex artificial learning systems. Here we argue that a greater focus on these components would also benefit systems neuroscience. We give examples of how this optimization-based framework can drive theoretical and experimental progress in neuroscience. We contend that this principled perspective on systems neuroscience will help to generate more rapid progress.
Introduction
Major technical advances are revolutionizing our ability to observe and manipulate brains at large scale and to quantify complex behaviors1,2. How should we use these data to develop models of the brain? When the classical framework for systems neuroscience was developed, we could only record from small sets of neurons. In this framework, a researcher observes neural activity, develops a theory of what individual neurons compute, then assembles a circuit-level theory of how the neurons combine their operations. This approach has worked well for simple computations. For example, we know how central pattern generators control rhythmic movements3, how the vestibulo-ocular reflex promotes gaze stabilization4, and how the retina computes motion5. But, can this classical framework scale up to recordings of thousands of neurons and all of the behaviors that we may wish to account for? Arguably, we have not had as much success with the classical approach in large neural circuits that perform a multitude of functions, like the neocortex or hippocampus. In such circuits, researchers often find neurons with response properties that are difficult to summarize in a succinct manner6,7.
The limitations of the classical framework suggest that new approaches are needed to take advantage of experimental advances. A promising framework is emerging from the interactions between neuroscience and Artificial Intelligence (AI)8–10. The rise of deep learning as a leading machine learning method invites us to revisit Artificial Neural Networks (ANNs). At their core, ANNs model neural computation using simplified units that loosely mimic the integration and activation properties of real neurons11. Units are implemented with varying degrees of abstraction, ranging from highly simplified linear operations to relatively complex models with multiple compartments, spikes, etc.11–14. Importantly, the specific computations performed by ANNs are not designed, but learned15.
However, human design still plays a role in determining three essential components in ANNs: the learning goal, expressed as an objective function (or loss function) to be maximized or minimized; a set of learning rules, expressed as synaptic weight updates; and the network architecture, expressed as the pathways and connections for information flow (Fig. 1)15. Within this framework, we do not seek to summarize how a computation is performed, but we do summarize what objective functions, learning rules and architectures would enable learning of that computation.
Figure 1. The three core components of ANN design.
When designing ANNs, researchers do not craft the specific computations performed by the network. Instead they specify these three components. Objective functions quantify the performance of the network on a task (they are often referred to as "loss" or "cost" functions), and learning involves finding synaptic weights that maximize or minimize the objective function. Learning rules provide a recipe for updating the synaptic weights. This can lead to ascent of the objective even if the explicit gradient of the objective function is not followed. Architectures specify the arrangement of units in the network, and determine the flow of information, as well as the computations that are or are not possible for the network to learn.
Deep learning can be seen as a rebranding of long-standing ANN ideas11. Deep ANNs possess multiple layers, either feedforward, or recurrent over time. The “layers” are best thought of as being analogous to brain regions, rather than as specific laminae in biological brains16,17. “Deep” learning specifically refers to training hierarchical ANNs in an end-to-end manner, such that plasticity in each layer of the hierarchy contributes to the learning goals15, which requires a solution to the “credit assignment problem” (Box 1)18,19. In recent years, progress in deep learning has come from the use of bigger ANNs, trained with bigger datasets using Graphics Processing Units (GPUs) that can efficiently handle the required computations. Such developments have produced solutions for many new problems, including image20 and speech21 classification and generation, language processing and translation22, haptics and grasping23, navigation24, sensory prediction25, game playing26 and reasoning27.
Many recent findings suggest that deep learning can inform our theories of the brain. First, it has been shown that deep ANNs can mimic, in some cases closely, the representational transformations in primate perceptual systems17,28, and thereby can be leveraged to manipulate neural activity29. Second, many well-known behavioral and neurophysiological phenomena, including grid cells24, shape tuning30, temporal receptive fields31, visual illusions32, and apparent model-based reasoning33, have been shown to emerge in deep ANNs trained on tasks similar to those solved by animals. Third, many modeling studies have demonstrated that the apparent biological implausibility of end-to-end learning rules, e.g. learning algorithms that can mimic the power of the canonical backpropagation-of-error algorithm (backprop) (see Box 1), is overstated. Relatively simple assumptions about cellular and subcellular electrophysiology, inhibitory microcircuits, patterns of spike timing, short-term plasticity, and feedback connections can enable biological systems to approximate backprop-like learning in deep ANNs12,14,34–39. Hence, ANN-based models of the brain may not be as unrealistic as previously thought, and simultaneously, they appear to explain a great deal of neurobiological data.
With these developments, it is the right time to consider a deep-learning-inspired framework for systems neuroscience8,19,40. We have a growing understanding of the key principles that underlie ANNs, and there are theoretical reasons to believe that these insights apply generally41,42. Concomitantly, our ability to monitor and manipulate large neural populations opens the door to new ways of testing hypotheses derived from the deep learning literature. Here we sketch the scaffolding of a deep learning framework for modern systems neuroscience.
Box 1. Learning and the “credit assignment problem”.
A natural definition of learning is that it is a change to a system that improves its performance. Suppose we have an objective function, F(W), which measures how well a system is currently performing, given the N-dimensional vector of its current synaptic weights, W. If the synaptic weights change from W to W + ΔW, then the change in performance is ΔF = F(W + ΔW) − F(W). If we make small changes to W, and F is locally smooth, then ΔF is given approximately by
ΔF ≈ ΔWᵀ ⋅ ∇W F
where ∇W F is the gradient of F with respect to W41. Suppose we want to guarantee improved performance, i.e. we want to ensure ΔF > 0. For any given amount of improvement, there is an (N−1)-dimensional manifold of local changes in W that all achieve it. Which one should we choose? Gradient-based algorithms derive from the intuition that we want to take the smallest step that gets us a specific level of improvement. If we set ΔW equal to a small step size, η, times the gradient ∇W F, then we will improve as much as possible for a step of that size. Thus, we have:
ΔF ≈ η ∇W Fᵀ ⋅ ∇W F = η ‖∇W F‖² > 0
In other words, the objective function value increases with every step (when η is small), in proportion to the squared length of the gradient vector.
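To make this derivation concrete, the following sketch (a toy example with an invented quadratic objective; the variable names are ours, not from any specific model) checks numerically that a small gradient step improves F by approximately η ‖∇W F‖²:

    import numpy as np

    # Toy objective F(W) = -||W - W*||^2, maximized at a hypothetical target W*.
    rng = np.random.default_rng(0)
    W_star = rng.normal(size=5)          # hypothetical optimum
    W = rng.normal(size=5)               # current synaptic weights

    def F(W):
        return -np.sum((W - W_star) ** 2)

    def grad_F(W):
        return -2.0 * (W - W_star)       # exact gradient of this toy F

    eta = 0.01                           # small step size
    g = grad_F(W)
    delta_F = F(W + eta * g) - F(W)      # actual change in performance
    predicted = eta * (g @ g)            # first-order prediction: eta * ||grad||^2
    print(delta_F, predicted)            # nearly equal for small eta; both > 0

For small η the measured and predicted improvements agree closely, and both are strictly positive, exactly as the approximation above requires.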
The concept of “credit assignment” refers to the problem of determining how much “credit” or “blame” a given neuron or synapse should get for a given outcome. More specifically, it is a way of determining how each parameter in the system (e.g., each synaptic weight) should change to ensure that ΔF > 0. In its simplest form, the “credit assignment problem” refers to the difficulty of assigning credit in complex networks. Updating weights using the gradient of the objective function, ∇W F, has proven to be an excellent means of solving the credit assignment problem in ANNs. A question that systems neuroscience faces is whether the brain also approximates something like gradient-based methods.
The most common method for calculating gradients in deep ANNs is backprop15. It uses the chain rule to recursively calculate gradients backwards from the output11. But backprop rests on biologically implausible assumptions, such as symmetric feedback weights and distinct forward and backward passes of information14. Many different learning algorithms, not just backprop, can provide estimates of a gradient, and some of these do not suffer from backprop’s biological implausibility12,14,34–38,91–93. However, algorithms differ in their variance and bias properties (Fig. 2)36,94. Algorithms such as weight/node perturbation, which reinforce random changes in synaptic weights through rewards, have high variance in their path along the gradient94. Algorithms that use random feedback weights to communicate gradient information have high bias36,95. Various proposals have been made to minimize bias and variance in algorithms while maintaining their biological realism37,38.
Figure 2. Bias and variance in learning rules.
Many learning rules provide an estimate of the gradient of an objective function, even if they are not explicitly gradient-based. However, as with any estimator, these learning rules can exhibit different degrees of variance and bias in their estimates of the gradient. Here, we provide a rough illustration of how much bias and variance some of the proposed biologically plausible learning rules may have relative to backprop. It is important to note that the exact bias and variance properties of many of the learning rules are unknown, and this is just a sketch. As such, for some of the learning rules shown here, e.g. contrastive Hebbian learning, predictive coding (ref. 35), dendritic error learning (ref. 14), regression discontinuity design (RDD) (ref. 93), and attention-gated reinforcement learning (AGREL) (ref. 37), we have indicated their location with a question mark. For others, namely backpropagation, feedback alignment (ref. 36), and node/weight perturbation (ref. 94), we show their known relative positions.
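The variance axis of this figure can be illustrated with a toy sketch (all quantities are invented; this is not a model of any specific biological proposal): estimate the gradient of a linear unit's squared-error loss by reinforcing random weight perturbations, and compare against the analytic gradient.

    import numpy as np

    # Weight perturbation on a linear unit y = w.x with squared-error loss.
    rng = np.random.default_rng(1)
    w = rng.normal(size=10)
    x = rng.normal(size=10)
    target = 1.0

    def loss(w):
        return 0.5 * (w @ x - target) ** 2

    exact_grad = (w @ x - target) * x        # analytic gradient of the loss

    sigma = 1e-3                             # perturbation scale
    estimates = []
    for _ in range(1000):
        xi = rng.normal(scale=sigma, size=10)
        # Reinforce the perturbation by the change in loss it produced:
        estimates.append((loss(w + xi) - loss(w)) / sigma**2 * xi)
    estimates = np.array(estimates)

    print(np.mean(estimates, axis=0)[:3])    # close to exact_grad[:3] (low bias)
    print(exact_grad[:3])
    print(np.mean(np.var(estimates, axis=0)))  # single trials are very noisy

Averaged over many perturbations the estimator is nearly unbiased, but any single trial is extremely noisy, which is one reason perturbation-based learning scales poorly to large networks94.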
Constraining learning in artificial neural networks and the brain with “task sets”
The “No Free Lunch Theorems” demonstrated broadly that no learning algorithm can perform well on all possible problems43. ANN researchers in the first decade of the 21st century thus argued that AI should be primarily concerned with the set of tasks that “…most animals can perform effortlessly, such as perception and control, as well as … long-term prediction, reasoning, planning, and [communication]”44. This set of tasks has been termed the “AI Set”, and the focus on building computers with capabilities that are similar to those of humans and animals is what distinguishes AI tasks from other tasks in computer science44 (note that the word “tasks” here refers broadly to any computation, including those that are unsupervised.)
Much of the success of deep learning can be attributed to the consideration given to learning in the AI Set15,44. Designing ANNs that are well-suited to learn specific tasks is an example of incorporating "inductive biases" (Box 2): assumptions that one makes about the nature of the solutions to a given optimization problem. Deep learning works so well, in part, because it uses appropriate inductive biases for the AI Set15,45, particularly hierarchical architectures. For example, images are well described as hierarchical compositions of increasingly complex features: from edges, to simple combinations of edges, to larger configurations that form objects. Language, too, can be considered a hierarchical construction, with phonemes assembled into words, words into sentences, sentences into narratives. However, deep learning also eschews hand-engineering, allowing the function computed by the system to emerge during learning15. Thus, despite the common belief that deep learning relies solely on increases in computational power, or that it represents a "blank slate" approach to intelligence, many of the successes of deep learning have grown out of a balance between useful inductive biases and emergent computation, echoing the blend of nature and nurture that underpins the adult brain.
Box 2. What are inductive biases?
Learning is easier when we have prior knowledge about the kind of problems that we will have to solve43. Inductive biases are a means of embedding such prior knowledge into an optimization system. Such inductive biases may be generic, such as hierarchy, or specific, such as convolutions. Importantly, the inductive biases that exist in the brain will have been shaped by evolution to increase an animal’s fitness in both the broad context of life on Earth (e.g. life in a three-dimensional world where one needs to obtain food, water, shelter, etc.), and in specific ecological niches. Examples of inductive biases are:
Simple explanations: When attempting to make sense of the world, simple explanations may be preferred, as articulated by Occam’s Razor96. We can build this into ANNs using either Bayesian frameworks or by other mechanisms, such as sparse representations59.
Object permanence: The world is organized into objects, which are spatiotemporally constant. We can build this into ANNs by learning representations that assume consistent movement in sensory space97.
Visual translation invariance: A visual feature tends to have the same meaning regardless of its location. We can build this into ANNs using convolution operations98 (see the sketch at the end of this box).
Focused attention: Some aspects of the information coming into a system are more important than others. We can build this into ANNs through attention mechanisms99.
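The translation-invariance example can be made concrete with a short sketch (the signal and kernel here are invented for illustration): a convolution responds to a feature in the same way wherever it occurs, whereas a generic dense layer treats each position differently.

    import numpy as np

    rng = np.random.default_rng(2)
    signal = np.zeros(20)
    signal[4:7] = [1.0, 2.0, 1.0]             # a "feature" at position 4
    shifted = np.roll(signal, 5)              # the same feature at position 9

    kernel = np.array([1.0, 2.0, 1.0])        # hypothetical learned detector

    out1 = np.convolve(signal, kernel, mode="same")
    out2 = np.convolve(shifted, kernel, mode="same")
    print(np.allclose(np.roll(out1, 5), out2))   # True: the output just shifts

    # A dense layer with arbitrary weights has no such bias:
    W = rng.normal(size=(20, 20))
    print(np.allclose(np.roll(W @ signal, 5), W @ shifted))  # False in general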
Similarly, neuroscientists focus on the behaviors/tasks that a species evolved to perform. This set of tasks overlaps with the AI Set, though possibly not completely, since different species have evolved strong inductive biases for their ecological niches. By considering this “Brain Set” for specific species—the tasks that are important for survival and reproduction for that species—researchers can focus on the features most likely to be key to learning. Just as departing from a pure blank slate was the key to the success of modern ANNs—e.g. by focusing on ANN designs with inductive biases that are useful for the AI Set—so we suspect that it will also be crucial to the development of a deep learning framework for systems neuroscience to focus on how a given animal might solve tasks in its appropriate Brain Set.
Recognizing the importance of inductive biases in deep learning also helps address some existing misconceptions. Deep networks are often considered different from brains because they depend on large amounts of data. However, it is worth noting that (1) many species, especially humans, develop slowly with large quantities of experiential data, and (2) deep networks can work well in low data regimes if they have good inductive biases46. For example, deep networks can learn how to learn quickly47. In the case of brains, evolution could be one means by which such inductive biases are acquired48,49.
The three core components of a deep learning framework for the brain
Deep learning combines human design with automatic learning to solve a task. What is designed are not the computations (i.e. the specific input/output functions of the ANNs), but three components: (1) objective functions, (2) learning rules, and (3) architectures (Fig. 1). Objective functions describe the goals of the learning system. They are functions of the synaptic weights of a neural network and the data it receives, but they can be defined without making reference to a specific task or dataset. For example, the cross-entropy objective function, which is common in machine learning, specifies a means of calculating performance on any categorization task, from distinguishing different breeds of dog in the ImageNet dataset to classifying the sentiment behind a tweet. We will return to some of the specific objective functions proposed for the brain below50–53. Learning rules describe how the parameters in a model are updated. In ANNs, these rules are generally used to improve on the objective function. Notably, this is true not only for supervised learning (where an agent receives an explicit target to mimic), but also for unsupervised learning (where an agent must learn without any instruction) and reinforcement learning (where an agent must learn using only rewards/punishments). Finally, architectures describe how the units in an ANN are arranged and what operations they can perform. For example, convolutional networks impose a connectivity pattern whereby the same receptive fields are applied repeatedly over the spatial extent of an input.
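A minimal sketch of this separation (in Python, with invented names, sizes and data; it illustrates the idea rather than reproducing anyone's actual model) shows how the three components can be specified independently and then combined:

    import numpy as np

    # 1. Objective function: cross-entropy, defined without reference to any
    #    particular task or dataset -- it needs only probabilities and a label.
    def cross_entropy(probs, label):
        return -np.log(probs[label] + 1e-12)

    # 2. Architecture: a two-layer feedforward network with a softmax output.
    def forward(params, x):
        h = np.tanh(params["W1"] @ x)
        logits = params["W2"] @ h
        e = np.exp(logits - logits.max())
        return e / e.sum()

    # 3. Learning rule: gradient descent, here via finite differences
    #    (inefficient, but it keeps the rule explicit and self-contained).
    def update(params, x, label, eta=0.1, eps=1e-5):
        for name, W in params.items():
            grad = np.zeros_like(W)
            for idx in np.ndindex(W.shape):
                W[idx] += eps
                up = cross_entropy(forward(params, x), label)
                W[idx] -= 2 * eps
                down = cross_entropy(forward(params, x), label)
                W[idx] += eps
                grad[idx] = (up - down) / (2 * eps)
            params[name] = W - eta * grad
        return params

    rng = np.random.default_rng(3)
    params = {"W1": 0.5 * rng.normal(size=(8, 4)),
              "W2": 0.5 * rng.normal(size=(3, 8))}
    x, label = rng.normal(size=4), 2
    for _ in range(50):
        params = update(params, x, label)
    print(cross_entropy(forward(params, x), label))   # loss falls toward 0

Note that cross_entropy makes no reference to what the categories are: the same objective function applies unchanged whether the inputs are dog photographs or tweets.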
Why do so many AI researchers now focus on objective functions, learning rules and architectures instead of designing specific computations? The short answer is that this appears to be the most tractable way to solve real-world problems. Originally, AI practitioners believed that intelligent systems could be hand-designed by piecing together elementary computations54. But results on the AI Set were underwhelming11. It now seems clear that solving complex problems with pre-designed computations (e.g. handcrafted features) is usually too difficult and practically unworkable. In contrast, specifying objective functions, architectures, and learning rules works well.
There is, though, a drawback: the computations that emerge in large-scale ANNs trained on high-dimensional datasets can be difficult to interpret. We can construct a neural network in a few lines of code, and for each unit in an ANN we can specify the equations that determine its responses to stimuli or relationships to behavior. However, after training, a network is characterized by millions of weights that collectively encode what the network has learned, and it is hard to imagine how we could describe such a system with only a small number of parameters, let alone in words55.
Such considerations of complexity are informative for neuroscience. For small circuits comprising only tens of neurons it may be possible to build compact models of individual neural responses and computations (i.e. to develop models that can be communicated using a small number of free parameters or words)3–5. But, considering that animals are solving many AI Set problems, it is likely that the brain uses solutions that are as complex as the solutions used by ANNs. This suggests that a normative framework that explains why neural responses are as they are might be best obtained by viewing neural responses as an emergent consequence of the interplay between objective functions, learning rules, and architecture. With such a framework in hand, one could then train ANN models that do, in fact, predict neural responses well29,67,68. Of course, those ANN models would likely be non-compact, involving millions, billions or even trillions of free parameters, and all but indescribable in words. Hence, our claim is not that we could ever hope to predict neural responses with a compact model, but rather, that we could explain the emergence of neural responses within a compact framework.
A question that naturally arises is whether the environment, or data, that an animal encounters should be a fourth essential component for neuroscience. Determining the "Brain Set" for an animal necessarily involves consideration of its evolutionary and ontogenetic milieu. Efforts to efficiently describe naturalistic stimuli and identify ethologically relevant behaviors are crucial to neuroscience, and have shaped many aspects of nervous systems. However, the core issue we are addressing in this perspective piece is how to develop models of complex, hierarchical brain circuits, so we view the environment as a crucial consideration that anchors the three core components, but not as one of the components itself.
Once the appropriate Brain Set has been identified, the first question is: what is the architecture of the circuits? This involves descriptions of the cell types and their connectivity (micro, meso and macroscopic). Thus, uncontroversially, we propose that circuit-level descriptions of the brain are a crucial topic for systems neuroscientists. Thanks to modern techniques for circuit tracing and genetic lineage determination, rapid progress is being made56,57. But, to reiterate, we would argue that understanding the architecture is not sufficient for understanding the circuit; rather, it should be complemented by knowledge of learning rules and objective functions.
Many neuroscientists recognize the importance of learning rules and architecture. But identifying the objective functions that have shaped the brain, either during learning or evolution, is less common. Unlike architectures and learning rules, objective functions may not be directly observable in the brain. Nonetheless, we can define them mathematically without making reference to a specific environment or task. For example, predictive coding models minimize an objective function known as the description length, which measures how much information is required to encode sensory data using the neural representations. Several other objective functions have been proposed for the brain (Box 3). In this perspective piece, we are not advocating for any of these specific objective functions in the brain, as we are articulating a framework, not a model. One of our key claims is that even though we must infer them, objective functions are an attainable part of a complete theory of how architectures and learning rules combine to achieve a computational goal.
This optimization framework has an added benefit: as with ANNs, the architectures, learning rules and objective functions of the brain are likely relatively simple and compact, at least in comparison to the list of computations performed by individual neurons58. The reason is that these three components must presumably be conveyed to offspring through a limited information bottleneck, i.e. the genome (which may not have sufficient capacity to fully specify the wiring of large vertebrate brains48). In contrast, the environment in which we live can convey vast amounts of complex and changing information that dwarf the capacity of the genome.
Box 3. Are there objective functions for brains?
Animals clearly have some baseline objective functions. For example, homeostasis minimizes an objective function corresponding to the difference between a physiological variable (like blood oxygen levels) and a set-point for that variable. Given the centrality of homeostasis to physiology, objective functions are arguably something that the brain must be concerned with.
But, some readers may doubt whether the sort of objective functions used in machine learning are relevant to the brain. For example, the cross-entropy objective function used in ANNs trained on categorization tasks is unlikely to be used in the brain, since it requires specification of the correct category for each sensory input. Other objective functions are more ecologically plausible, though. Examples include the description length objective function used in predictive coding models50, the log-probability of action sequences scaled by the reward they have produced (which is used in reinforcement learning to maximize rewards)51, increases in mutual information with the environment100, and empowerment52,53, which measures the degree of control an agent has in their environment. These objective functions can all be specified mathematically for the brain without worrying about specific datasets, tasks or environments.
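As an illustration of how such an objective can be written down without reference to any dataset, here is a sketch of the reward-scaled log-probability objective51 applied to a hypothetical two-armed bandit (the environment, constants and learning rate are all invented):

    import numpy as np

    rng = np.random.default_rng(4)
    logits = np.zeros(2)                   # action preferences (the "weights")
    reward_probs = [0.2, 0.8]              # hypothetical environment

    for step in range(2000):
        p = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(2, p=p)                      # sample an action
        r = float(rng.random() < reward_probs[a])   # reward from the environment
        # Gradient of r * log p(a) with respect to the logits:
        logits += 0.1 * r * ((np.arange(2) == a) - p)

    print(np.exp(logits) / np.exp(logits).sum())    # mass shifts to the better arm

The update ascends the expected reward-weighted log-probability using only the actions taken and the rewards received, which is what makes this family of objectives ecologically plausible.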
There are, however, real challenges in tying objective functions to empirical and theoretical models in neuroscience. Many potential plasticity rules may not follow the gradient of any objective function at all, or may only follow it partially (Fig. 3). This complicates matters, and makes it impossible to guarantee that objective functions are always involved in neural plasticity. As well, the brain likely optimizes multiple objective functions40, some of which we may in fact learn (i.e. we may "learn-to-learn"; for example, humans learn how to learn new board games), and some of which may have been optimized over the course of evolution rather than within an individual animal (e.g. reflexes or reproductive behaviors).
Despite these complexities, we believe that consideration of objective functions is critical for systems neuroscience. After all, we know that biological variables, such as dopamine release, meaningfully relate to objective functions from reinforcement learning64. In addition, although many potential learning rules may not directly follow the gradient of the objective function, they would still lead to an improvement in that objective function. Here, identifying an objective function allows us to establish whether a change in the phenotype of a neural circuit should be considered a form of learning. If things don't "get better" according to some metric, how can we refer to any phenotypic plasticity as "learning" as opposed to just "changes"?
Figure 3. Learning rules that don’t follow gradients.
Learning should ultimately lead to some form of improvement, which could be measured with an objective function. But, not all synaptic plasticity rules need to follow a gradient. Here we illustrate this idea by showing three different hypothetical learning rules, characterized as vector fields in synaptic weight space. The x and y dimensions correspond to synaptic weights, and the z dimension corresponds to an objective function. Any such vector field can be decomposed into a component along the gradient and components orthogonal to it. On the left is a plasticity rule that adheres to the gradient of an objective function, directly bringing the system up to the maximum. In the middle is a plasticity rule that is orthogonal to the gradient, and as such, never brings the system closer to the maximum. On the right is a learning rule that only partially follows the gradient, bringing the system towards the maximum, but indirectly. Theoretically, any of these situations may hold in the brain, though learning goals would only be met in the cases where the gradient is fully or partially followed (left and right).
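The decomposition in the figure amounts to two lines of algebra, sketched here with randomly generated stand-ins for the gradient and a hypothetical plasticity update:

    import numpy as np

    rng = np.random.default_rng(5)
    grad = rng.normal(size=6)              # gradient of some objective at W
    update = rng.normal(size=6)            # a hypothetical plasticity update

    g_hat = grad / np.linalg.norm(grad)
    parallel = (update @ g_hat) * g_hat    # gradient-following component
    orthogonal = update - parallel         # gradient-free component

    # First-order change in the objective contributed by each component:
    print(parallel @ grad)     # > 0 when the rule partially ascends the gradient
    print(orthogonal @ grad)   # ~0: orthogonal plasticity neither helps nor hurts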
Since the responses of individual neurons are shaped by the environment, their computations should reflect this massive information source. We can see evidence of this in the ubiquity of neurons in the brain that have high entropy in their activity and that do not exhibit easy-to-describe correlations with the multitude of stimuli and behaviors that experimentalists have explored to date6,7. To clarify our claim, we are suggesting that identifying a normative explanation using the three components may be a fruitful way to go on to develop better, non-compact models of the response properties of neurons in a circuit, as shown by recent studies that use task-optimized deep ANNs to determine the optimal stimuli for activating specific neurons29. As an analogy, the theory of evolution by natural selection provides a compact explanation for why species emerge as they do, one which can be stated in relatively few words. This compact explanation of the emergence of species can then be used to develop more complex, non-compact models of the phylogeny of specific species. Our suggestion is that normative explanations based on the three components could provide similar high-level theories for generating our lower-level models of neural responses, and that this would bring us one step closer to the form of “understanding” that many scientists seek.
It is worth recognizing that researchers have long postulated objective functions and plasticity rules to explain the function of neural circuits59–62. Many of them, however, have sidestepped the question of hierarchical credit assignment, which is key to deep learning15. There are clear experimental success stories too, including work on predictive coding31,63, reinforcement learning64,65, and hierarchical sensory processing17,28. Thus, the optimization-based framework that we articulate here can, and has, operated alongside studies of individual neuron response properties. But, we believe that we will see even greater success if a framework focused on the three core components is adopted more widely.
Architectures, learning rules, and objective functions in the wet lab
How can the framework articulated here engage with experimental work? One way to make progress is to build working models using the three core components, then compare the models with the brain. Such models should ideally hold up at all levels: (1) they should solve the complex tasks from the Brain Set under consideration; (2) they should be informed by our knowledge of anatomy and plasticity; and (3) they should reproduce the representations, and changes in representation, that we observe in brains (Fig. 4). Of course, checking each of these criteria will be non-trivial. It may require many new experimental paradigms. Checking that a model can solve a given task is relatively straightforward, but representational and anatomical matches are not straightforward to establish, and this is an area of active research66,67. Luckily, the modularity of the optimization framework allows researchers to attempt to study each of the three components in isolation.
Figure 4. Comparing deep ANN models and the brain.
One way to assess the three components at once is to compare experimental data with changes in representations in deep ANNs that incorporate all three components. (a) For example, we could use a deep ANN with a hierarchical architecture, trained with an objective function for maximizing rewards that are delivered when it successfully discriminates grating orientations, and a gradient-based, end-to-end learning rule. (b) When examining the orientation tuning of the populations in different layers of the hierarchy, such models can make predictions. For instance, the model may predict that the largest changes in tuning should occur higher in the cortical hierarchy (top), with smaller changes in the middle, e.g. in V4 (middle), and the smallest changes occurring low in the hierarchy, e.g. in V1 (bottom). (c) This leads to experimentally testable predictions about the average magnitude of changes in neural activity that should be observed experimentally when an animal is learning.
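A toy version of this experiment can be simulated end-to-end. The sketch below (all stimuli, sizes and learning rates are invented for illustration) trains a small hierarchical network on a two-way discrimination with a gradient-based rule, then measures how much the representation changed at each level:

    import numpy as np

    rng = np.random.default_rng(6)

    def make_stimulus(theta, n=16):
        # crude "grating": a sinusoid whose frequency depends on orientation
        return np.sin(theta * np.arange(n) / n * 2 * np.pi)

    thetas = rng.uniform(1.0, 3.0, size=200)
    X = np.stack([make_stimulus(t) for t in thetas])
    y = (thetas > 2.0).astype(float)          # two-category discrimination

    W1 = 0.3 * rng.normal(size=(12, 16))      # "low" layer (e.g. V1-like)
    W2 = 0.3 * rng.normal(size=(12, 12))      # "middle" layer (e.g. V4-like)
    w3 = 0.3 * rng.normal(size=12)            # "high" readout

    def forward(x):
        h1 = np.tanh(W1 @ x)
        h2 = np.tanh(W2 @ h1)
        p = 1 / (1 + np.exp(-(w3 @ h2)))
        return h1, h2, p

    H1_before = np.stack([forward(x)[0] for x in X])
    H2_before = np.stack([forward(x)[1] for x in X])

    for epoch in range(200):                  # end-to-end gradient learning
        for x, t in zip(X, y):
            h1, h2, p = forward(x)
            d3 = p - t                        # cross-entropy gradient at output
            d2 = d3 * w3 * (1 - h2**2)
            d1 = (W2.T @ d2) * (1 - h1**2)
            w3 -= 0.05 * d3 * h2
            W2 -= 0.05 * np.outer(d2, h1)
            W1 -= 0.05 * np.outer(d1, x)

    H1_after = np.stack([forward(x)[0] for x in X])
    H2_after = np.stack([forward(x)[1] for x in X])

    # Mean change in tuning at each level; often larger higher in the hierarchy.
    print(np.mean(np.abs(H1_after - H1_before)),
          np.mean(np.abs(H2_after - H2_before)))

Under end-to-end gradient learning the representational change tends to be larger in the higher layer, which is the kind of layer-dependent prediction that panel (c) refers to.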
Empirical studies of architecture in the brain
To be able to identify the architecture that defines the inductive biases of the brain, we need to continue performing experiments that explore neuroanatomy at the circuit level. To really frame neuroanatomy within an optimization framework, we must also be able to identify what information is available to a circuit, including where signals about action outcomes may come from. Ultimately, we want to be able to relate these aspects of anatomy to concrete biological markers that guide the developmental processes responsible for learning.
There is considerable experimental effort already underway towards describing the anatomy of the nervous system. We are using a range of imaging techniques to quantify the anatomy and development of circuits57,68. Extensive work is also conducted in mapping out the projections of neural circuits with cell-type specificity56. Research attempting to map out the hierarchy of the brain has long existed69, but several groups are now probing which parts of deep ANN hierarchies may best reflect which brain areas17,70. For example, the representations in striate cortex (as measured, for example, by dissimilarity matrices) better match early layers of a deep ANN, while those in inferotemporal cortex better match later layers8,71. This line of work also involves optimization of the architecture of deep ANNs so that they provide a closer fit to representation dynamics in the brain, e.g. by exploring different recurrent connectivity motifs66. Confronted with the bewildering set of anatomical observations that have been and will be made, theories that place anatomy alongside objective functions and learning rules offer a way to zero in on the features with the most explanatory power.
Empirical studies of learning rules in the brain
There is a long tradition in neuroscience of studying synaptic plasticity rules. Yet these studies have rarely explored how credit assignment may occur, even though, as we discussed above (Box 1), credit assignment is key to learning in ANNs and may be in the brain as well. Thankfully, top-down feedback and neuromodulatory systems have become the focus of recent studies of synaptic plasticity72–76. This has allowed some concrete proposals, e.g. as to how apical dendrites may be involved in credit assignment12,14, or how top-down attention mechanisms combined with neuromodulators may solve the credit assignment problem37,38 (Fig. 5). We may also be able to look at changes in representations and infer the plasticity rules from those observations77. It is important for experimentalists to measure neural responses both during and after an animal has reached stable performance, so as to capture how representations evolve during learning. Work on learning rules with an eye to credit assignment is producing a finer-grained understanding of the myriad factors that affect plasticity78.
Figure 5. Biological models of credit assignment.
(a) Attention based models of credit assignment (refs. 37,38) propose that the credit assignment problem is solved by the brain using attention and neuromodulatory signals. According to these models, sensory processing is largely feedforward in early stages, then feedback “tags” neurons and synapses for credit, and reward prediction errors (RPE) determine the direction of plastic changes. This is illustrated at the bottom, where circles indicate neurons, and the gray level indicates their level of activity. These models predict that the neurons responsible for activating a particular output unit will be tagged (T) by attentional feedback. Then, if a positive RPE is received, the synapses should potentiate. In contrast, if a negative RPE is received, the synapses should depress. This provides an estimate of a gradient for a category-based objective function. (b-d) Dendritic models of credit assignment (refs. 12,14) propose that gradient signals are carried by “dendritic error” (δ) signals in the apical dendrites of pyramidal neurons. (b) According to these models, feedforward weight updates are determined by a combination of feedforward inputs and δ. In an experiment where two different stimuli are presented, and only one is reinforced, this leads to specific predictions. (c) If a neuron is tuned towards a stimulus that is reinforced, then reinforcement should lead to an increase in apical activity. (d) In contrast, if a neuron is tuned to an unreinforced stimulus, its apical activity should decrease when reinforcement is received.
In the future, we should be better placed to study learning rules with optimization in mind. As optical technologies improve, and potentially give us a means of estimating synaptic changes in vivo79, we may be able to directly relate synaptic changes to quantities such as behavioral errors. We could also directly test hypothesized biological models of learning rules that can solve the credit assignment problem, such as those that use attention37,38 or those that use dendritic signals for credit assignment12,14 (Fig. 5).
Empirical studies of objective functions in the brain
In some cases, the objective functions being optimized by the brain may be represented directly in neural signals that we can monitor and record. In other cases, objective functions may only exist implicitly with respect to the plasticity rules that govern synaptic updates. Normative concepts, such as optimal control, are applicable80, and evolutionary ideas can inform our thinking. More specifically, ethology may provide guidance81 as to which functions would be useful for animals to optimize, giving us a meaningful intuitive space in which to think about objective functions.
There is a long-standing literature trying to relate experimental data to objective functions. This starts with theoretical work relating known plasticity rules to potential objective functions. For example, there are studies that attempt to estimate objective functions by comparing neural activity observed experimentally with the neural activity of ANNs trained on natural scenes59,82. There are also approaches that use inverse reinforcement learning to identify what a system optimizes83. Moreover, one could argue that we can get a handle on objective functions by looking for correlations between representational geometries optimized for a given objective and real neural representational geometries28,84. Another newly emerging approach asks what an animal’s circuits can optimize when controlling a Brain Computer Interface (BCI) device85. Thus, a growing literature, which builds on previous work80, helps us explore objective functions in the brain.
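For instance, the representational-geometry comparison can be sketched in a few lines (random responses stand in for model and neural data here; real analyses add noise ceilings and often use rank correlations):

    import numpy as np

    rng = np.random.default_rng(7)
    stimuli = rng.normal(size=(30, 10))            # 30 hypothetical stimuli

    model_resp = np.tanh(stimuli @ rng.normal(size=(10, 50)))   # 50 model units
    neural_resp = np.tanh(stimuli @ rng.normal(size=(10, 40)))  # 40 recorded neurons

    def rdm(responses):
        # representational dissimilarity matrix: pairwise distances over stimuli
        diff = responses[:, None, :] - responses[None, :, :]
        return np.sqrt((diff ** 2).sum(-1))

    iu = np.triu_indices(30, k=1)                  # unique stimulus pairs
    r = np.corrcoef(rdm(model_resp)[iu], rdm(neural_resp)[iu])[0, 1]
    print(r)   # high values indicate matched representational geometry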
Caveats and concerns
One may argue that a focus on architectures, learning rules, and objective functions, and a move away from studying the coding properties of neurons, loses much of what we have learned so far, e.g. orientation selectivity, frequency tuning, and spatial tuning (place cells, grid cells). However, our proposed framework is heavily informed by this knowledge. Convolutional ANNs directly emerged from observations of complex cells in the visual system86. Moreover, tuning curves are often measured in the context of learning experiments, and changes in tuning inform us about learning rules and objective functions.
In a similar vein, a lot of computational neuroscience has emphasized models of the dynamics of neural activity87, and that has not been a major theme in our discussion. As such, one might worry that our framework fails to connect with this past literature. However, the framework we articulate here does not preclude consideration of dynamics. A focus on dynamics may equally be repurposed for making inferences about architectures, learning rules and objective functions, which have long been a feature of models of neural dynamics49,88.
Another common objection to the relevance of deep learning for neuroscience is that many behaviors that animals engage in appear to require relatively little learning48. However, such innate behavior was “learned”, only on evolutionary timescales. Hardwired behavior is, arguably, best described as strong inductive biases, since even pre-wired behaviors can be modified by learning (e.g. horses still get better at running after birth). Hence, even when a neural circuit engages in only moderate amounts of learning, an optimization framework can help us model its operations48.
The framework that we have laid out here makes the optimization of objective functions central to models of the brain. But a comprehensive theory of any brain likely requires attention to other constraints unrelated to any form of objective function optimization. For example, many aspects of physiology are determined by phylogenetic constraints that may be hold-overs from evolutionary ancestors. While these constraints are undoubtedly crucial for our models in neuroscience, we believe that it is the optimization of objective functions within these constraints that produces the rich diversity of neural circuitry and behavior that we observe in the brain.
Some of us, who are inclined to a bottom-up approach to understanding the brain, worry that attempts to posit objective functions or learning rules for the brain may be premature, needing far more details of brain operation than we currently possess. Nonetheless, scientific questions necessarily are posed within some framework of thought. Importantly, we are not calling for abandoning bottom-up explanations. Instead, we hope that important new experimental questions will emerge from the framework suggested by ANNs (see e.g. Fig. 5).
Finally, some researchers are concerned by the large number of parameters in deep ANNs, seeing them as a violation of Occam’s razor and merely an overfitting to data. Interestingly, recent work in AI shows that the behavior of massively overparameterized learning systems can be counterintuitive—there appear to be intrinsic mathematical properties of over-parameterized learning systems that enable good generalization42,89. Since the brain itself apparently contains a massive number of potential parameters to adapt (e.g. synaptic connections, dendritic ion channel densities, etc.), one might argue that the large number of parameters in deep ANNs actually makes them even more appropriate models of the brain.
Conclusion
Much of systems neuroscience has attempted to formulate succinct statements about the function of individual neurons in the brain. This approach has been successful at explaining some (relatively small) circuits and certain hard-wired behaviors. However, there is reason to believe that this approach will need to be complemented by other insights if we are to develop good models of plastic circuits with thousands, millions or billions of neurons. There is, unfortunately, no guarantee that the function of individual neurons in the central nervous system can be compressed down to a human-interpretable, verbally articulable form. Given that we currently have no good means of distilling the function of individual units in deep ANNs into words, and given that real brains are likely more, not less, complex, we suggest that systems neuroscience would benefit from focusing on the kinds of models that have been successful in ANN research programs, i.e. models grounded in the three essential components.
Current theories in systems neuroscience are beautiful and insightful, but we believe that they could benefit from a cohesive framework founded in optimization. For example, local plasticity rules, such as Hebbian mechanisms, explain a great deal of biological data. But, to achieve good performance on complex tasks, Hebbian rules must be designed with objective functions and architectures in mind34,90. Similarly, other researchers have, for good reason, pointed out the benefits of the inductive biases utilized by the brain48. However, inductive biases are not on their own sufficient to solve complex tasks, like those contained in the AI Set or various Brain Sets. To solve these difficult problems, inductive biases must be paired with learning and credit assignment. If, as we have argued, the set of tasks that an animal can solve are an essential consideration for neuroscience, then it is critical to build models that can actually solve these tasks.
Inevitably, both bottom-up descriptive work and top-down theoretical work will be required to make progress in systems neuroscience. It is important, though, to start with the right kind of top-down theoretical framing. Given the ability of modern machine learning to solve problems in the AI Set and numerous Brain Sets, it will be fruitful to guide the top-down framework of systems neuroscience research with machine learning insights. If we consider research data within the framework provided by this mindset, and focus our attention on the three essential components identified here, we believe we can develop theories of the brain that will reap the full benefits of the current technological revolution in neuroscience.
Acknowledgements
This article emerged from a workshop on optimization in the brain that happened February 24-28, 2019 at the Bellairs Research Institute of McGill University. We would like to thank Element AI and Bellairs Research Institute for their critical support in organizing this workshop.
References
1. Mathis A, et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat Neurosci. 2018;21:1281–1289. doi:10.1038/s41593-018-0209-y
2. Steinmetz NA, Koch C, Harris KD, Carandini M. Challenges and opportunities for large-scale electrophysiology with Neuropixels probes. Curr Opin Neurobiol. 2018;50:92–100. doi:10.1016/j.conb.2018.01.009
3. Marder E, Bucher D. Central pattern generators and the control of rhythmic movements. Curr Biol. 2001;11:R986–R996. doi:10.1016/s0960-9822(01)00581-4
4. Cullen KE. The vestibular system: multimodal integration and encoding of self-motion for motor control. Trends Neurosci. 2012;35:185–196. doi:10.1016/j.tins.2011.12.001
5. Kim JS, et al. Space–time wiring specificity supports direction selectivity in the retina. Nature. 2014;509:331. doi:10.1038/nature13240
6. Olshausen BA, Field DJ. What is the other 85 percent of V1 doing? In: van Hemmen JL, Sejnowski TJ, eds. 23 Problems in Systems Neuroscience. 2006:182–211.
7. Thompson L, Best P. Place cells and silent cells in the hippocampus of freely-behaving rats. J Neurosci. 1989;9:2382–2390. doi:10.1523/JNEUROSCI.09-07-02382.1989
8. Yamins DLK, DiCarlo JJ. Using goal-driven deep learning models to understand sensory cortex. Nat Neurosci. 2016;19:356–365. doi:10.1038/nn.4244
9. Botvinick M, et al. Reinforcement learning, fast and slow. Trends Cogn Sci. 2019. doi:10.1016/j.tics.2019.02.006
10. Kriegeskorte N, Douglas PK. Cognitive computational neuroscience. Nat Neurosci. 2018. doi:10.1038/s41593-018-0210-5
11. Rumelhart DE, McClelland JL, PDP Research Group. Parallel Distributed Processing. Vol 1. MIT Press; 1988.
12. Sacramento J, Costa RP, Bengio Y, Senn W. Dendritic cortical microcircuits approximate the backpropagation algorithm. Advances in Neural Information Processing Systems. 2018:8735–8746.
13. Poirazi P, Brannon T, Mel BW. Pyramidal neuron as two-layer neural network. Neuron. 2003;37:989–999. doi:10.1016/s0896-6273(03)00149-1
14. Guerguiev J, Lillicrap TP, Richards BA. Towards deep learning with segregated dendrites. eLife. 2017;6:e22901. doi:10.7554/eLife.22901
15. Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016.
16. Cichy RM, Khosla A, Pantazis D, Torralba A, Oliva A. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci Rep. 2016;6:27755. doi:10.1038/srep27755
17. Kell AJE, Yamins DLK, Shook EN, Norman-Haignere SV, McDermott JH. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron. 2018;98:630–644.e16. doi:10.1016/j.neuron.2018.03.044
18. Richards BA, Lillicrap TP. Dendritic solutions to the credit assignment problem. Curr Opin Neurobiol. 2019;54:28–36. doi:10.1016/j.conb.2018.08.003
19. Roelfsema PR, Holtmaat A. Control of synaptic plasticity in deep cortical networks. Nat Rev Neurosci. 2018;19:166. doi:10.1038/nrn.2018.6
20. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems. 2012:1097–1105.
21. Hannun A, et al. Deep Speech: scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567. 2014.
22. Radford A, et al. Language models are unsupervised multitask learners. OpenAI Blog. 2019;1:8.
23. Gao Y, Hendricks LA, Kuchenbecker KJ, Darrell T. Deep learning for tactile understanding from visual and haptic data. IEEE; 2016:536–543.
24. Banino A, et al. Vector-based navigation using grid-like representations in artificial agents. Nature. 2018;557:429–433. doi:10.1038/s41586-018-0102-6
25. Finn C, Goodfellow I, Levine S. Unsupervised learning for physical interaction through video prediction. Advances in Neural Information Processing Systems. 2016:64–72.
26. Silver D, et al. Mastering the game of Go without human knowledge. Nature. 2017;550:354. doi:10.1038/nature24270
27. Santoro A, et al. A simple neural network module for relational reasoning. Advances in Neural Information Processing Systems. 2017:4967–4976.
28. Khaligh-Razavi S-M, Kriegeskorte N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput Biol. 2014;10:e1003915. doi:10.1371/journal.pcbi.1003915
29. Bashivan P, Kar K, DiCarlo JJ. Neural population control via deep image synthesis. Science. 2019;364:eaav9436. doi:10.1126/science.aav9436
30. Pospisil DA, Pasupathy A, Bair W. 'Artiphysiology' reveals V4-like shape tuning in a deep network trained for image classification. eLife. 2018;7:e38242. doi:10.7554/eLife.38242
31. Singer Y, et al. Sensory cortex is optimized for prediction of future input. eLife. 2018;7:e31557. doi:10.7554/eLife.31557
32. Watanabe E, Kitaoka A, Sakamoto K, Yasugi M, Tanaka K. Illusory motion reproduced by deep neural networks trained for prediction. Front Psychol. 2018;9:345. doi:10.3389/fpsyg.2018.00345
33. Wang JX, et al. Prefrontal cortex as a meta-reinforcement learning system. Nat Neurosci. 2018;21:860–868. doi:10.1038/s41593-018-0147-8
34. Scellier B, Bengio Y. Equilibrium propagation: bridging the gap between energy-based models and backpropagation. Front Comput Neurosci. 2017;11:24. doi:10.3389/fncom.2017.00024
35. Whittington JC, Bogacz R. An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Comput. 2017. doi:10.1162/NECO_a_00949
36. Lillicrap TP, Cownden D, Tweed DB, Akerman CJ. Random synaptic feedback weights support error backpropagation for deep learning. Nat Commun. 2016;7:13276. doi:10.1038/ncomms13276
37. Roelfsema PR, van Ooyen A. Attention-gated reinforcement learning of internal representations for classification. Neural Comput. 2005;17:2176–2214. doi:10.1162/0899766054615699
38. Pozzi I, Bohté S, Roelfsema P. A biologically plausible learning rule for deep learning in the brain. arXiv preprint arXiv:1811.01768. 2018.
39. Körding KP, König P. Supervised and unsupervised learning with two sites of synaptic integration. J Comput Neurosci. 2001;11:207–215. doi:10.1023/a:1013776130161
40. Marblestone AH, Wayne G, Kording KP. Toward an integration of deep learning and neuroscience. Front Comput Neurosci. 2016;10:94. doi:10.3389/fncom.2016.00094
41. Raman DV, Rotondo AP, O'Leary T. Fundamental bounds on learning performance in neural circuits. Proc Natl Acad Sci. 2019. doi:10.1073/pnas.1813416116
42. Neyshabur B, Li Z, Bhojanapalli S, LeCun Y, Srebro N. The role of over-parametrization in generalization of neural networks. 2018.
43. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Comput. 1997;1:67–82.
44. Bengio Y, LeCun Y. Scaling learning algorithms towards AI. Large-Scale Kernel Machines. 2007;34:1–41.
45. Neyshabur B, Tomioka R, Srebro N. In search of the real inductive bias: on the role of implicit regularization in deep learning. arXiv preprint arXiv:1412.6614. 2014.
46. Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems. 2017:4077–4087.
47. Ravi S, Larochelle H. Optimization as a model for few-shot learning. 2016.
48. Zador AM. A critique of pure learning: what artificial neural networks can learn from animal brains. Nat Commun. 2019;10:1–7. doi:10.1038/s41467-019-11786-6
49. Bellec G, Salaj D, Subramoney A, Legenstein R, Maass W. Long short-term memory and learning-to-learn in networks of spiking neurons. Advances in Neural Information Processing Systems. 2018:787–797.
50. Huang Y, Rao RPN. Predictive coding. Wiley Interdiscip Rev Cogn Sci. 2011;2:580–593. doi:10.1002/wcs.142
51. Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn. 1992;8:229–256.
52. Klyubin AS, Polani D, Nehaniv CL. Empowerment: a universal agent-centric measure of control. Vol 1. IEEE; 2005:128–135.
53. Salge C, Glackin C, Polani D. Empowerment – an introduction. In: Guided Self-Organization: Inception. Springer; 2014:67–114.
54. Newell A, Simon HA. GPS, a program that simulates human thought. RAND Corporation, Santa Monica, CA; 1961.
55. Nguyen A, Yosinski J, Clune J. Understanding neural networks via feature visualization: a survey. arXiv preprint arXiv:1904.08939. 2019.
56. Kebschull JM, et al. High-throughput mapping of single-neuron projections by sequencing of barcoded RNA. Neuron. 2016;91:975–987. doi:10.1016/j.neuron.2016.07.036
57. Kornfeld J, Denk W. Progress and remaining challenges in high-throughput volume electron microscopy. Curr Opin Neurobiol. 2018;50:261–267. doi:10.1016/j.conb.2018.04.030
58. Lillicrap TP, Kording KP. What does it mean to understand a neural network? arXiv preprint arXiv:1907.06374. 2019.
59. Olshausen BA, Field DJ. Natural image statistics and efficient coding. Netw Comput Neural Syst. 1996;7:333–339. doi:10.1088/0954-898X/7/2/014
60. Hyvärinen A, Oja E. Simple neuron models for independent component analysis. Int J Neural Syst. 1996;7:671–687. doi:10.1142/s0129065796000646
61. Oja E. Simplified neuron model as a principal component analyzer. J Math Biol. 1982;15:267–273. doi:10.1007/BF00275687
62. Intrator N, Cooper LN. Objective function formulation of the BCM theory of visual cortical plasticity: statistical connections, stability conditions. Neural Netw. 1992;5:3–17.
63. Fiser A, et al. Experience-dependent spatial expectations in mouse visual cortex. Nat Neurosci. 2016. doi:10.1038/nn.4385
64. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi:10.1126/science.275.5306.1593
65. Momennejad I, et al. The successor representation in human reinforcement learning. bioRxiv 083824. 2016. doi:10.1101/083824
66. Nayebi A, et al. Task-driven convolutional recurrent models of the visual system. Advances in Neural Information Processing Systems. 2018:5290–5301.
67. Schrimpf M, et al. Brain-Score: which artificial neural network for object recognition is most brain-like? bioRxiv 407007. 2018.
68. Kepecs A, Fishell G. Interneuron cell types are fit to function. Nature. 2014;505:318–326. doi:10.1038/nature12983
69. Van Essen DC, Anderson CH. Information processing strategies and pathways in the primate visual system. Introd Neural Electron Netw. 1995;2:45–76.
70. Lindsey J, Ocko SA, Ganguli S, Deny S. A unified theory of early visual representations from retina to cortex through anatomically constrained deep CNNs. arXiv preprint arXiv:1901.00945. 2019.
71. Güçlü U, van Gerven MA. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. J Neurosci. 2015;35:10005–10014. doi:10.1523/JNEUROSCI.5023-14.2015
72. Kwag J, Paulsen O. The timing of external input controls the sign of plasticity at local synapses. Nat Neurosci. 2009;12:1219. doi:10.1038/nn.2388
73. Bittner KC, Milstein AD, Grienberger C, Romani S, Magee JC. Behavioral time scale synaptic plasticity underlies CA1 place fields. Science. 2017;357:1033. doi:10.1126/science.aan3846
74. Lacefield CO, Pnevmatikakis EA, Paninski L, Bruno RM. Reinforcement learning recruits somata and apical dendrites across layers of primary sensory cortex. Cell Rep. 2019;26:2000–2008. doi:10.1016/j.celrep.2019.01.093
75. Williams LE, Holtmaat A. Higher-order thalamocortical inputs gate synaptic long-term potentiation via disinhibition. Neuron. 2019. doi:10.1016/j.neuron.2018.10.049
76. Yagishita S, et al. A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science. 2014;345:1616. doi:10.1126/science.1255514
77. Lim S, et al. Inferring learning rules from distributions of firing rates in cortical neurons. Nat Neurosci. 2015;18:1804–1810. doi:10.1038/nn.4158
78. Costa RP, et al. Synaptic transmission optimization predicts expression loci of long-term plasticity. Neuron. 2017;96:177–189. doi:10.1016/j.neuron.2017.09.021
79. Zolnik TA, et al. All-optical functional synaptic connectivity mapping in acute brain slices using the calcium integrator CaMPARI. J Physiol. 2017;595:1465–1477. doi:10.1113/JP273116
80. Scott SH. Optimal feedback control and the neural basis of volitional motor control. Nat Rev Neurosci. 2004;5:532. doi:10.1038/nrn1427
81. Krakauer JW, Ghazanfar AA, Gomez-Marin A, MacIver MA, Poeppel D. Neuroscience needs behavior: correcting a reductionist bias. Neuron. 2017;93:480–490. doi:10.1016/j.neuron.2016.12.041
82. Zylberberg J, Murphy JT, DeWeese MR. A sparse coding model with synaptically local plasticity and spiking neurons can account for the diverse shapes of V1 simple cell receptive fields. PLoS Comput Biol. 2011;7:e1002250. doi:10.1371/journal.pcbi.1002250
83. Chalk M, Tkačik G, Marre O. Inferring the function performed by a recurrent neural network. bioRxiv 598086. 2019. doi:10.1101/598086
84. Cadieu CF, et al. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comput Biol. 2014;10:e1003963. doi:10.1371/journal.pcbi.1003963
85. Golub MD, et al. Learning by neural reassociation. Nat Neurosci. 2018;21:607–616. doi:10.1038/s41593-018-0095-3
86. Fukushima K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern. 1980;36:193–202. doi:10.1007/BF00344251
87. Vogels TP, Rajan K, Abbott LF. Neural network dynamics. Annu Rev Neurosci. 2005;28:357–376. doi:10.1146/annurev.neuro.28.061604.135637
88. Koren V, Denève S. Computational account of spontaneous activity as a signature of predictive coding. PLoS Comput Biol. 2017;13:e1005355. doi:10.1371/journal.pcbi.1005355
89. Advani MS, Saxe AM. High-dimensional dynamics of generalization error in neural networks. arXiv preprint arXiv:1710.03667. 2017. doi:10.1016/j.neunet.2020.08.022
90. Amit Y. Deep learning with asymmetric connections and Hebbian updates. Front Comput Neurosci. 2019;13:18. doi:10.3389/fncom.2019.00018
91. Samadi A, Lillicrap TP, Tweed DB. Deep learning with dynamic spiking neurons and fixed feedback weights. Neural Comput. 2017;29:578–602. doi:10.1162/NECO_a_00929
92. Akrout M, Wilson C, Humphreys PC, Lillicrap T, Tweed D. Using weight mirrors to improve feedback alignment. arXiv preprint arXiv:1904.05391. 2019.
93. Lansdell B, Kording K. Spiking allows neurons to estimate their causal effect. bioRxiv 253351. 2018.
94. Werfel J, Xie X, Seung HS. Learning curves for stochastic gradient descent in linear feedforward networks. 2004:1197–1204. doi:10.1162/089976605774320539
95. Bartunov S, et al. Assessing the scalability of biologically-motivated deep learning algorithms and architectures. Advances in Neural Information Processing Systems. 2018:9368–9378.
96. MacKay DJ. Information Theory, Inference and Learning Algorithms. Cambridge University Press; 2003.
97. Goel V, Weng J, Poupart P. Unsupervised video object segmentation for deep reinforcement learning. Advances in Neural Information Processing Systems. 2018:5683–5694.
98. LeCun Y, Bengio Y. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks. 1995;3361:1995.
99. Chorowski JK, Bahdanau D, Serdyuk D, Cho K, Bengio Y. Attention-based models for speech recognition. NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems. 2015:577–585.
100. Houthooft R, et al. VIME: variational information maximizing exploration. Advances in Neural Information Processing Systems. 2016:1109–1117.