Credit Assignment During Learning


Overview

The human brain contains trillions of synapses. Through their patterns and strengths, these synapses are thought to encode our memories and acquired skills. How does the brain update its synapses to support behavioral learning without interfering with existing skills or memories? To study this question, we have developed i) optical connection-mapping techniques that combine cellular-resolution two-photon (2P) optogenetics with calcium imaging in mouse motor cortex (MC) to track changes in the causal influence of each neuron in a recorded population; and ii) optical brain-computer interface (BCI) learning tasks that explicitly define the causal relationship between imaged MC activity and behavior.

Optical Brain-Computer Interface

To simplify the study of learning, we developed an optical BCI task in which mice control the position of a motorized reward port with the activity of a single neuron in layer 2/3 of primary motor cortex (Figure 1a). Mice quickly learn to increase the activity of this conditioned neuron (CN), and reward rates rise within approximately 30 trials (about 5 minutes; Figure 1c-d). The activity changes that follow learning are also remarkably sparse: only a small fraction of neurons change their activity as much as the CN (Figure 1e).

Image with five parts showing BCI learning. a) A diagram of a BCI task where a conditioned neuron (CN) controls a reward port. b) A BCI experiment mapping CN activity to reward port position. c) CN activity aligning to the trial start after trial 30. d) A chart of hit rate versus trial number. e) A map showing ΔTuning with a magenta circle indicating the conditioned neuron.
Figure 1: BCI learning is fast and sparse. a. Schematic of BCI task. The activity of a single conditioned neuron (CN) controls the position of a motorized reward port. b. Preliminary BCI experiment illustrating the mapping of CN activity (top) to reward port position (bottom). c-d. Activity of CN during BCI learning. c. Single-trial CN activity. CN activity becomes aligned to trial start at around trial 30. d. Hit rate vs. trial number. e. Map of ΔTuning. Magenta circle, conditioned neuron.
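
To make the idea of an explicitly defined activity-to-behavior mapping concrete, here is a minimal sketch of a closed-loop trial. The mapping used here (port speed proportional to CN activity above a threshold, with a hit when the port reaches a target distance) and all names and parameter values are illustrative assumptions, not the mapping used in the experiments.

```python
def run_trial(cn_activity, gain=1.0, threshold=0.0, target=10.0):
    """Simulate one hypothetical BCI trial.

    cn_activity: CN activity samples, one per imaging frame (arbitrary units).
    gain, threshold, target: assumed parameters of an illustrative mapping.
    Returns (hit, frames_elapsed).
    """
    position = 0.0
    for frame, activity in enumerate(cn_activity, start=1):
        # Assumed mapping: the port advances only when activity exceeds threshold.
        position += gain * max(activity - threshold, 0.0)
        if position >= target:
            return True, frame  # port reached the animal: reward delivered
    return False, len(cn_activity)  # trial timed out without reward


# Low CN activity never brings the port within reach; higher activity does.
low_result = run_trial([0.1] * 50)
high_result = run_trial([0.5] * 50)
print("low activity:", low_result)
print("high activity:", high_result)
```

Because the experimenter chooses the mapping, the contribution of the CN to behavior is known exactly, which is what makes a task of this kind useful for studying credit assignment.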

Learning rules

Activity changes in MC could result from either local circuit plasticity or plasticity in long-range inputs to MC. Our data demonstrate that BCI learning involves local circuit plasticity. What learning rules govern this plasticity? Two candidates are error backpropagation (backprop), which leverages knowledge of the circuit's connectivity to optimally link inputs and outputs, and reward-modulated Hebbian learning (3-factor learning), which finds input-output links through trial and error. To distinguish between these models, we use two-photon optogenetics to map causal connections between stimulated and non-stimulated neurons. Performing this all-optical “weight mapping” before and after BCI learning allows us to measure the weight changes that follow learning.

Image with six parts showing learning models and neural responses. a) Upstream learning showing how inputs to MC drive activity changes. b-c) Local learning rules with reward feedback in 3-factor learning (b) and backprop feedback based on synapse contribution (c). d-f) All-optical weight mapping and network responses following targeted photostimulation of ten neurons.
Figure 2: a-c. Learning model schematics. a. Upstream learning. Upstream inputs into MC are altered to drive learning-related activity changes within MC. b-c. Local learning rules. Learning-related activity changes in MC are driven by plasticity in the local recurrent network. These rules differ in the precision of the feedback they receive. Reward feedback is provided to all neurons uniformly in 3-factor learning (b), whereas backprop provides each synapse with feedback that depends on its contribution to performance (c). d-f. All-optical weight mapping. e-f. Network response following targeted photostimulation of ten neurons (black circles).
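
The difference between the two candidate rules can be illustrated with a toy model. The sketch below is not a model of motor cortex: it assumes a single linear output unit, uses the delta rule (the one-layer case of backprop) for the gradient-based update, and uses a node-perturbation rule as a stand-in for reward-modulated Hebbian learning. The baseline reward is computed from the unperturbed output, which is a simplification, and all names and parameter values are hypothetical.

```python
import random


def output(weights, inputs):
    """Linear unit: weighted sum of presynaptic inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def train_gradient(inputs, target, steps=200, lr=0.1):
    """Delta rule: each synapse receives its own signed error term."""
    weights = [0.0] * len(inputs)
    for _ in range(steps):
        error = output(weights, inputs) - target
        # Per-synapse feedback: dLoss/dw_i = error * x_i
        weights = [w - lr * error * x for w, x in zip(weights, inputs)]
    return weights


def train_three_factor(inputs, target, steps=500, lr=2.0, sigma=0.1, seed=0):
    """Node perturbation: pre-activity x post-fluctuation x scalar reward."""
    rng = random.Random(seed)
    weights = [0.0] * len(inputs)
    for _ in range(steps):
        y = output(weights, inputs)
        noise = rng.gauss(0.0, sigma)          # trial-and-error exploration
        reward = -(y + noise - target) ** 2    # scalar outcome of this trial
        baseline = -(y - target) ** 2          # assumed: reward without noise
        modulator = reward - baseline          # same scalar sent to every synapse
        weights = [w + lr * modulator * noise * x
                   for w, x in zip(weights, inputs)]
    return weights


inputs = [1.0, 0.5, -0.5]
target = 1.0
err_grad = abs(output(train_gradient(inputs, target), inputs) - target)
err_3f = abs(output(train_three_factor(inputs, target), inputs) - target)
print("gradient rule error:", err_grad)
print("three-factor rule error:", err_3f)
```

In this toy setting both rules reduce the error, but the three-factor rule does so more slowly and more noisily, because every synapse receives the same scalar signal and must rely on its own activity fluctuations to infer its contribution.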

Imaging neuromodulators during learning

Weight mapping reveals that synaptic plasticity in motor cortex is modulated by a signal resembling a reward prediction error (RPE). Recent work has suggested that norepinephrine (NE) may convey RPE information to cortical circuits. To determine whether NE provides an RPE signal to motor cortex, we perform functional imaging of NE axons that originate from the Locus Coeruleus (LC) and express the genetically encoded calcium indicator GCaMP8s while mice learn the BCI task. In this experiment, activity in motor cortex is recorded using a red calcium indicator (jRGECO1a), which allows us to separate local somatic activity from LC axon activity (Figure 3). LC axons show activity that is strongly correlated with movements of the reward port, consistent with predictions from reinforcement learning models.

Dual-color image showing jRGECO1a expression in purple within neurons of the motor cortex and GCaMP8s expression as neon green lines in axons from the Locus Coeruleus.
Figure 3: Imaging neuromodulators during learning. Dual-color image showing jRGECO1a expression in neurons in motor cortex and GCaMP8s expression in axons from Locus Coeruleus.
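
For readers unfamiliar with the term, a reward prediction error (RPE) is the difference between the reward received and the reward expected. The sketch below uses a simple Rescorla-Wagner update to show how such a signal would be expected to evolve over trials. It is a textbook illustration under assumed parameters (the learning rate `alpha` and the reward schedule are hypothetical), not an analysis of the imaging data.

```python
def run_rpe(rewards, alpha=0.2):
    """Return the RPE on each trial under a Rescorla-Wagner value update.

    rewards: reward delivered on each trial.
    alpha: assumed learning rate for the expected-reward estimate.
    """
    value = 0.0  # expected reward before any experience
    rpes = []
    for reward in rewards:
        rpe = reward - value   # prediction error on this trial
        rpes.append(rpe)
        value += alpha * rpe   # move the expectation toward the outcome
    return rpes


# Thirty rewarded trials followed by one unexpectedly omitted reward.
rewards = [1.0] * 30 + [0.0]
rpes = run_rpe(rewards)
print("first trial RPE:", rpes[0])
print("last rewarded trial RPE:", rpes[29])
print("omitted-reward RPE:", rpes[30])
```

The RPE is large when reward is unexpected, shrinks toward zero as reward becomes predictable, and turns negative when an expected reward is omitted. These are the signatures one would look for in a candidate RPE-carrying signal.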
