Published in final edited form as: Nature. 2021 May 12;593(7858):249–254. doi: 10.1038/s41586-021-03506-2

Extended Data Fig. 1: Diagram of the RNN architecture.

We used a two-layer gated recurrent unit (GRU) recurrent neural network to convert sequences of neural firing rate vectors x_t (temporally smoothed and binned at 20 ms) into sequences of character probability vectors y_t and 'new character' probability scalars z_t. The y_t vectors describe the probability of each character being written at that moment in time, and the z_t scalars go high whenever the RNN detects that T5 is beginning to write any new character. Note that the top RNN layer runs at a slower frequency than the bottom layer, which we found improved training speed by making it easier to hold information in memory over long time periods. As a result, the RNN outputs are updated only once every 100 ms.
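To make the two-rate structure concrete, the sketch below shows one way such a decoder could be written in PyTorch. It is an illustrative assumption, not the authors' implementation: the input dimensionality, hidden layer size, and number of output characters are placeholder values not given in this caption, and the downsampling-by-5 between layers simply follows from the 20 ms input bins and the 100 ms output update rate described above.

```python
import torch
import torch.nn as nn

class TwoSpeedGRUDecoder(nn.Module):
    """Minimal sketch of a two-layer GRU decoder with a slow top layer.

    The bottom GRU runs at the 20 ms bin rate; the top GRU runs once per
    5 bins (100 ms) and produces character probabilities y_t and a
    'new character' scalar z_t. All sizes here are illustrative.
    """

    def __init__(self, n_inputs=192, hidden=512, n_chars=31, stride=5):
        super().__init__()
        self.stride = stride
        self.bottom = nn.GRU(n_inputs, hidden, batch_first=True)
        self.top = nn.GRU(hidden, hidden, batch_first=True)
        self.char_head = nn.Linear(hidden, n_chars)   # -> y_t (softmax)
        self.new_char_head = nn.Linear(hidden, 1)     # -> z_t (sigmoid)

    def forward(self, x):
        # x: (batch, T, n_inputs) smoothed firing rates in 20 ms bins
        h_fast, _ = self.bottom(x)
        # Keep every 5th bottom-layer state so the top layer steps every 100 ms
        h_slow, _ = self.top(h_fast[:, self.stride - 1 :: self.stride, :])
        y = torch.softmax(self.char_head(h_slow), dim=-1)       # character probabilities
        z = torch.sigmoid(self.new_char_head(h_slow)[..., 0])   # new-character signal
        return y, z

# Example: 10 s of data = 500 bins of 20 ms -> 100 output steps at 100 ms each
rates = torch.randn(1, 500, 192)
y, z = TwoSpeedGRUDecoder()(rates)
print(y.shape, z.shape)  # (1, 100, 31) and (1, 100)
```

In this sketch the slower output rate is obtained simply by striding the bottom layer's hidden states before the top GRU; other mechanisms (for example, pooling over each 100 ms window) would serve the same purpose of letting the top layer integrate over longer time scales.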