2023 Mar 3;5(1):lqad021. doi: 10.1093/nargab/lqad021

Figure 1.

Schematic of the data and model set-up. (Left) The Ensembl annotation (version 107) is used to determine transcript sequences and translation initiation sites (TISs). Transcripts are grouped by chromosome to create a training, validation and test set. (Right) The performer model allows processing of full transcript sequences, evaluating data through the layers in parallel to obtain model outputs at each position. The model architecture can handle varying input lengths, as identical model weights are applied to transform the data. Through self-attention, sequential information from any site on the transcript can be queried by the model to determine the presence of TISs at any position.
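The two ideas in this caption can be illustrated with a minimal sketch, under stated assumptions. The function names (`split_by_chromosome`, `one_hot`, `self_attention`) and the chromosome assignments are hypothetical and not taken from the paper. The attention shown is plain exact softmax self-attention with identity projections, not the Performer's approximate attention, and it has no learned weights. It only shows that the same operation yields one output per position for transcripts of any length.

```python
import math

# Hypothetical assignment of chromosomes to held-out sets (placeholder values).
VALIDATION_CHROMS = {"1"}
TEST_CHROMS = {"2"}

def split_by_chromosome(transcripts, val_chroms=VALIDATION_CHROMS, test_chroms=TEST_CHROMS):
    """Assign (transcript_id, chromosome) pairs to train/validation/test by chromosome."""
    sets = {"train": [], "validation": [], "test": []}
    for transcript_id, chrom in transcripts:
        if chrom in test_chroms:
            sets["test"].append(transcript_id)
        elif chrom in val_chroms:
            sets["validation"].append(transcript_id)
        else:
            sets["train"].append(transcript_id)
    return sets

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a nucleotide string as one 4-dimensional vector per position."""
    return [[1.0 if n == base else 0.0 for base in NUCLEOTIDES] for n in seq]

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Exact softmax self-attention; queries, keys and values are the inputs themselves."""
    d = len(x[0])
    out = []
    for q in x:
        # Every position scores every other position, whatever the sequence length.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

# Usage: transcripts of different lengths each yield one output vector per position.
short_out = self_attention(one_hot("ATGGC"))
long_out = self_attention(one_hot("CCATGGCTA"))
```

Splitting by chromosome rather than by individual transcript keeps all transcripts from one chromosome in the same set. A real model would add learned projections and a per-position classification head on top of the attention output.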