Architecture of our neural network for multi-label electrocardiogram (ECG) classification. The input to the network is an ECG signal of shape (b × n × l), where b is the batch size, n is the number of sampling points, and l is the number of ECG leads. A residual neural network first extracts a feature map from the input ECG. An attention layer, whose inputs are keys (K), values (V), and queries (Q), then extracts from the feature map one feature vector (denoted by FVi) for each queried category (indexed by i); m is the number of queried categories. Finally, each FVi is passed through its own dedicated fully connected (FC) layer with sigmoid activation to produce the prediction for category i (denoted by pi).
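The data flow after the residual network can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions that the caption does not state: the attention is taken to be single-head scaled dot-product attention, K and V are assumed to be linear projections of the feature map, Q is assumed to be one query vector per category, and the feature map is assumed to have shape (b × t × d). The names `attention_head`, `w_k`, `w_v`, `fc_w`, `fc_b`, `t`, and `d` are hypothetical, and a random array stands in for the residual network output.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_head(feature_map, queries, w_k, w_v, fc_w, fc_b):
    """Category-wise attention followed by one FC + sigmoid per category.

    feature_map: (b, t, d) stand-in for the residual network output (assumed shape)
    queries:     (m, d)    one query vector per category (assumption)
    w_k, w_v:    (d, d)    projections producing K and V (assumption)
    fc_w, fc_b:  (m, d), (m,)  row i holds the dedicated FC layer for category i
    """
    keys = feature_map @ w_k                                   # (b, t, d)
    values = feature_map @ w_v                                 # (b, t, d)
    d = queries.shape[-1]
    # Scaled dot-product scores of every category query against every position.
    scores = np.einsum("md,btd->bmt", queries, keys) / np.sqrt(d)
    weights = softmax(scores, axis=-1)                         # (b, m, t)
    fv = weights @ values                                      # (b, m, d): FV_i per category
    # Dedicated FC layer per category: dot FV_i with its own weight row.
    logits = np.einsum("bmd,md->bm", fv, fc_w) + fc_b          # (b, m)
    return fv, sigmoid(logits)                                 # p_i in (0, 1)


rng = np.random.default_rng(0)
b, t, d, m = 2, 16, 8, 5  # illustrative sizes only
feature_map = rng.standard_normal((b, t, d))
queries = rng.standard_normal((m, d))
w_k = rng.standard_normal((d, d))
w_v = rng.standard_normal((d, d))
fc_w = rng.standard_normal((m, d))
fc_b = np.zeros(m)

fv, p = attention_head(feature_map, queries, w_k, w_v, fc_w, fc_b)
print(fv.shape, p.shape)  # (2, 5, 8) (2, 5)
```

Because each category has its own sigmoid output rather than sharing a softmax, the m predictions are independent of one another, which is what allows several categories to be predicted for the same ECG.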