Figure 3.
The computation procedures of the traditional [CLS] strategy and the mean-pooling strategy. The squares in both panels are hidden-state values; dark, medium, and light brown represent the hidden values of the [CLS] tag, the triple, and the [SEP] tag, respectively. The ovals labeled “Mean” denote the weighted calculation of hidden values (as shown in Equation (9)). (a) The traditional [CLS] strategy simply takes the first row of the primary feature matrix, i.e., the hidden value of the [CLS] tag, as the output feature matrix. (b) The mean-pooling strategy first calculates the weighted value of each hidden dimension and then constructs the output feature matrix by combining these weighted values.
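
The two strategies in the figure can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `cls_pooling` and `mean_pooling` are hypothetical, and because Equation (9) is not reproduced in this caption, the per-token weights are an assumption (uniform by default, so the result is a plain mean over tokens). The primary feature matrix is assumed to have shape (sequence length, hidden dimension), with the [CLS] tag in row 0.

```python
import numpy as np


def cls_pooling(hidden):
    """[CLS] strategy: keep only the first row (the [CLS] hidden state)."""
    hidden = np.asarray(hidden, dtype=float)
    return hidden[0]


def mean_pooling(hidden, weights=None):
    """Mean-pooling strategy: weighted average over tokens, per hidden dimension.

    `weights` holds one value per token. It is an assumption standing in for
    the weighting of Equation (9); uniform weights give a plain mean.
    """
    hidden = np.asarray(hidden, dtype=float)
    if weights is None:
        weights = np.ones(hidden.shape[0])
    weights = np.asarray(weights, dtype=float)
    # Scale each token's row by its weight, sum over tokens, then normalize.
    return (weights[:, None] * hidden).sum(axis=0) / weights.sum()


# Toy matrix: 3 tokens ([CLS], one triple token, [SEP]), hidden size 2.
hidden = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])

cls_vec = cls_pooling(hidden)                         # first row only
mean_vec = mean_pooling(hidden)                       # average of all rows
masked_vec = mean_pooling(hidden, weights=[1, 1, 0])  # ignore the last token
```

With uniform weights every token contributes equally to each hidden dimension, whereas the [CLS] strategy discards all rows except the first. A zero weight, as in the last call, drops a token from the average.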