Algorithm 1.
1: Input: a dataset of pairs (xi, yi), where xi is the input sentence and yi is the corresponding label.
2: Load the pre-trained tokenizer, split each sentence into words or subwords, and pad all token lists to the same length.
3: Run the dataset through the DistilBERT model to obtain an embedding vector for each sentence.
4: Feed the embedding vectors into a logistic regression model to classify the sentences.
5: Evaluate the model.
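The steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions that the algorithm does not state: the checkpoint name `distilbert-base-uncased`, the use of the Hugging Face `transformers` and scikit-learn libraries, DistilBERT used as a frozen feature extractor, the first-token hidden state taken as the sentence embedding, and a 75/25 train/test split with accuracy as the evaluation metric. The helper names `pad_token_ids`, `embed_sentences`, and `classify_and_evaluate` are hypothetical.

```python
def pad_token_ids(token_id_lists, pad_id=0):
    """Step 2 (padding): right-pad every token-id list to the longest length.

    Returns the padded lists and a 0/1 attention mask marking real tokens.
    """
    max_len = max(len(ids) for ids in token_id_lists)
    padded = [ids + [pad_id] * (max_len - len(ids)) for ids in token_id_lists]
    mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in token_id_lists]
    return padded, mask


def embed_sentences(sentences, model_name="distilbert-base-uncased"):
    """Steps 2-3: tokenize, pad, and run DistilBERT to get one vector per sentence.

    Assumption: the model is not fine-tuned and the first-token hidden state
    serves as the sentence embedding.
    """
    # Heavy dependencies are imported here so the padding helper stays usable alone.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # Tokenize into subword ids (special tokens are added by the tokenizer).
    token_ids = tokenizer(sentences, truncation=True)["input_ids"]
    padded, mask = pad_token_ids(token_ids, pad_id=tokenizer.pad_token_id)

    with torch.no_grad():
        output = model(
            input_ids=torch.tensor(padded),
            attention_mask=torch.tensor(mask),
        )
    # Shape: (number of sentences, hidden size).
    return output.last_hidden_state[:, 0, :].numpy()


def classify_and_evaluate(embeddings, labels, seed=0):
    """Steps 4-5: fit logistic regression on the embeddings and report test accuracy."""
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    x_train, x_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=0.25, random_state=seed
    )
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(x_train, y_train)
    return classifier, classifier.score(x_test, y_test)
```

With a list of sentences `xs` and labels `ys`, the pipeline would be run as `classify_and_evaluate(embed_sentences(xs), ys)`. For a large dataset the sentences would need to be embedded in batches rather than in a single forward pass.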