. 2023 Dec 19;25:e44610. doi: 10.2196/44610

Table 4.

Data exclusion criteria.

| Elimination rule | Elimination definition | Elimination method |
| --- | --- | --- |
| Nontopic-related content | Triglycerides were low, relatively low, or not high; the detected triglyceride value was lower than 1.7 mmol/L | BERT^a text classification model: manually prepare a training set, train the model, verify it against a manually prepared verification set, then use the model to eliminate the remaining data |
| Duplicate ID | Duplicate page ID content | Compare ID characters and reject duplicate IDs |
| Advertising content | No description of the patient's personal illness; introduction of medical institutions or products; advertising links; invitations to join a group or a consultation; the questioner is an organization | Text recognition of articles with jump links or advertising link addresses, which are deleted; alternatively, manually prepare a training set and use event sequence template mining to build a recognition model |
| Popular science articles | Popular medical science; no description of the patient's condition | Event sequence template mining using the BERT text classification model: manually prepare a training set, train the model, verify it against a manually prepared verification set, then use the model to eliminate the remaining data |

^a BERT: bidirectional encoder representations from transformers.
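
The two purely rule-based steps in Table 4 (rejecting duplicate page IDs and spotting articles that contain jump links) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Record` layout, the function names, and the URL pattern are all assumptions made for illustration, and the BERT classification and event sequence template mining steps are not reproduced here.

```python
import re
from dataclasses import dataclass


# Hypothetical record layout; the source does not specify field names.
@dataclass
class Record:
    page_id: str
    text: str


# Assumed definition of a "jump link": any http(s) or www-style URL in the text.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def drop_duplicate_ids(records):
    """Keep the first record for each page ID and reject later duplicates."""
    seen = set()
    kept = []
    for rec in records:
        key = rec.page_id.strip()
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def has_jump_link(rec):
    """Flag a record whose text contains a URL as a candidate advertisement."""
    return URL_PATTERN.search(rec.text) is not None


def apply_rule_based_exclusions(records):
    """Return (kept, flagged): deduplicated records split by the link check."""
    unique = drop_duplicate_ids(records)
    kept = [r for r in unique if not has_jump_link(r)]
    flagged = [r for r in unique if has_jump_link(r)]
    return kept, flagged
```

In this sketch, flagged records are returned rather than deleted outright, so they could be reviewed manually or used as training examples, in line with the "delete or manually output the training set" option in the table.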