Baseline experiment results. The results for phrase-based and section
header-based methods were obtained from all 50 articles, and the results for
linear SVM and BioBERT-based models were obtained using 5-fold cross-validation.
Best results for each CONSORT item are in bold. 3a: Trial Design; 3b: Changes to
Trial Design; 4a: Eligibility Criteria; 4b: Data Collection Setting; 5:
Interventions; 6a: Outcomes; 6b: Changes to Outcomes; 7a: Sample Size
Determination; 7b: Interim Analyses/Stopping Guidelines; 8a: Sequence
Generation; 8b: Randomization Type; 9: Allocation Concealment; 10: Randomization
Implementation; 11a: Blinding Procedure; 11b: Similarity of Interventions; 12a:
Statistical Methods for Outcomes; 12b: Statistical Methods for Other Analyses;
Micro: Micro-averaging; Macro: Macro-averaging. AUC: Area Under the Receiver
Operating Characteristic (ROC) Curve. In the last column, each letter indicates
that the results of one method are statistically significantly different from
those of another method at the 95% confidence level, as measured by McNemar's
test (a: phrase-based vs. section header-based; b: phrase-based vs. linear SVM;
c: phrase-based vs. BioBERT; d: section header-based vs. linear SVM; e:
section header-based vs. BioBERT; f: linear SVM vs. BioBERT).
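
For readers unfamiliar with the pairwise comparison behind the letters in the last column, the following is a minimal sketch of an exact (binomial) McNemar's test on paired predictions. It is illustrative only: the caption does not state which variant of the test (exact or chi-squared) was used or at what unit the predictions were paired, and the function name `mcnemar_exact` and its inputs are hypothetical.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value for two methods on the same items.

    correct_a / correct_b: equal-length sequences of booleans, True where
    the respective method labeled the item correctly (hypothetical inputs).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both methods must be evaluated on the same items")

    # Only discordant pairs carry information about a difference.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # the methods never disagree

    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Method A is right and method B wrong on 8 items; they agree on 12 others.
a = [True] * 8 + [True] * 12
b = [False] * 8 + [True] * 12
p_value = mcnemar_exact(a, b)
significant = p_value < 0.05  # corresponds to the 95% confidence level
```

Under this sketch, a letter would be reported for a pair of methods whenever `significant` is true for that pair on a given CONSORT item.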