2018 Mar 22;18(Suppl 1):14. doi: 10.1186/s12911-018-0594-x

Table 1.

The statistical characteristics of the dataset

| Characteristic | Statistic |
|---|---|
| Total number of publications | 1405 |
| Publications with author address information | 1386 |
| Publications with an abstract | 1382 |
| Publications with author keywords or PubMed MeSH terms | 1277 |
| Unique publication sources | 324 |
| Unique countries / unique first-author countries | 56 / 45 |
| Unique authors / unique first authors | 4391 / 1053 |
| Unique affiliations / unique first-author affiliations | 961 / 514 |
| Title: average number of words; average characters per word | 12.53; 6.50 |
| Title: average number of characters; standard deviation | 95.43; 29.72 |
| Abstract: average number of words; average characters per word | 215.24; 5.62 |
| Abstract: average number of characters; standard deviation | 1456.95; 536.2 |
| Top 10 most frequent words/phrases in author keywords or PubMed MeSH terms | Electronic health record (363; 25.84%); Data mining (278; 19.79%); Information storage and retrieval (239; 17.01%); Artificial intelligence (179; 12.74%); Female (163; 11.60%); Semantics (156; 11.10%); Male (153; 10.89%); Controlled vocabulary (140; 9.96%); Automatic pattern recognition (127; 9.04%); Medical record system (112; 7.97%) |
| Top 10 most frequent words/phrases extracted from titles | Electronic health record (69; 4.91%); Medical record (55; 3.91%); Clinical text (45; 3.20%); Clinical note (41; 2.92%); Patient (37; 2.63%); Text mining (23; 1.64%); Classification (22; 1.57%); Clinical narrative (21; 1.49%); Radiology report (21; 1.49%); Natural language processing method (20; 1.42%) |
| Top 10 most frequent words/phrases extracted from abstracts | Patient (322; 22.92%); Precision (217; 15.44%); F-measure (205; 14.59%); Recall (178; 12.67%); Accuracy (164; 11.67%); Electronic health record (161; 11.46%); Natural language processing method (155; 11.03%); Medical record (143; 10.18%); Disease (141; 10.04%); Concept (128; 9.11%) |

Values in parentheses give the frequency and that frequency as a percentage of the 1405 publications.
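Every percentage in the Top 10 rows of Table 1 matches the frequency divided by the 1405 total publications, rounded to two decimals. For example, 363 / 1405 ≈ 25.84%. The sketch below reproduces the keyword-row percentages from the counts. It also includes a hypothetical helper, `text_length_stats`, showing one way the title/abstract length statistics could be computed. The helper is an assumption, not the authors' procedure: it uses whitespace tokenization and the population standard deviation, and the table does not state which tokenizer or SD estimator was actually used.

```python
from statistics import mean, pstdev

TOTAL_PUBLICATIONS = 1405  # "Total number of publications" row of Table 1

# Counts copied from the "author keywords or PubMed MeSH terms" row of Table 1.
KEYWORD_COUNTS = {
    "Electronic health record": 363,
    "Data mining": 278,
    "Information storage and retrieval": 239,
    "Artificial intelligence": 179,
    "Female": 163,
    "Semantics": 156,
    "Male": 153,
    "Controlled vocabulary": 140,
    "Automatic pattern recognition": 127,
    "Medical record system": 112,
}


def share_of_publications(count: int, total: int = TOTAL_PUBLICATIONS) -> float:
    """Return count / total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


def text_length_stats(texts: list[str]) -> dict[str, float]:
    """Hypothetical helper: length statistics for a list of titles or abstracts.

    Assumes whitespace tokenization and the population standard deviation;
    the source table does not say which tokenizer or SD estimator was used.
    """
    words_per_text = [len(t.split()) for t in texts]
    chars_per_text = [len(t) for t in texts]
    all_words = [w for t in texts for w in t.split()]
    return {
        "avg_words": mean(words_per_text),                # average words per text
        "avg_word_chars": mean(len(w) for w in all_words),  # average characters per word
        "avg_chars": mean(chars_per_text),                # average characters per text
        "sd_chars": pstdev(chars_per_text),               # SD of characters per text
    }


if __name__ == "__main__":
    # Print each keyword with its count and share of all publications.
    for term, count in KEYWORD_COUNTS.items():
        print(f"{term}: {count} ({share_of_publications(count):.2f}%)")
```

Running the script prints one line per keyword, starting with `Electronic health record: 363 (25.84%)`. The printed percentages agree with the keyword row of Table 1.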