2020 Sep 21;11:11. doi: 10.1186/s13326-020-00227-9

Table 1.

Dataset characteristics

| Dataset name | MedNLP | Dummy-EHRs | Pathology Reports |
|---|---|---|---|
| # of documents | 50 reports | 32 pairs of records and summaries | 1000 reports |
| # of sentences | 2244 | 8183 | 3012 |
| # of tokens | 42,621 | 154,132 | 194,449 |
| # of all tags | 490 | 3017 | 295 |
| # of age tags | 56 | 39 | 0 |
| # of hospital tags | 75 | 170 | 31 |
| # of person tags | 0 | 135 | 224 |
| # of sex tags | 4 | 16 | 0 |
| # of time tags | 355 | 2657 | 40 |
| Example in original Japanese text | 工場に勤めている<a>64歳</a>の<x>男性</x>。 | 施設入所中で寝たきりの<a>86歳</a><x>女性</x>。全介助 | <<院外標本<h>静大皮フ科クリニック</h>、<p>桑田 智</p> |
| Example translated into English | A <a>64-year-old</a> <x>man</x> works in a factory | An <a>86-year-old</a> <x>woman</x> bedridden in a nursing home. Total assistance required | <<Ex-hospital sample <h>Shizudai Dermatology Clinic</h>, <p>Satoshi Kuwata</p> |
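The inline annotations in the example rows can be counted per category with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the mapping of tag letters to categories (`a` = age, `x` = sex, `h` = hospital, `p` = person) is an assumption inferred from the examples in Table 1, and the helper names `extract_tags` and `count_tags` are hypothetical. No tag letter for time expressions appears in the examples, so none is assumed here.

```python
import re
from collections import Counter

# Assumed mapping from tag letters to Table 1 categories,
# inferred only from the example rows (not confirmed by the source).
TAG_NAMES = {"a": "age", "x": "sex", "h": "hospital", "p": "person"}

# Matches <a>...</a>, <x>...</x>, etc. Stray spaces inside the angle
# brackets (e.g. "< x >" or "</a >") are tolerated.
TAG_RE = re.compile(r"<\s*([axhp])\s*>(.*?)<\s*/\s*\1\s*>", re.DOTALL)


def extract_tags(text):
    """Return (category, tagged text) pairs in order of appearance."""
    return [(TAG_NAMES[m.group(1)], m.group(2).strip())
            for m in TAG_RE.finditer(text)]


def count_tags(texts):
    """Count tagged spans per category over an iterable of strings."""
    counts = Counter()
    for text in texts:
        for category, _ in extract_tags(text):
            counts[category] += 1
    return counts


# The three Japanese examples from Table 1.
examples = [
    "工場に勤めている<a>64歳</a>の<x>男性</x>。",
    "施設入所中で寝たきりの<a>86歳</a><x>女性</x>。全介助",
    "<<院外標本<h>静大皮フ科クリニック</h>、<p>桑田 智</p>",
]

if __name__ == "__main__":
    for category, n in sorted(count_tags(examples).items()):
        print(category, n)
```

Run over a full corpus, per-category totals computed this way should sum to the "# of all tags" row; in Table 1, the five category rows do add up to 490, 3017, and 295 for the three datasets.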