2025 Aug 25;4:e66153. doi: 10.2196/66153

Table 1. Document types included in the Norwegian clinical corpus used for the continuous pretraining of NorDeClin-BERT-base (NorDeClin Bidirectional Encoder Representations from Transformers).

| Document type | Number of files | Size |
| --- | --- | --- |
| Anesthesia | 46,310 | 94.8 MB |
| Treatment | 29,919 | 49.3 MB |
| Discharge summaries | 586,637 | 1.6 GB |
| Ergotherapy | 33,220 | 38.4 MB |
| Pharmacy | 3484 | 4.6 MB |
| Physiotherapy | 69,324 | 80.4 MB |
| Individual plan | 558 | 1.4 MB |
| Admission records | 248,208 | 779.2 MB |
| Laboratory | 66 | 53.8 kB |
| Surgery | 313,795 | 446.8 MB |
| Summary records | 5710 | 9.2 MB |
| Radiology | 63,734 | 30.1 MB |
| Somatic care | 110,248 | 211.3 MB |
| Nursing | 299,212 | 220.7 MB |
| Training dataset (no duplicates) | 1,670,464 | 3.2 GB |
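
The relation between the per-type rows and the final row of Table 1 can be checked with a short tally. The sketch below is illustrative only: the file counts are copied from the table, while the variable and function names are hypothetical. It also assumes that the gap between the summed rows and the training dataset is due to deduplication, which the table implies but does not state.

```python
# File counts per document type, copied from Table 1.
file_counts = {
    "Anesthesia": 46_310,
    "Treatment": 29_919,
    "Discharge summaries": 586_637,
    "Ergotherapy": 33_220,
    "Pharmacy": 3_484,
    "Physiotherapy": 69_324,
    "Individual plan": 558,
    "Admission records": 248_208,
    "Laboratory": 66,
    "Surgery": 313_795,
    "Summary records": 5_710,
    "Radiology": 63_734,
    "Somatic care": 110_248,
    "Nursing": 299_212,
}

# Size of the training dataset reported in the last row of Table 1.
TRAINING_FILES = 1_670_464


def summarize(counts, training_total):
    """Return the summed file count, the gap to the training set, and that gap as a share."""
    raw_total = sum(counts.values())
    gap = raw_total - training_total  # assumed here to be files dropped as duplicates
    return raw_total, gap, gap / raw_total


raw_total, gap, gap_share = summarize(file_counts, TRAINING_FILES)
largest_type = max(file_counts, key=file_counts.get)

print(f"Files summed over document types: {raw_total:,}")
print(f"Files in training dataset:        {TRAINING_FILES:,}")
print(f"Difference:                       {gap:,} ({gap_share:.1%})")
print(f"Largest document type:            {largest_type}")
```

Under that assumption, the 14 document types sum to 1,810,425 files, so about 7.7% of the files (139,961) are absent from the training dataset, and discharge summaries are the largest single document type.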