Table 2.
Descriptive statistics of operative procedure text, preoperative diagnosis, and preferred terms.
Data set | Unique CPT^a codes, n | Unique procedure texts, n | Procedure text tokens^b, mean (SD) | Procedure text tokens^b, range | Preoperative diagnosis tokens^b, mean (SD) | Preoperative diagnosis tokens^b, range | Preferred term tokens^b, mean (SD) | Preferred term tokens^b, range
Training | 252 | 13,847 | 5.12 (3.57) | 1-60 | 4.12 (2.5) | 0-15 | 13.23 (6.44) | 3-46
Validation | 231 | 6012 | 5.15 (3.64) | 1-51 | 4.11 (2.5) | 0-13 | 13.23 (4.45) | 3-41
Holdout | 224 | 6731 | 4.98 (3.52) | 1-60 | 4.01 (2.52) | 0-13 | 13.24 (6.18) | 3-41
^a CPT: Current Procedural Terminology.
^b Token counts are measured in words (one token is one word).