2022 Dec 7;5(1):e37562. doi: 10.2196/37562

Table 1.

Statistics and experimental setting of the CMaiSpeech and NER data sets.

| Methods | CMaiSpeech^a data set, training | CMaiSpeech data set, validation | NER^b data set, training | NER data set, validation | CMaiSpeech data set, testing |
| --- | --- | --- | --- | --- | --- |
| Wave statistics | | | | | |
| Duration, hours | 105.5334 | 1.6572 | 114.2523 | 12.4390 | 1.8093 |
| Mean duration, seconds | 14.2634 | 18.9395 | 21.6751 | 21.2028 | 17.5566 |
| Utterances, n | 26,636 | 315 | 18,976 | 2112 | 371 |
| Word statistics, n | | | | | |
| English words | 136,196 | 2342 | 682 | 150 | 2427 |
| Chinese words | 787,701 | 12,115 | 1,420,101 | 206,909 | 14,820 |
| Total words | 923,897 | 14,457 | 1,420,784 | 207,059 | 17,247 |
| Code-switching level | | | | | |
| Language entropy | 0.6033 | 0.6391 | 0.006 | 0.0086 | 0.5861 |
| I-index^c, mean (SD) | 0.2006 (0.1529) | 0.1896 (0.16) | 0.0006 (0.0052) | 0.0011 (0.008) | 0.1411 (0.1372) |

^a CMaiSpeech: China Medical University Hospital Artificial Intelligence Speech.

^b NER: National Education Radio.

^c I-index: integration index (ie, the probability of switching).
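
The table does not state how language entropy is calculated. As a minimal sketch, the Python below assumes it is the Shannon entropy, in bits, of the English versus Chinese word proportions in each split. The function name `language_entropy` and the `splits` dictionary are illustrative and are not taken from the article; the word counts are copied from the "English words" and "Chinese words" rows of Table 1.

```python
import math


def language_entropy(word_counts):
    """Shannon entropy (bits) of a language distribution.

    Assumption: entropy is computed from per-language word counts;
    the article's exact definition is not given in this table.
    """
    total = sum(word_counts)
    entropy = 0.0
    for count in word_counts:
        if count > 0:  # a language with zero words contributes nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# (English words, Chinese words) per split, copied from Table 1.
splits = {
    "CMaiSpeech training": (136_196, 787_701),
    "CMaiSpeech validation": (2_342, 12_115),
    "NER training": (682, 1_420_101),
    "NER validation": (150, 206_909),
    "CMaiSpeech testing": (2_427, 14_820),
}

for name, counts in splits.items():
    print(f"{name}: {language_entropy(counts):.4f}")
```

Under this assumption, the printed values land close to the "Language entropy" row, which suggests (but does not confirm) that the reported figures are word-level entropies in bits. The near-zero values for the NER splits reflect how few English words they contain.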