2022 Dec 7;5(1):e37562. doi: 10.2196/37562

Table 1.

Statistics and experimental setting of the CMaiSpeech and NER data sets.

Statistic	CMaiSpeech^a training	CMaiSpeech validation	NER^b training	NER validation	CMaiSpeech testing
Wave statistics
Duration, hours	105.5334	1.6572	114.2523	12.4390	1.8093
Mean duration, seconds	14.2634	18.9395	21.6751	21.2028	17.5566
Utterances, n	26,636	315	18,976	2112	371
Word statistics, n
English words	136,196	2342	682	150	2427
Chinese words	787,701	12,115	1,420,101	206,909	14,820
Total words	923,897	14,457	1,420,784	207,059	17,247
Code-switching level
Language entropy	0.6033	0.6391	0.006	0.0086	0.5861
I-index^c, mean (SD)	0.2006 (0.1529)	0.1896 (0.16)	0.0006 (0.0052)	0.0011 (0.008)	0.1411 (0.1372)

^aCMaiSpeech: China Medical University Hospital Artificial Intelligence Speech.

^bNER: National Education Radio.

^cI-index: integration index (ie, the probability of a language switch between two adjacent tokens).
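Table 1 does not give a formula for language entropy, and footnote c describes the I-index only informally. The sketch below assumes the definitions commonly used in code-switching research: language entropy as the base-2 Shannon entropy of the proportion of words in each language, and the I-index of an utterance as the number of language switches between adjacent tokens divided by the number of adjacent token pairs. Only the word counts come from the table; the names `WORD_COUNTS`, `language_entropy`, and `i_index`, the per-token language tags, the convention for one-token utterances, and the example utterances are hypothetical.

```python
import math
from statistics import mean, stdev

# English and Chinese word counts copied from Table 1: (English, Chinese).
WORD_COUNTS = {
    "CMaiSpeech training": (136_196, 787_701),
    "CMaiSpeech validation": (2_342, 12_115),
    "NER training": (682, 1_420_101),
    "NER validation": (150, 206_909),
    "CMaiSpeech testing": (2_427, 14_820),
}


def language_entropy(counts):
    """Base-2 Shannon entropy of per-language word proportions (assumed definition)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def i_index(langs):
    """Integration index of one utterance: share of adjacent token pairs that switch language.

    `langs` holds one language tag per token, e.g. ["zh", "zh", "en"].
    Utterances with fewer than two tokens get 0.0 (an assumed convention).
    """
    if len(langs) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(langs, langs[1:]) if a != b)
    return switches / (len(langs) - 1)


# Corpus-level language entropy for each split.
for split, counts in WORD_COUNTS.items():
    print(f"{split}: language entropy = {language_entropy(counts):.4f}")

# Hypothetical tagged utterances, for illustration only (not from either data set).
utterances = [["zh", "zh", "en", "zh"], ["zh", "zh", "zh"], ["en", "zh"]]
scores = [i_index(u) for u in utterances]
# Sample SD via statistics.stdev; the table does not state which SD was used.
print(f"I-index, mean (SD): {mean(scores):.4f} ({stdev(scores):.4f})")
```

Under these assumptions, the computed entropies agree with the Language entropy row to within rounding, which is consistent with (but does not confirm) the definition above. The I-index rows are means and SDs over utterances, so reproducing them would require per-token language tags for every utterance, which the table does not provide.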