. 2025 Jul 1;28(3):357–376. doi: 10.1007/s10032-025-00543-9

Table 2.

Distribution of training and testing data for fine-tuning TSR and OCR models, highlighting the unique characteristics of each dataset

Dataset	TSR: #Training images	TSR: #Testing images	OCR: #Training text lines	OCR: #Testing text lines	Average cells per image in test set
UoS_Data_Rescue	1113	112	497045	97150	867.41
SROIE	1426	273	33626	18704	68.51
CORD	800	100	19367	2355	23.55
PubTabNet	6000 $^{*}$	15115	26000 $^{+}$	606719	40.14
ICDAR15	–	–	4468	2077	–

* The original PubTabNet dataset was released with 510K training samples

+ Randomly selected 26000 text lines from the 6000 training samples
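
A random selection like the one described in the PubTabNet footnote could be reproduced along the following lines. This is a minimal sketch, not the authors' procedure: the function name `sample_text_lines`, the fixed seed, and the placeholder pool of text lines are all assumptions made for illustration, and sampling is assumed to be uniform and without replacement.

```python
import random


def sample_text_lines(lines, k=26000, seed=0):
    """Return k text lines drawn uniformly at random, without replacement."""
    if k > len(lines):
        raise ValueError("requested more lines than are available")
    rng = random.Random(seed)  # fixed seed (assumed) so the subset is reproducible
    return rng.sample(lines, k)


# Hypothetical stand-in for the text lines pooled from the 6000 training images.
pool = [f"line_{i}" for i in range(100_000)]
subset = sample_text_lines(pool)
```

Fixing the seed means the same 26000 lines are drawn on every run, which keeps the OCR fine-tuning set stable across experiments.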