2025 Jul 1;28(3):357–376. doi: 10.1007/s10032-025-00543-9

Table 2.

Distribution of training and testing data for fine-tuning TSR and OCR models, highlighting the unique characteristics of each dataset

Dataset          TSR train images  TSR test images  OCR train text lines  OCR test text lines  Avg. cells per test image
UoS_Data_Rescue  1113              112              497045                97150                867.41
SROIE            1426              273              33626                 18704                68.51
CORD             800               100              19367                 2355                 23.55
PubTabNet        6000*             15115            26000†                606719               40.14
ICDAR15          -                 -                4468                  2077                 -

TSR = table structure recognition; OCR = optical character recognition.

* The original PubTabNet dataset was released with 510K training samples; 6000 of them were used for TSR fine-tuning.

† 26000 text lines were randomly selected from these 6000 training samples for OCR fine-tuning.
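The random selection of PubTabNet text lines in the notes, and the per-image average reported in the last column, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the source only says the lines were "randomly selected", so uniform sampling without replacement with a fixed seed is an assumption, as are the data layout (one list of line strings per training image) and the hypothetical helper names `sample_text_lines` and `average_cells_per_image`.

```python
import random
from typing import List, Sequence


def sample_text_lines(
    lines_per_image: Sequence[Sequence[str]],
    k: int,
    seed: int = 0,
) -> List[str]:
    """Hypothetical helper: draw k text lines uniformly at random,
    without replacement, from all lines across the given images."""
    # Flatten the per-image line lists into a single pool.
    pool = [line for lines in lines_per_image for line in lines]
    if k > len(pool):
        raise ValueError(f"requested {k} lines but only {len(pool)} available")
    # A dedicated Random instance makes the draw reproducible
    # without touching the global random state.
    rng = random.Random(seed)
    return rng.sample(pool, k)


def average_cells_per_image(cells_per_image: Sequence[int]) -> float:
    """Hypothetical helper: mean number of table cells per test image."""
    if not cells_per_image:
        raise ValueError("no images given")
    return sum(cells_per_image) / len(cells_per_image)


if __name__ == "__main__":
    # Toy stand-in for the training images: 6 images x 10 lines each.
    toy = [[f"img{i}-line{j}" for j in range(10)] for i in range(6)]
    subset = sample_text_lines(toy, k=26, seed=1)
    print(len(subset))  # prints 26
    print(average_cells_per_image([10, 20, 30]))  # prints 20.0
```

Seeding a private `random.Random` keeps the subset reproducible across runs, which matters when the same OCR fine-tuning split has to be regenerated later.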