Table 4.
Performance evaluation of the TrOCR-ctx model on tabular data reconstruction across the UoS_Data_Rescue, CORD, SROIE, and PubTabNet datasets. Precision and recall for table structure recognition are calculated at an IoU threshold of 0.6.
| | Table structure recognition | | | Tabular data reconstruction | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Dataset | P | R | wF1 | Rouge-L | WER | CER | EM | F1 (Char) | F1 (Token) |
| Without contextual information (TrOCR) | | | | | | | | | |
| UoS_Data_Rescue | 0.742 | 0.919 | 0.805 | 0.771 | 0.281 | 0.254 | 0.719 | 0.819 | 0.719 |
| CORD | 0.970 | 0.715 | 0.798 | 0.890 | 0.043 | 0.031 | 0.863 | 0.890 | 0.863 |
| SROIE | 0.805 | 0.796 | 0.785 | 0.847 | 0.046 | 0.039 | 0.819 | 0.869 | 0.819 |
| PubTabNet | 0.959 | 0.814 | 0.869 | 0.618 | 0.584 | 0.593 | 0.408 | 0.525 | 0.408 |
| With contextual information (TrOCR-ctx without ByT5 model) | | | | | | | | | |
| UoS_Data_Rescue | 0.742 | 0.919 | 0.805 | 0.778 | 0.258 | 0.232 | 0.742 | 0.824 | 0.742 |
| CORD | 0.970 | 0.715 | 0.798 | 0.917 | 0.035 | 0.023 | 0.895 | 0.913 | 0.895 |
| SROIE | 0.805 | 0.796 | 0.785 | 0.872 | 0.025 | 0.023 | 0.875 | 0.909 | 0.875 |
| PubTabNet | 0.959 | 0.814 | 0.869 | 0.636 | 0.584 | 0.593 | 0.416 | 0.527 | 0.416 |
| With contextual information (TrOCR-ctx with ByT5 model) | | | | | | | | | |
| UoS_Data_Rescue | 0.742 | 0.919 | 0.805 | 0.809 | 0.245 | 0.213 | 0.755 | 0.850 | 0.755 |
| CORD | 0.970 | 0.715 | 0.798 | 0.917 | 0.023 | 0.025 | 0.914 | 0.921 | 0.914 |
| SROIE | 0.805 | 0.796 | 0.785 | 0.908 | 0.023 | 0.022 | 0.907 | 0.914 | 0.907 |
| PubTabNet | 0.959 | 0.814 | 0.869 | 0.640 | 0.592 | 0.594 | 0.426 | 0.536 | 0.426 |
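
The caption states that precision and recall for table structure recognition are computed at an IoU threshold of 0.6. The following is a minimal sketch of how such a thresholded evaluation is commonly done, not the authors' evaluation code. It assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` and a greedy one-to-one matching between predicted and ground-truth boxes; the function names `iou` and `precision_recall` and the matching strategy are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    pred: List[Box], gt: List[Box], threshold: float = 0.6
) -> Tuple[float, float]:
    """Greedy one-to-one matching (an assumption of this sketch).

    A prediction counts as a true positive when its best unmatched
    ground-truth box has IoU >= threshold.
    """
    matched_gt = set()
    true_positives = 0
    for p in pred:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt):
            if j in matched_gt:
                continue
            value = iou(p, g)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= threshold:
            matched_gt.add(best_j)
            true_positives += 1
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gt) if gt else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative boxes only; these are not taken from any dataset in the table.
    gt_boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]
    pred_boxes = [(0, 0, 10, 9), (21, 0, 31, 10), (100, 100, 110, 110), (40, 0, 45, 10)]
    p, r = precision_recall(pred_boxes, gt_boxes, threshold=0.6)
    print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy example, two of the four predictions overlap a ground-truth box with IoU of at least 0.6, one prediction overlaps nothing, and one reaches only IoU 0.5, so it is rejected at the 0.6 threshold. Raising or lowering the threshold changes which predictions count as correct, which is why the threshold is reported alongside the P and R columns.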