2021 Dec 13;17(12):e1009650. doi: 10.1371/journal.pcbi.1009650

Table 2. Summary of features for proportional ink violation detection.

No	Feature Description	Reason
1	The value of the lowest y-axis label (detected or inferred from the y-axis)	The lowest y-axis label should be zero
2	The rate of increase between each pair of neighboring y-axis labels	The scale of the y-axis should be consistent across each pair of neighboring y-axis labels
3	Whether the lowest label on the y-axis needs to be inferred	If the lowest detected label on the y-axis is far from the x-axis, then the actual lowest label on the y-axis might have been missed
4	Whether the y-axis has a mix of integers and floating-point numbers	Tesseract might not perform well with floating-point numbers, and thus the rate of increase along the y-axis might not be accurate
5	The probability that a detected region is text	We prefer text regions with a higher probability of being text
6	The OCR confidence of the text on the y-axis	We prefer text-content predictions with higher confidence
7	The probability of being a bar chart	Our classifier only classifies bar charts. Thus, we prefer figures with a high probability of being bar charts
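
As a rough illustration of how features 1 to 4 and 6 in Table 2 might be derived from OCR output, the following Python sketch computes them from a list of y-axis labels. It is not the implementation used in the paper: the `AxisLabel` type, the `y_axis_features` function, the pixel-based rate definition, and the half-spacing threshold for deciding when to infer the lowest label are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AxisLabel:
    """Hypothetical container for one OCR-detected y-axis label."""
    text: str        # OCR output, e.g. "0.5"
    y_pixel: float   # vertical position in the image (larger = lower)
    ocr_conf: float  # OCR confidence in [0, 1]


def parse_number(text: str) -> Optional[float]:
    """Return the label as a float, or None if it is not numeric."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def y_axis_features(labels: List[AxisLabel], x_axis_pixel: float) -> dict:
    # Keep numeric labels only, ordered from the bottom of the axis upward.
    numeric = [(parse_number(l.text), l) for l in labels]
    numeric = [(v, l) for v, l in numeric if v is not None]
    numeric.sort(key=lambda pair: pair[1].y_pixel, reverse=True)
    if len(numeric) < 2:
        raise ValueError("need at least two numeric y-axis labels")

    values = [v for v, _ in numeric]
    pixels = [l.y_pixel for _, l in numeric]

    # Feature 2: value change per pixel between neighboring labels.
    # Assumes neighboring labels sit at distinct pixel positions.
    rates = [
        (values[i + 1] - values[i]) / (pixels[i] - pixels[i + 1])
        for i in range(len(values) - 1)
    ]

    # Feature 3: infer the lowest label when the lowest detected label is
    # far from the x-axis (assumed threshold: half the label spacing).
    gap = x_axis_pixel - pixels[0]
    spacing = pixels[0] - pixels[1]
    needs_inference = gap > 0.5 * spacing

    # Feature 1: lowest value, extrapolated down to the x-axis if needed.
    lowest = values[0] - rates[0] * gap if needs_inference else values[0]

    # Feature 4: mix of integer-looking and float-looking label texts.
    has_float = any("." in l.text for _, l in numeric)
    has_int = any("." not in l.text for _, l in numeric)

    return {
        "lowest_value": lowest,
        "rate_spread": max(rates) - min(rates),  # 0 for a consistent scale
        "needs_inference": needs_inference,
        "mixed_int_float": has_float and has_int,
        "mean_ocr_conf": sum(l.ocr_conf for _, l in numeric) / len(numeric),
    }
```

Under these assumptions, a bar chart whose detected labels start at 50 with a large gap above the x-axis would yield an extrapolated `lowest_value` well above zero, which is the pattern that feature 1 is meant to expose.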