2021 Dec 13;17(12):e1009650. doi: 10.1371/journal.pcbi.1009650

Table 2. Summary of features for proportional ink violation detection.

No | Feature Description | Reason
1 | The value of the lowest y-axis label (detected or inferred from the y-axis) | The lowest y-axis label should be zero
2 | The rate of increase between each pair of neighboring y-axis labels | The y-axis scale should be consistent across each pair of neighboring y-axis labels
3 | Whether the lowest label on the y-axis has to be inferred | If the lowest label on the y-axis is far from the x-axis, we might ignore the actual lowest label on the y-axis
4 | Whether the y-axis has a mix of integer and floating-point labels | Tesseract might not perform well on floating-point numbers, so the rate of increase along the y-axis might not be accurate
5 | The probability of being text | We prefer text with a higher probability of being text
6 | The OCR confidence of text on the y-axis | We prefer predictions of text content with higher confidence
7 | The probability of being a bar chart | Our classifier handles only bar charts, so we prefer figures with a high probability of being bar charts
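
As a rough illustration of features 1, 2, and 4, the sketch below derives them from a list of y-axis label strings. It is not the authors' implementation. The function name `y_axis_label_features`, the bottom-to-top ordering of the labels, the tolerance `tol`, and the reading of "rate of increase" as the difference between neighboring labels are all assumptions made for this example.

```python
from typing import Dict, List


def y_axis_label_features(labels: List[str], tol: float = 1e-6) -> Dict[str, object]:
    """Compute illustrative y-axis features from OCR'd label strings.

    Assumption: `labels` is ordered from the bottom of the axis to the top
    and every string parses as a number.
    """
    values = [float(s) for s in labels]

    # Feature 2 (assumed reading): difference between neighboring labels.
    steps = [upper - lower for lower, upper in zip(values, values[1:])]

    # Feature 4: a label counts as floating-point if it has a decimal point.
    has_float = any("." in s for s in labels)
    has_int = any("." not in s for s in labels)

    return {
        # Feature 1: the lowest label and whether it is zero.
        "lowest_label": values[0],
        "lowest_is_zero": abs(values[0]) <= tol,
        "steps": steps,
        # The scale is consistent if every step matches the first one.
        "consistent_steps": all(abs(s - steps[0]) <= tol for s in steps),
        "mixed_int_float": has_float and has_int,
    }


if __name__ == "__main__":
    print(y_axis_label_features(["0", "10", "20", "30"]))
    print(y_axis_label_features(["50", "60", "80.5"]))
```

In this sketch, an axis labeled 50, 60, 80.5 would be flagged on all three counts: the lowest label is not zero, the steps differ, and integers are mixed with a floating-point label. Features 3 and 5 to 7 depend on detector and OCR outputs and are not covered here.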