Table 1.
Category | Subcategory | Name | Input | Output | Description |
---|---|---|---|---|---|
Image-based | Object recognition | Scale Matching Service | GUID, SRI | Scale region coordinates | Template matching algorithm (uses an example image of the searched-for scale to detect it in other images) (23) |
Image-based | DPI recognition | DPI Service | SRI resolution, SRI size, scale region coordinates | DPI | Computation of DPI from the physical size of the actual scale and the size in pixels of its digitized counterpart on the herbarium sheet |
Image-based | Object recognition | Text Region Service | GUID, DPI | Text region coordinates | Line contrast approach (exploits the high horizontal contrast of text lines, in which dark and bright areas alternate rapidly) or machine learning approach to detect text-like structures |
Image-based | OCR | Tesseract/OmniPage Service | GUID, text region coordinates | Text | Tesseract/OmniPage OCR algorithm |
Text-based | Dictionary-based | Scientific Name Extractor | Text | Scientific names | Dictionary-based parsing using Global Names (http://gnrd.globalnames.org/) and Wikidata (24) |
Text-based | Dictionary-based | Botanist Name Extractor | Text | Botanist names | Dictionary-based parsing using the botanists’ database of the Harvard University Herbaria & Libraries (http://kiki.huh.harvard.edu/databases) |
Text-based | Regular expression | Date Extractor | Text | Dates | Regular-expression matching of dates related to collection, accession and determination |
Text-based | Regular expression | Geographical Coordinates (GeoCoord) Extractor | Text | Coordinates | Regular-expression matching of geographical coordinates |
Text-based | Dictionary-based | Location Extractor | Text | Locations | Parsing based on the Cartographic Location and Vicinity Indexer (CLAVIN) library (http://clavin.bericotechnologies.com) |
GUID indicates Globally Unique Identifier; SRI, Scale Reference Image; DPI, dots per inch; OCR, optical character recognition.
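The template matching behind the Scale Matching Service (23) can be illustrated with a minimal sketch. It assumes grayscale images given as nested lists of pixel intensities and scores every window by the sum of squared differences (SSD). The function name `match_template` is hypothetical. A production matcher would more likely use normalized cross-correlation and search over several template sizes, because the size of the scale in pixels depends on the sheet's still-unknown DPI.

```python
from __future__ import annotations


def match_template(image: list[list[int]], template: list[list[int]]) -> tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) of the image window most similar to the template.

    Hypothetical helper: similarity is the sum of squared differences (SSD),
    so the best window is the one with the smallest SSD. The template is
    assumed to appear at the same pixel size as in the image.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template must not be larger than the image")

    best_ssd, best_xy = None, (0, 0)
    # Slide the template over every valid top-left position.
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = sum(
                (image[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(th)
                for dx in range(tw)
            )
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_xy = ssd, (x, y)

    x, y = best_xy
    return (x, y, x + tw, y + th)  # right and bottom edges are exclusive
```

For a 5 × 5 image that contains the 2 × 2 template once with its top-left corner at column 2, row 1, the function returns `(2, 1, 4, 3)`.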
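The DPI Service's computation can be written out explicitly. This sketch assumes that the physical width of the scale equals the SRI's width in pixels divided by the SRI's known resolution, and that the scale region is given as hypothetical `(x0, y0, x1, y1)` pixel coordinates whose width runs along the scale. The sheet DPI is then the region width in pixels divided by that physical width in inches.

```python
from __future__ import annotations


def sheet_dpi(sri_dpi: float, sri_width_px: int, region: tuple[int, int, int, int]) -> float:
    """Estimate the DPI of a herbarium sheet image from a detected scale.

    Assumption: the SRI was digitized at sri_dpi, so the real scale is
    sri_width_px / sri_dpi inches wide; region is (x0, y0, x1, y1) in
    sheet-image pixels (hypothetical layout).
    """
    if sri_dpi <= 0 or sri_width_px <= 0:
        raise ValueError("SRI resolution and size must be positive")
    physical_width_in = sri_width_px / sri_dpi  # real width of the scale, in inches
    x0, _, x1, _ = region
    return (x1 - x0) / physical_width_in  # pixels per inch on the sheet
```

For example, an SRI scanned at 300 DPI that is 1,200 pixels wide corresponds to a 4-inch scale; if the matched region on the sheet is 2,400 pixels wide, `sheet_dpi(300, 1200, (100, 50, 2500, 150))` gives 600.0.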
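The line contrast approach of the Text Region Service can be sketched by counting dark/bright alternations along each pixel row and grouping consecutive rows with many alternations into candidate text lines. The binarization threshold, the minimum transition count and the function names are illustrative assumptions. Unlike the service, this sketch ignores the DPI input and returns horizontal row bands rather than full region coordinates.

```python
from __future__ import annotations


def row_transitions(row: list[int], threshold: int = 128) -> int:
    """Count dark/bright alternations along one pixel row (assumed threshold)."""
    dark = [p < threshold for p in row]
    return sum(1 for a, b in zip(dark, dark[1:]) if a != b)


def text_line_bands(image: list[list[int]], min_transitions: int = 4,
                    threshold: int = 128) -> list[tuple[int, int]]:
    """Return (start_row, end_row) bands, end exclusive, of text-like rows.

    A row counts as text-like when its horizontal contrast is high, i.e.
    it has at least min_transitions dark/bright alternations.
    """
    bands: list[tuple[int, int]] = []
    start = None
    for y, row in enumerate(image):
        is_text = row_transitions(row, threshold) >= min_transitions
        if is_text and start is None:
            start = y                   # a new band begins
        elif not is_text and start is not None:
            bands.append((start, y))    # the current band ends
            start = None
    if start is not None:               # band reaching the bottom edge
        bands.append((start, len(image)))
    return bands
```

Blank rows have no alternations, whereas rows crossing printed or handwritten characters switch between ink and paper many times, which is the property the line contrast approach relies on.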
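A minimal sketch of regular-expression date matching follows. It assumes three common label formats (`12.05.1998`, `3 Sept. 1999` and ISO `2001-02-17`); these patterns are illustrative and are not those of the Date Extractor. Deciding whether a date refers to collection, accession or determination would additionally require context such as nearby keywords, which is not shown.

```python
import re

# Illustrative month abbreviations; trailing letters ("Sept", "September") are allowed below.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

DATE_PATTERNS = [
    re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b"),                            # 12.05.1998
    re.compile(rf"\b(\d{{1,2}})\s+({MONTHS})[a-z]*\.?\s+(\d{{4}})\b", re.IGNORECASE),  # 3 Sept. 1999
    re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"),                                        # 2001-02-17
]


def extract_dates(text: str) -> list[str]:
    """Return all date strings matched by any pattern, in order of appearance."""
    hits = []
    for pattern in DATE_PATTERNS:
        hits.extend((m.start(), m.group(0)) for m in pattern.finditer(text))
    return [date for _, date in sorted(hits)]
```

For the label text `"Leg. J. Smith 12.05.1998; det. 3 Sept. 1999; acc. 2001-02-17"`, the function returns the three date strings in the order they appear.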
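Coordinate matching can be sketched in the same way. This example assumes degrees-minutes-seconds notation with a hemisphere letter (e.g. `48°51'24"N`, seconds optional) and converts each match to signed decimal degrees. The pattern and function names are illustrative; other notations, such as decimal degrees or UTM, would need further patterns.

```python
import re

# Degrees, minutes, optional seconds, then a hemisphere letter (assumed notation).
DMS = re.compile(
    r"(\d{1,3})\s*°\s*(\d{1,2})\s*['′]\s*(?:(\d{1,2}(?:\.\d+)?)\s*[\"″]\s*)?([NSEW])"
)


def dms_to_decimal(deg: str, minutes: str, seconds, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to decimal degrees; S and W are negative."""
    value = int(deg) + int(minutes) / 60 + (float(seconds) if seconds else 0.0) / 3600
    return -value if hemisphere in ("S", "W") else value


def extract_coordinates(text: str) -> list[float]:
    """Return every matched coordinate as decimal degrees, rounded to 5 places."""
    return [round(dms_to_decimal(*m.groups()), 5) for m in DMS.finditer(text)]
```

For instance, `33°52'S 151°12'E` yields approximately `[-33.86667, 151.2]`.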
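Dictionary-based extraction, as listed for scientific and botanist names, can be approximated by greedy longest-match lookup of word sequences. In this sketch the in-memory `dictionary` set is a hypothetical stand-in for the Global Names, Wikidata or Harvard botanist data, and `max_words` is an assumed upper limit on the number of words in a name.

```python
import re


def extract_names(text: str, dictionary: set, max_words: int = 3) -> list:
    """Greedy longest-match lookup of word n-grams against a name dictionary."""
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of letters, punctuation dropped
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so "Quercus robur" wins over "Quercus".
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return found
```

With the dictionary `{"Quercus robur", "Quercus", "Fagus sylvatica"}`, the text `"Quercus robur L., near Fagus sylvatica; Quercus sp."` yields `["Quercus robur", "Fagus sylvatica", "Quercus"]`. Preferring the longest match keeps a binomial from being reported as its genus alone.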