2018 Oct 8;2018:bay103. doi: 10.1093/database/bay103

Table 1.

Web services to enable the implementation of workflows for processing digital herbarium specimens (23, 24)

Category | Subcategory | Name | Input | Output | Description
Image-based | Object recognition | Scale Matching Service | GUID, SRI | Scale region coordinates | Template matching algorithm (uses an example image of the searched for scale to detect it in other images) (23)
Image-based | DPI recognition | DPI Service | SRI resolution, SRI size, scale region coordinates | DPI | Computation of DPI using the physical size of the actual scale and the size of its digital counterpart on the herbarium sheet
Image-based | Object recognition | Text Region Service | GUID, DPI | Text region coordinates | Line contrast approach (taking advantage of the fact that the horizontal contrast of text lines is very high and dark and bright areas are alternating quickly)/machine learning approach to detect text-like structures
Image-based | OCR | Tesseract/OmniPage Service | GUID, text region coordinates | Text | Tesseract/OmniPage OCR algorithm
Text-based | Dictionary-based | Scientific Name Extractor | Text | Scientific names | Parsing with dictionary based on Global Names (http://gnrd.globalnames.org/) and WikiData (24)
Text-based | Dictionary-based | Botanist Name Extractor | Text | Botanist names | Parsing with dictionary based on botanists’ database of the Harvard University Herbaria & Libraries (http://kiki.huh.harvard.edu/databases)
Text-based | Regular expression | Date Extractor | Text | Dates | Matching using regular expressions for dates related to collection, accession and determination
Text-based | Regular expression | Geographical Coordinates (GeoCoord) Extractor | Text | Coordinates | Matching using regular expressions for geographical coordinates
Text-based | Dictionary-based | Location Extractor | Text | Locations | Parsing based on Cartographic Location and Vicinity Indexer (CLAVIN) library (http://clavin.bericotechnologies.com)

GUID indicates Globally Unique Identifier; SRI, Scale Reference Image.
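
The DPI Service row describes deriving resolution from the physical size of the scale and the size of its digital counterpart on the sheet. The arithmetic can be sketched as follows, under the assumption that the SRI resolution and SRI size together give the physical width of the scale. The function name and parameter names are hypothetical and do not reflect the service's actual interface.

```python
def estimate_dpi(sri_dpi, sri_width_px, region_width_px):
    """Estimate the DPI of a herbarium sheet image from a detected scale.

    sri_dpi:         resolution of the Scale Reference Image (SRI)
    sri_width_px:    width of the scale in the SRI, in pixels
    region_width_px: width of the detected scale region on the sheet, in pixels
    """
    if sri_dpi <= 0 or sri_width_px <= 0 or region_width_px <= 0:
        raise ValueError("all inputs must be positive")
    # Physical width of the scale in inches, derived from the reference image.
    physical_width_in = sri_width_px / sri_dpi
    # The same physical scale covers region_width_px pixels on the sheet.
    return region_width_px / physical_width_in


# Hypothetical values: a 1200 px wide scale scanned at 600 DPI is 2 inches wide;
# if it spans 600 px on the sheet, the sheet image is about 300 DPI.
print(estimate_dpi(sri_dpi=600, sri_width_px=1200, region_width_px=600))  # 300.0
```

Using only the width keeps the sketch short; a real service might use the longer side of the region or average both dimensions to reduce the effect of detection error.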
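
The Date Extractor and GeoCoord Extractor rows both rely on regular-expression matching over OCR text. The table does not give the expressions used, so the patterns below are illustrative assumptions that cover only a few common label formats (ISO dates, dotted numeric dates, day-month-year with English month names, and degree-based coordinates with a hemisphere letter). The function names are likewise hypothetical.

```python
import re

# English month names or abbreviations, optionally followed by a period.
MONTHS = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"

# Assumed date formats; real labels will need more variants.
DATE_RE = re.compile(
    r"\b(?:"
    r"\d{4}-\d{2}-\d{2}"                     # 1987-05-12
    r"|\d{1,2}\.\d{1,2}\.\d{4}"              # 12.05.1987
    rf"|\d{{1,2}}\.?\s+{MONTHS}\s+\d{{4}}"   # 12 May 1987, 3. Aug. 1902
    r")"
)

# Assumed coordinate format: degrees, optional minutes and seconds, hemisphere.
COORD_RE = re.compile(
    r"\d{1,3}(?:\.\d+)?\s*°"                 # degrees, optionally decimal
    r"(?:\s*\d{1,2}(?:\.\d+)?\s*['′])?"      # optional minutes
    r'(?:\s*\d{1,2}(?:\.\d+)?\s*["″])?'      # optional seconds
    r"\s*[NSEW]\b"                           # hemisphere letter
)


def extract_dates(text):
    """Return all substrings of text that match one of the assumed date formats."""
    return DATE_RE.findall(text)


def extract_coordinates(text):
    """Return all substrings of text that look like a latitude or longitude."""
    return COORD_RE.findall(text)


label = "Leg. 12 May 1987, det. 1990-03-01. 52°27'18\"N 13°18'21\"E"
print(extract_dates(label))        # ['12 May 1987', '1990-03-01']
print(extract_coordinates(label))  # ['52°27\'18"N', '13°18\'21"E']
```

The sketch only finds candidate strings. Deciding whether a date refers to collection, accession or determination, as the table describes, would need additional context from the surrounding text.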