Table 2.
Evaluation of editorial and mark-up practices employed by main users of the three most widely used schemas (taxonX, TaxPub, taXMLit) in retrospective mark-up of historical taxonomic literature. Legend: “-” weak; “+” present, but needs further development; “++” good; “+++” very good.
Criteria | taxonX | TaxPub | taXMLit |
---|---|---|---|
1) What are the tolerances for text accuracy? | Generally high: structural mark-up generation is mostly independent of the text, while detail mark-up depends to some degree on text accuracy; text accuracy is checked during mark-up generation | n/a in prospective publishing | Text accuracy is managed through manual pre-mark-up checking and through processes developed in the ABLE project (Morse et al. 2009) |
2) What are the editorial policies for, among others: | |||
a) corrections/retention of typos and other errors in the text | Retained if in original publication | n/a in prospective publishing | Retained if in original publication; some corrections marked as implicit if unequivocal. |
b) interpretation of unclear text | What is “unclear text”? Abbreviated or omitted taxonomic epithets are disambiguated or filled in, respectively, during mark-up generation | n/a in prospective publishing | Use ‘implicit’ attribute for unequivocal clarification; text is left unchanged where clarification would rely on subjective interpretation. |
c) choice of “copy-text”, i.e., the exemplar from which the digitized version of the text will be made. It is highly unlikely that every copy of any edition of a work will have exactly the same text | Probably not relevant to TaxonX: most documents marked up are journals or journal articles, which extremely rarely have more than one edition | n/a in prospective publishing | Copy text is sourced from large institutional libraries. Multiple copies of the same work can be uploaded if marked up. Within texts, cancels and cancellands are treated separately. |
3) What are the policies and practices for normalization and other annotation, such as: | |||
a) expansion of abbreviations | Abbreviations get tagged, data they imply stored in DwC children of dedicated tax:xmldata element, but not widely used as of yet | n/a in prospective publishing | Use of ‘implicit’ attribute for unequivocal expansions (e.g. generic names, author names) |
b) normalization of taxon names, personal names, corporate names, etc. | Taxon names atomized, epithets expanded or filled in where abbreviated or missing, normalized epithets stored in DwC children of dedicated tax:xmldata element. No normalization for person or corporate names as of yet | n/a in prospective publishing | Primarily reproduced as original; in some cases for both taxon names and person names, some normalization occurs in a separate part of the mark-up. Facility for linking synonyms of Parties / Agents outside text in place through INOTAXA. |
c) modernization of archaic or changed place names (e.g., Rhodesia/Zimbabwe) | None as of yet | n/a in prospective publishing | Primarily reproduced as original; also has ability to capture an ‘interpreted’ place name. Facility for searching forms of changed place names being developed in INOTAXA. |
d) annotation and other editorialization, as for example, correction of incorrect taxon names, assignment of coordinates to location names | Actual taxon name stays as in original publication, normalized epithets stored in DwC children of dedicated tax:xmldata element usually contain correct value | n/a in prospective publishing | Primarily reproduced as original; also has ability to capture corrections and additions as ‘interpreted’ data and, for added coordinates, uses a ‘source’ attribute. |
4) What are the textual objects of interest which will be encoded (i.e., do not aim to tag everything). What is in scope, and what is not? What has the highest priority? | |||
Treatments | +++ | +++ | +++ (high) |
Keys | + | ++ | +++ (high) |
Phylogenetic and other trees | + | + | ++ (low) |
Front and back matter | - | +++ | +++ (medium) |
Discussion paragraphs | ++ | +++ | +++ (medium) |
Names | +++ | +++ | +++ (high) |
Specimen data | ++ | ++ | +++ (high) |
Taxonomic and nomenclatural acts | +++ | ++ | ++ (medium) |
Bibliographies | ++ | +++ | +++ (medium) |
Other | Front and Back Matter, section types, image legends, indexes | ||
5) What are the purposes of the mark-up? One cannot simply “tag everything”, as no single encoding of a text is going to be equally suitable for all conceivable purposes. Three main categories can be distinguished: | |||
a) rendition/representation of the text in HTML, PDF, ePub, or other formats | +++ | +++ | +++ |
b) archiving of the text for long term preservation | ++ | +++ | ++ |
c) analysis, data mining, and other processing | +++ | ++ | +++ |
6) What are the policies and practices for the handling of non-textual features such as illustrations, inserted plates, fold-out maps, etc.? | |||
a) how should multi-column text be handled? | Multi-column text normalized into single column | According to NLM publishing and archiving Tag Suite. | Currently most layout elements such as this are ignored unless columns numbered separately in original, in which case each column is treated as if it were a page. |
b) what are the policies and practices regarding overlapping hierarchies in the text (say, a significant section starts in one chapter and concludes in another chapter of a book)? | Not encountered so far, so no corresponding policy: most documents marked up in TaxonX are journals or journal articles, which almost never exhibit overlapping hierarchies | n/a in prospective publishing | + treatments are dated from the first date of publication; supplements handled separately. |
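
The "retain the original, record the interpretation separately" policy that recurs in the table (for example, the ‘implicit’ attribute used for unequivocal expansions) can be illustrated with a minimal sketch. The element name `name`, the helper `mark_abbreviation`, and the example taxon are hypothetical and are not taken from taxonX, TaxPub, or taXMLit; only the attribute name ‘implicit’ comes from the table, and how taXMLit actually models it is not specified here.

```python
import xml.etree.ElementTree as ET


def mark_abbreviation(original: str, expansion: str) -> ET.Element:
    """Wrap an abbreviated name, keeping the printed text and recording the expansion.

    The element name "name" is a hypothetical placeholder, not a tag
    from any of the three schemas compared in the table.
    """
    elem = ET.Element("name")
    elem.text = original             # text exactly as printed in the source
    elem.set("implicit", expansion)  # editor-supplied, unequivocal expansion
    return elem


# Illustrative values only: an abbreviated generic name and its expansion.
elem = mark_abbreviation("A. mellifera", "Apis mellifera")
xml_string = ET.tostring(elem, encoding="unicode")
print(xml_string)
```

Keeping the expansion in an attribute leaves the element text identical to the original publication, so rendition of the text and data mining over normalized values can both draw on the same mark-up.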