Table 4.
Difference | Auto-CORPus | PMC |
---|---|---|
Section titles | Section titles, subtitles, subsubtitles (and so on) are linked to the passage text they apply to | Section titles, subtitles, subsubtitles (and so on) precede the passage text they apply to |
Section types | Section types are annotated using IAO terms | Section types are described using custom labels |
Offset counts | Offset increases by 1 for every character (including whitespace) in a passage | Offset increases by the number of bytes in the text of a passage, plus one for a space |
Table and figure sections | Structured table data are stored in table JSON. Figure captions are included in the BioC JSON in the sequential order in which they occur within paragraphs. | Table data and figure captions occur at the end of the JSON document. Table content is given as XML. |
Abbreviations section | Abbreviations section is stored in the abbreviations JSON. Each abbreviation is related to its definition. Incomplete/one-sided definitions are not stored. | Abbreviations and definitions from the abbreviations section are stored separately as text, with no relations between the two components. Incomplete/one-sided definitions are stored. |
Link anchor text | Link anchor text retained (HTML element tags removed). | Link anchor text removed. |
Character encoding | UTF-8 used for outputs | Available in Unicode and ASCII |
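
The "Offset counts" row is easiest to see with a small example. The following is a minimal Python sketch, not code from either tool: the function names `char_offsets` and `byte_offsets` are hypothetical, the byte-based scheme is assumed to count UTF-8 bytes, and the character-based scheme is assumed to add no separator between passages. Under those assumptions, the two schemes agree for the first passage and drift apart as soon as a passage contains a non-ASCII character or is followed by another passage.

```python
def char_offsets(passages):
    """Character-based scheme: the offset grows by 1 per character,
    whitespace included. Assumption: no separator is added between passages."""
    offsets, position = [], 0
    for text in passages:
        offsets.append(position)
        position += len(text)  # len() counts Unicode code points
    return offsets


def byte_offsets(passages, encoding="utf-8"):
    """Byte-based scheme: the offset grows by the number of bytes in the
    passage text plus one for a space. Assumption: bytes are UTF-8."""
    offsets, position = [], 0
    for text in passages:
        offsets.append(position)
        position += len(text.encode(encoding)) + 1
    return offsets


if __name__ == "__main__":
    passages = ["α-helix", "β sheet"]
    print(char_offsets(passages))  # [0, 7]
    print(byte_offsets(passages))  # [0, 9]
```

Here "α" occupies one character but two UTF-8 bytes, so the second passage starts at 7 under the character-based count and at 9 under the byte-based count (8 bytes plus one space). Annotations aligned against one scheme therefore need conversion before they can be compared with annotations aligned against the other.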