2022 Feb 15;4:788124. doi: 10.3389/fdgth.2022.788124

Table 4.

Differences between the Auto-CORPus BioC and PMC BioC JSON outputs.

Difference | Auto-CORPus | PMC
Section titles | Section titles, subtitles, subsubtitles (and so on) are linked to the passage text they apply to. | Section titles, subtitles, subsubtitles (and so on) precede the passage text they apply to.
Section types | Section types are annotated using IAO terms. | Section types are described using custom labels.
Offset counts | Offset increased by 1 for every character (including whitespace) in a passage. | Offset increased by the number of bytes in the text of a passage plus one space.
Table and figure sections | Structured table data are stored in table JSON. Figure captions are included in the BioC JSON in the sequential order in which they occur within paragraphs. | Table data and figure captions occur at the end of the JSON document. Table content is given as XML.
Abbreviations section | Abbreviations section stored in abbreviations JSON. Abbreviation and definition components are related. Incomplete/one-sided definitions are not stored. | Abbreviations and definitions from the abbreviations section are stored separately as text with no relations between the two components. Incomplete/one-sided definitions are stored.
Link anchor text | Link anchor text retained (HTML element tags removed). | Link anchor text removed.
Character encoding | UTF-8 used for outputs. | Available in Unicode and ASCII.
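
The two offset conventions in the table diverge as soon as a passage contains a non-ASCII character. The sketch below illustrates this under stated assumptions: the function names `char_offsets` and `byte_offsets` are hypothetical and are not part of either tool, the byte count is assumed to be taken over UTF-8 encoded text, and the "plus one space" in the PMC column is read as one extra unit added after each passage. It is an illustration of the arithmetic only, not the implementation used by Auto-CORPus or PMC.

```python
def char_offsets(passages):
    """Character-based convention (as described for Auto-CORPus):
    the offset advances by 1 for every character, whitespace included."""
    offsets, position = [], 0
    for text in passages:
        offsets.append(position)
        position += len(text)  # len() counts Unicode code points
    return offsets


def byte_offsets(passages, encoding="utf-8"):
    """Byte-based convention (as described for PMC): the offset advances
    by the encoded byte length of the passage plus one separating space.
    The UTF-8 encoding is an assumption made for this sketch."""
    offsets, position = [], 0
    for text in passages:
        offsets.append(position)
        position += len(text.encode(encoding)) + 1
    return offsets


# Hypothetical passages; the second contains a two-byte character (β).
passages = ["Introduction", "β-cell function was measured.", "Results"]

print(char_offsets(passages))
print(byte_offsets(passages))
```

For these three passages the character-based offsets are 0, 12 and 41, while the byte-based offsets are 0, 13 and 44. The gap grows with every passage (the extra separator) and with every multi-byte character, so offsets from one output cannot be reused against the other without conversion.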