Table 1.
Comparison of annotation model features between LAPPS/LIF, INCEpTION, and PubAnnotation
Feature | LAPPS/LIF | INCEpTION | PubAnnotation |
---|---|---|---|
Annotations | LIF Annotations are JSON-LD objects that have the following properties: ID, type, label, start, end, features, and metadata. Metadata and features are both key-value maps. References between annotations are encoded as ID references. | UIMA annotations are feature structures which have the built-in properties: "sofa" (subject of analysis), "begin", "end". References between annotations (feature structures) are object references, so IDs are not required. | Triple representation serialized in JSON. The format is motivated by Resource Description Framework (RDF). |
Spans | Subtypes of "region" (can refer to multiple other regions [e.g., "markables"] to represent discontinuous spans) | Subtypes of "annotation". INCEpTION has no provisions for discontinuous annotations. | A denotation is a JSON object which connects a span (or a set of spans for discontinuous spans) to an object. |
Relations | Subtypes of "relation". The individual subtypes define the endpoints of the relation, e.g., "Dependency" defines a "governor" and a "dependent". Relations are not necessarily binary. For example, "Constituent" defines an optional parent as well as a list of children. | Relations are annotations which have exactly two attributes that refer to other span annotations. For example, the dependency type defines the attributes "Governor" and "Dependent", which both point to "Token" annotations. Relations may have additional primitive attributes. There is no common supertype for all relation types. | A relation is a JSON object that represents a typed, directed, binary relation connecting two denotation objects. |
Chains | The "coreference" type. Links between the chain elements are not explicitly modelled and cannot be labeled. | Linked lists of spans where span and link can both have a label. | No dedicated annotation type for chains. However, a chain can be represented by a combination of denotations and relations. |
Attributes of annotation instances | Attributes are stored in the "features" map of the LIF JSON-LD object. | Attributes are fields in the UIMA feature structures that are used to represent annotations. | An attribute is a JSON object which resembles a relation, but it is meant to add further information to denotations and relations. |
Complex attributes | Attribute values are expected to be primitive, references to other annotations, or consist of nested feature sets. Sets and lists of references are supported. | Complex attribute values can be encoded as subtypes of "TOP". INCEpTION uses such complex attributes, e.g., to model argument slots on semantic predicates. | Complex attributes can be encoded using a naming convention. |
Multi-valued attributes | Unordered sets and ordered lists/arrays are supported. | UIMA supports multi-valued features (e.g., via arrays) and INCEpTION uses this internally in some cases. However, user-created features cannot presently be multi-valued. | Instead of having multi-valued attributes, PubAnnotation allows an attribute to be added multiple times with the same subject and predicate but different objects. This resembles set behavior. |
Document level annotation | Features of "Document", plus those inherited from "Thing". | Subtypes of "AnnotationBase", e.g., "DocumentMetaData" | Attributes with the document itself at the subject position. |
LAPPS, Language Applications; LIF, LAPPS Grid Interchange Format; UIMA, Unstructured Information Management Architecture.
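
The PubAnnotation column of Table 1 can be illustrated with a minimal sketch. It assumes the commonly used PubAnnotation JSON keys ("denotations", "relations", "attributes", "subj", "pred", "obj", "span"); the sample text, the labels, the predicate name "coreferent_with", and the helper functions are hypothetical and serve only to show how a set-like multi-valued attribute and a chain can be read from denotations, relations, and attributes.

```python
# Hypothetical sample document in a PubAnnotation-style triple layout.
doc = {
    "text": "Anna said she would come.",
    "denotations": [
        {"id": "T1", "span": {"begin": 0, "end": 4}, "obj": "Mention"},
        {"id": "T2", "span": {"begin": 10, "end": 13}, "obj": "Mention"},
    ],
    "relations": [
        # A typed, directed, binary relation between two denotations.
        {"id": "R1", "subj": "T2", "pred": "coreferent_with", "obj": "T1"},
    ],
    "attributes": [
        {"id": "A1", "subj": "T1", "pred": "gender", "obj": "female"},
        # Same subject and predicate, different objects: set-like behavior.
        {"id": "A2", "subj": "T1", "pred": "role", "obj": "speaker"},
        {"id": "A3", "subj": "T1", "pred": "role", "obj": "agent"},
    ],
}


def covered_text(doc, denotation_id):
    """Return the text covered by a denotation with a single contiguous span."""
    for den in doc["denotations"]:
        if den["id"] == denotation_id:
            span = den["span"]
            return doc["text"][span["begin"]:span["end"]]
    raise KeyError(denotation_id)


def attribute_values(doc, subj, pred):
    """Collect all objects attached to (subj, pred) as an unordered set."""
    return {
        attr["obj"]
        for attr in doc.get("attributes", [])
        if attr["subj"] == subj and attr["pred"] == pred
    }


def chain(doc, start_id, pred):
    """Follow relations labeled `pred` from subject to object, starting at start_id."""
    next_of = {
        rel["subj"]: rel["obj"]
        for rel in doc.get("relations", [])
        if rel["pred"] == pred
    }
    result, seen = [], set()
    current = start_id
    while current is not None and current not in seen:  # guard against cycles
        result.append(current)
        seen.add(current)
        current = next_of.get(current)
    return result


roles = attribute_values(doc, "T1", "role")
mentions = [covered_text(doc, d) for d in chain(doc, "T2", "coreferent_with")]
print(sorted(roles), mentions)
```

Because the chain is reconstructed from ordinary binary relations, nothing in the data marks it as a chain; that interpretation lives entirely in the consuming code, which matches the table's note that PubAnnotation has no dedicated chain type.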