Table 1.
Term | Description |
---|---|
Lemma | The uninflected form of a word, as it would appear in a dictionary; for example, “child” rather than “children” |
Stopword | A word with no intrinsic semantic value, for example, “a,” “the,” and “of.” Other words may act as stopwords in one context but not in another. |
API | Application Programming Interface; a defined method that allows programs to access the data or functions of other programs without human intervention |
Ontology | A formal description of the semantic relationship between words |
GitHub repository | A project's storage location on GitHub, an online code-hosting and code-sharing community widely used as a resource for open-source software |
Semantic similarity | A quantification of the similarity in meaning between two phrases |
Twitter Streaming API | An API that provides real-time access to tweets. As soon as a user posts a public tweet, it becomes available through the Streaming API. |
Semantic similarity matrix | A two-dimensional grid in which each square gives the semantic similarity between two pieces of text (tweets, in this context). Each square is identified by two coordinates, conventionally called i (row) and j (column), counted from 0. For example, the lower-right square of a 2 × 2 grid is identified as (1, 1). |
Centroid | Mean position of all points in a cluster, analogous to the center of mass of a physical object. |
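The Lemma and Stopword entries describe two common text-preprocessing steps. The sketch below shows them applied together. It assumes a tiny, hypothetical stopword set (`STOPWORDS`) and a hypothetical lemma lookup table (`LEMMAS`) that exist only for illustration. A real pipeline would use a curated stopword list and a proper lemmatizer.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only;
# real pipelines use curated stopword lists and a dedicated lemmatizer.
STOPWORDS = {"a", "the", "of"}
LEMMAS = {"children": "child", "played": "play", "games": "game"}


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on letters, drop stopwords, and map tokens to lemmas."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Tokens missing from the lookup table are kept unchanged.
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOPWORDS]


print(preprocess("The children played a game of tag"))
# -> ['child', 'play', 'game', 'tag']
```

Stopwords are removed before lemma lookup here, so the lookup runs only on the remaining content words.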
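The semantic similarity matrix entry describes 0-based (i, j) indexing. The sketch below builds such a grid for two tweets. It uses word-overlap (Jaccard) similarity purely as a stand-in. Jaccard is not a semantic measure, and the helper names `jaccard` and `similarity_matrix` are hypothetical. The point is the indexing: `m[1][1]` is the lower-right square of the 2 × 2 grid.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap stand-in for semantic similarity (not a semantic measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(texts: list[str]) -> list[list[float]]:
    """Return an n x n grid where cell [i][j] compares texts i and j (0-based)."""
    token_sets = [set(t.lower().split()) for t in texts]
    n = len(token_sets)
    return [[jaccard(token_sets[i], token_sets[j]) for j in range(n)] for i in range(n)]


tweets = ["flu season is here", "the flu is here again"]
m = similarity_matrix(tweets)
print(m[0][1])  # tweet 0 vs tweet 1 -> 0.5
print(m[1][1])  # lower-right square (1, 1): tweet 1 vs itself -> 1.0
```

Because this stand-in measure is symmetric, `m[i][j]` equals `m[j][i]`, and the diagonal holds each text compared with itself.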
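The Centroid entry defines a cluster's center as the mean position of its points. In other words, each coordinate is averaged separately. The minimal sketch below computes it for points given as tuples. The function name `centroid` is hypothetical.

```python
def centroid(points: list[tuple[float, ...]]) -> tuple[float, ...]:
    """Coordinate-wise mean of all points in a cluster."""
    if not points:
        raise ValueError("centroid of an empty cluster is undefined")
    n = len(points)
    # zip(*points) groups the first coordinates together, then the second, etc.
    return tuple(sum(coords) / n for coords in zip(*points))


print(centroid([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]))  # -> (1.0, 1.0)
```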