. 2017 Aug 22;13(4):278–286. doi: 10.1007/s13181-017-0625-5

Table 1.

Selected terms associated with Big Data analyses

Term | Description
Lemma | The uninflected form of a word, as it would appear in a dictionary; for example, "child" rather than "children"
Stopword | A word with no intrinsic semantic value, for example, "a," "the," and "of." Other words may act as stopwords in one context but not in another.
API | Application Programming Interface; a method that allows programs to access the data of other programs without human intervention
Ontology | A formal description of the semantic relationships between words
GitHub repository | A storage location on GitHub, an online cloud storage and code-sharing community widely used as a resource for open-source software
Semantic similarity | A quantification of the similarity in meaning between two phrases
Twitter Streaming API | An API that provides real-time access to tweets. As soon as a user posts a public tweet, it becomes available through the Streaming API.
Semantic similarity matrix | A two-dimensional grid in which each square holds the semantic similarity between two pieces of text (tweets, in this context). Each square is identified by two coordinates, conventionally called i (row) and j (column), counting from 0. For example, the lower-right square of a 2 × 2 grid is (1, 1).
Centroid | The mean position of all points in a cluster, analogous to the center of mass of a physical object
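The Lemma and Stopword entries describe two common text-preprocessing steps. The sketch below shows both under stated assumptions: the stopword set and the lemma lookup table are tiny, hand-built, hypothetical resources chosen for illustration (real analyses typically rely on a curated stopword list and a lemmatizer from an NLP library), and the function name `preprocess` is likewise illustrative.

```python
from __future__ import annotations

# Hypothetical, hand-built resources for illustration only.
STOPWORDS = {"a", "the", "of", "and", "to", "is", "are"}
LEMMAS = {"children": "child", "took": "take", "pills": "pill"}


def preprocess(text: str) -> list[str]:
    """Lowercase, split on whitespace, drop stopwords, and map words to lemmas."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'")  # remove surrounding punctuation
        if not word or word in STOPWORDS:
            continue  # stopwords carry no intrinsic semantic value here
        tokens.append(LEMMAS.get(word, word))  # fall back to the word itself
    return tokens


if __name__ == "__main__":
    print(preprocess("The children took a handful of the pills."))
    # ['child', 'take', 'handful', 'pill']
```

Because a word missing from the lookup table passes through unchanged, the quality of the result depends entirely on the coverage of the lemma resource; which words count as stopwords is, as the table notes, context dependent.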
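An ontology records how terms relate to one another. As a minimal sketch, assuming only a single "is-a" (broader-term) relation, a toy ontology can be stored as a mapping from each term to its parent and queried by walking up the chain. The terms, the `IS_A` mapping, and the helper names are hypothetical; formal ontologies usually support many relation types and are expressed in dedicated languages rather than a Python dictionary.

```python
from __future__ import annotations

# Hypothetical toy ontology: each term maps to its broader ("is-a") parent.
IS_A = {
    "acetaminophen": "analgesic",
    "ibuprofen": "analgesic",
    "analgesic": "drug",
    "drug": "substance",
}


def ancestors(term: str) -> list[str]:
    """Return every broader term reachable from `term`, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(term: str, category: str) -> bool:
    """True if `category` is a (possibly indirect) broader term of `term`."""
    return category in ancestors(term)


if __name__ == "__main__":
    print(ancestors("ibuprofen"))  # ['analgesic', 'drug', 'substance']
    print(is_a("acetaminophen", "drug"))  # True
```

This structure lets a program infer that two different words share a broader meaning, which is one way semantic relationships can inform similarity measures.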
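The Semantic similarity matrix entry can be illustrated with a small sketch. As an explicit assumption, it swaps in cosine similarity between bag-of-words count vectors, which is a purely lexical measure (it rewards shared words, not shared meaning) and is not necessarily the measure used in the article. The example tweets and function names are hypothetical. Entry `[i][j]` holds the score for text i versus text j, counting from 0, so the lower-right square of a 2 × 2 matrix is `[1][1]`.

```python
from __future__ import annotations

import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0  # empty text: define similarity as 0


def similarity_matrix(texts: list[str]) -> list[list[float]]:
    """Entry [i][j] holds the similarity between texts[i] and texts[j] (0-indexed)."""
    n = len(texts)
    return [[cosine_similarity(texts[i], texts[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    tweets = ["felt sick all day", "sick all day again"]
    m = similarity_matrix(tweets)
    print(m[1][1])  # 1.0  (lower-right square: a tweet compared with itself)
    print(m[0][1])  # 0.75 (3 shared words out of 4 in each tweet)
```

The matrix is symmetric and its diagonal compares each text with itself, so in practice only the upper triangle needs to be computed for large collections.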
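The Centroid entry reduces to a coordinate-wise mean. The sketch below computes it for points of any fixed dimension; the function name and the example cluster are illustrative, and it assumes every point carries equal weight (the analogue of equal masses in the center-of-mass comparison).

```python
from __future__ import annotations


def centroid(points: list[tuple[float, ...]]) -> tuple[float, ...]:
    """Return the coordinate-wise mean of a non-empty list of equal-length points."""
    if not points:
        raise ValueError("centroid of an empty cluster is undefined")
    dim = len(points[0])
    if any(len(p) != dim for p in points):
        raise ValueError("all points must have the same dimension")
    n = len(points)
    # Average each coordinate independently across all points in the cluster.
    return tuple(sum(p[k] for p in points) / n for k in range(dim))


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    print(centroid(cluster))  # (1.0, 1.0)
```

In clustering algorithms such as k-means, this function would be applied to each cluster's members to obtain the cluster's representative point.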