Table 1.
The key concepts in this paper and their definitions
Concept | Definition |
---|---|
Target vocabulary | The vocabulary to which we would like to suggest new terms. In this study, our target vocabulary is the CHV. |
N-gram | A contiguous sequence of n words in a sentence. In this study, we included up to 5-grams, since n-grams with n between 1 and 5 cover over 99% of the terms of interest [33]. |
Seed term | An n-gram extracted from the text corpus that can be found in the target vocabulary (i.e., CHV). |
Candidate term | An n-gram that is not covered by the target vocabulary but could potentially be added to it. To qualify as a candidate term, an n-gram may be subject to some constraints, e.g., occurring more than 5 times in the corpus. |
Term context | Either the entire sentence that contains the term, or a window of 10 words before or after the term in its sentence. |
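
The definitions in Table 1 can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names (`ngrams`, `split_seed_and_candidate_terms`, `context_window`), the whitespace tokenization, and the lowercase matching are assumptions made for illustration, and the frequency threshold is only the example constraint given in the table.

```python
from collections import Counter


def ngrams(tokens, max_n=5):
    """Yield every contiguous n-gram (as a tuple of words) for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def split_seed_and_candidate_terms(sentences, target_vocab, min_occurrences=5, max_n=5):
    """Separate corpus n-grams into seed terms and candidate terms.

    Seed terms are n-grams found in the target vocabulary. Candidate terms
    are n-grams outside it that occur more than `min_occurrences` times.
    Tokenization by whitespace and lowercasing are simplifying assumptions.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), max_n))

    seeds, candidates = set(), set()
    for gram, count in counts.items():
        term = " ".join(gram)
        if term in target_vocab:
            seeds.add(term)
        elif count > min_occurrences:
            candidates.add(term)
    return seeds, candidates


def context_window(tokens, start, n, window=10):
    """Return up to `window` words before and after the n-gram at `start`."""
    left = tokens[max(0, start - window):start]
    right = tokens[start + n:start + n + window]
    return left, right


# Toy corpus: one sentence repeated six times plus one rare sentence.
sentences = ["high blood sugar is common"] * 6 + ["insulin helps"]
target_vocab = {"blood sugar"}  # stand-in for the target vocabulary
seeds, candidates = split_seed_and_candidate_terms(sentences, target_vocab)
```

In this toy corpus, "blood sugar" is a seed term because it appears in the stand-in vocabulary, "high blood sugar" qualifies as a candidate term because it occurs six times, and "insulin" is excluded because it occurs only once.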