2015 Apr 2;6:15. doi: 10.1186/s13326-015-0011-7

Table 2.

Equations used in SEAM

C-value(a) [18]	$\begin{cases} \log_{2}|a| \cdot f(a), & \text{if } a \text{ is not nested} \\ \log_{2}|a| \left( f(a) - \frac{1}{P(T_{a})} \sum_{b \in T_{a}} f(b) \right), & \text{otherwise} \end{cases}$
	where:
	$a$ is the candidate string and $|a|$ is its length in words
	$f(\cdot)$ is the frequency of occurrence of a string in the corpus
	$T_{a}$ is the set of extracted candidate terms that contain $a$
	$P(T_{a})$ is the number of these candidate terms
Termhood(a) $\log\left(\frac{P(\text{vote} = \text{yes})}{P(\text{vote} = \text{no})}\right)$ [53]	= −0.7836
	+ 0.7541 · FirstPOS_ADJECTIVE
	− 1.3722 · FirstPOS_ADVERB
	+ 0.3541 · FirstPOS_NOUN
	+ 1.4182 · FirstPOS_VERB
	− 0.7722 · LastPOS_ADJECTIVE
	+ 2.2576 · LastPOS_ADVERB
	+ 0.0285 · LastPOS_NOUN
	+ 0.6038 · LastPOS_VERB
	+ 1.2899 · NP_VALUE
	+ 1.0475 · REPEAT_SUP_GREATER_MEDIAN
	+ 0.8417 · REPEAT_SUB_GREATER_MEDIAN
	+ 0.8422 · DISTINCT_PERHOST_GREATER_THAN_MEDIAN
	where:
	POS is the part-of-speech tag
	REPEAT_SUP is the number of supra-terms (candidate terms that contain a), i.e. $P(T_{a})$
	REPEAT_SUB is the number of sub-terms (candidate terms that are contained within a), i.e. $P(A_{t})$
	NP_VALUE indicates whether a is a noun phrase
	DISTINCT_PER_HOST is equivalent to document frequency
	MEDIAN is calculated over the whole document set
TF-IDF: $w_{i,j} = TF_{i,j} \times IDF_{i}$ [43]	$TF_{i,j} = \frac{f_{i,j}}{\max_{z} f_{z,j}}$
	where:
	$TF_{i,j}$ is the term frequency of keyword $k_{i}$ in document $d_{j}$
	$f_{i,j}$ is the number of times $k_{i}$ appears in $d_{j}$
	$\max_{z} f_{z,j}$ is the maximum frequency across all keywords $k_{z}$ in $d_{j}$
	$IDF_{i} = \log \frac{N}{n_{i}}$
	where:
	$IDF_{i}$ is the inverse document frequency of keyword $k_{i}$
	$N$ is the total number of documents in the corpus
	$n_{i}$ is the number of documents in which $k_{i}$ appears
Cosine similarity [43] $\cos(\vec{w}_{c}, \vec{w}_{s}) = \frac{\vec{w}_{c} \cdot \vec{w}_{s}}{\lVert \vec{w}_{c} \rVert \, \lVert \vec{w}_{s} \rVert}$	$= \frac{\sum_{i=1}^{K} w_{i,c} \, w_{i,s}}{\sqrt{\sum_{i=1}^{K} w_{i,c}^{2}} \, \sqrt{\sum_{i=1}^{K} w_{i,s}^{2}}}$
	where:
	$w_{i,j}$ is the TF-IDF weight defined above, and $K$ is the number of keywords
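The Termhood row is a fitted logistic-regression model, so scoring a candidate amounts to adding the intercept to the weighted features and, if a probability is wanted, applying the logistic function. The sketch below copies the intercept and coefficients from Table 2. It assumes that every feature is a 0/1 indicator and that `log` is the natural logarithm. The dictionary encoding of the features and the function names are hypothetical.

```python
import math
from typing import Dict

# Intercept and coefficients as listed in the Termhood row of Table 2
INTERCEPT = -0.7836
COEFFICIENTS: Dict[str, float] = {
    "FirstPOS_ADJECTIVE": 0.7541,
    "FirstPOS_ADVERB": -1.3722,
    "FirstPOS_NOUN": 0.3541,
    "FirstPOS_VERB": 1.4182,
    "LastPOS_ADJECTIVE": -0.7722,
    "LastPOS_ADVERB": 2.2576,
    "LastPOS_NOUN": 0.0285,
    "LastPOS_VERB": 0.6038,
    "NP_VALUE": 1.2899,
    "REPEAT_SUP_GREATER_MEDIAN": 1.0475,
    "REPEAT_SUB_GREATER_MEDIAN": 0.8417,
    "DISTINCT_PERHOST_GREATER_THAN_MEDIAN": 0.8422,
}


def termhood_log_odds(features: Dict[str, float]) -> float:
    """log(P(vote = yes) / P(vote = no)); features absent from `features` count as 0."""
    return INTERCEPT + sum(w * features.get(name, 0.0) for name, w in COEFFICIENTS.items())


def termhood_probability(features: Dict[str, float]) -> float:
    """P(vote = yes), recovered from the log-odds with the logistic function (natural log assumed)."""
    return 1.0 / (1.0 + math.exp(-termhood_log_odds(features)))


# Hypothetical 0/1 encoding of a noun-initial, noun-final noun phrase
# whose three frequency-based indicators are all above the median
features = {
    "FirstPOS_NOUN": 1, "LastPOS_NOUN": 1, "NP_VALUE": 1,
    "REPEAT_SUP_GREATER_MEDIAN": 1, "REPEAT_SUB_GREATER_MEDIAN": 1,
    "DISTINCT_PERHOST_GREATER_THAN_MEDIAN": 1,
}
print(round(termhood_log_odds(features), 4))  # 3.6203
```

A positive log-odds means the model favours a "yes" vote. A candidate with every indicator at 0 falls back to the negative intercept, which corresponds to a probability below 0.5.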
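The TF-IDF row can be computed directly from tokenised documents. This Python sketch assumes that each document is already a list of keywords. It uses the natural logarithm for `log`, since the table does not state a base and changing the base only rescales IDF. The function name `tf_idf` and the toy documents are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {keyword: w_ij} dict per document, with w_ij = TF_ij * IDF_i."""
    n_docs = len(docs)  # N
    # n_i: number of documents in which keyword k_i appears at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)  # f_ij for every keyword in d_j
        if not counts:
            weights.append({})  # empty document: no keywords, no weights
            continue
        max_count = max(counts.values())  # max_z f_zj
        weights.append({
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy, pre-tokenised documents (illustrative only)
docs = [
    ["gene", "expression", "gene"],
    ["gene", "ontology"],
    ["protein", "expression"],
]
weights = tf_idf(docs)
print(weights[1])
```

In the second toy document, "ontology" occurs in only one of the three documents, so it receives a higher weight than "gene", which occurs in two. A keyword that appears in every document gets IDF = log(1) = 0 and therefore weight 0.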
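For the cosine-similarity row, here is a sketch over sparse vectors. Each vector is a dictionary that maps keywords to weights, for example the TF-IDF weights of two documents. A keyword missing from a dictionary is treated as having weight 0, so only shared keywords contribute to the dot product. The denominator is the product of the two Euclidean norms. Returning 0 when either vector is all zeros is an assumption of this sketch, because the formula is undefined in that case.

```python
import math
from typing import Dict


def cosine(w_c: Dict[str, float], w_s: Dict[str, float]) -> float:
    """Cosine similarity of two sparse keyword-weight vectors (missing keywords weigh 0)."""
    # Numerator: sum over the K keywords of w_ic * w_is; only shared keywords contribute
    dot = sum(weight * w_s[k] for k, weight in w_c.items() if k in w_s)
    # Denominator: product of the two Euclidean norms
    norm_c = math.sqrt(sum(v * v for v in w_c.values()))
    norm_s = math.sqrt(sum(v * v for v in w_s.values()))
    if norm_c == 0.0 or norm_s == 0.0:
        return 0.0  # assumption: the formula itself is undefined for a zero vector
    return dot / (norm_c * norm_s)


print(round(cosine({"gene": 1.0, "ontology": 1.0}, {"gene": 1.0}), 4))  # 0.7071
```

Because TF-IDF weights are non-negative, the result lies between 0 (no shared keywords) and 1 (vectors pointing in the same direction).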