Sensing Urban Transportation Events from Multi-Channel Social Signals with the Word2vec Fusion Model

Sensors (Basel). 2018 Nov 22;18(12):4093. doi: 10.3390/s18124093

Symbol	Description	Data Structure	Supporting Process
T	Number of Weibo posts	Int T	w-LDA
U	Number of Weibo users	Int U
K	Number of topics	Int K
V	Number of words in the vocabulary	Int V
P	Number of user profiles	Int P
$N_{p}$	Number of words in the p-th user profile	Int N[P]
$α$	K-dimensional vector of prior weights of topics in a document	Float a[K]
$β$	V-dimensional vector of prior weights of words in a topic	Float b[V]
$φ_{z}$	V-dimensional vector of probabilities, represents distribution of words in topic z	Double phi [Z][V]
$ϑ_{p}$	K-dimensional vector of probabilities, represents distribution of topics in user profile p	Double theta [P][K]
$z_{i}$	Identity of current topic of word $w_{i}$ in user profile $p_{i}$	Int 1…K
$w_{i}$	Identity of current word in user profile $p_{i}$	Int 1…V
$p_{i}$	Identity of the current user profile	Int 1…P
$n_{i, j}^{(p_{i})}$	Document-Topic matrix, the number of times topic j has been assigned to words in user profile $p_{i}$	Int npt [P][K]
$n_{i, j}^{(w_{i})}$	Topic-Word matrix, the number of times word $w_{i}$ has been assigned to topic j	Int ntw [K][V]
$W_{i}$	Identity of current word vector (200 dimensions) trained by traffic word2vec	Double [200]	Similarity measure
$T_{w}^{(i = 1 \dots K)}$	Identity of current topic word cluster detected from Weibo	Double tw[K]
$T_{n}^{(j = 1 \dots K)}$	Identity of current topic word cluster detected from News	Double tn[K]
$W E_{w}^{(m, i)}$	Word embedding-cluster tensor, identity of the current word embedding in the i-th cluster detected from Weibo	Double cew[K][W_i]
$W E_{n}^{(n, j)}$	Word embedding-cluster tensor, identity of the current word embedding in the j-th cluster detected from News	Double cen[K][W_i]
$d i s_{R} (W E_{w}^{(m, i)}, W E_{n}^{(n, j)})$	Word similarity, measures the similarity between word embeddings $W E_{w}^{(m, i)}$ and $W E_{n}^{(n, j)}$	Double wd
$D_{R} (T_{w}^{(i)}, T_{n}^{(j)})$	Topic similarity matrix, measures the distances between the words in the given topic clusters $T_{w}^{(i)}$ and $T_{n}^{(j)}$	Double td[K][K]	Event fusion
$μ_{R} (T_{w}^{(i)}, T_{n}^{(j)})$	Average shortest distance between $T_{w}^{(i)}$ and $T_{n}^{(j)}$	Double atd
$μ_{R}^{*} (T_{w}^{(i)}, T_{n}^{(j)})$	Normalized average shortest distance between $T_{w}^{(i)}$ and $T_{n}^{(j)}$	Double natd
$σ_{R}^{*} (T_{w}^{(i)}, T_{n}^{(j)})$	The standard deviation of normalized topic distances	Double sd
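
The event-fusion quantities in the table can be illustrated with a small sketch. The table names the quantities but does not give their formulas, so the following rests on assumptions: cosine distance stands in for $d i s_{R}$, the "shortest distance" of a Weibo word is taken as its distance to the nearest News word in the other cluster, and the normalization behind $μ_{R}^{*}$ is taken to be min-max scaling. All function names are hypothetical, and the toy vectors are 2-dimensional rather than the 200-dimensional embeddings used in the paper.

```python
import math
from statistics import mean, pstdev


def cosine_distance(u, v):
    """1 - cosine similarity; an assumed stand-in for dis_R."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(cluster_w, cluster_n):
    """Pairwise word distances between a Weibo cluster and a News cluster."""
    return [[cosine_distance(u, v) for v in cluster_n] for u in cluster_w]


def average_shortest_distance(cluster_w, cluster_n):
    """Assumed form of mu_R: nearest News word per Weibo word, averaged."""
    rows = distance_matrix(cluster_w, cluster_n)
    return mean(min(row) for row in rows)


def min_max_normalize(values):
    """Assumed form of the normalization behind mu_R*: scale to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Toy clusters: each cluster is a list of word embeddings.
weibo_topic = [[1.0, 0.0], [0.0, 1.0]]
news_topic_a = [[1.0, 0.0]]
news_topic_b = [[0.0, 1.0], [1.0, 0.0]]

atd = [
    average_shortest_distance(weibo_topic, news_topic_a),
    average_shortest_distance(weibo_topic, news_topic_b),
]
natd = min_max_normalize(atd)  # normalized average shortest distances
sd = pstdev(natd)              # standard deviation of the normalized distances
```

Under these assumptions, the News cluster with the smallest normalized distance would be the closest match for the Weibo topic. A different distance or normalization would change the numbers but not the overall structure of the computation.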