Author manuscript; available in PMC: 2019 Jul 1.
Published in final edited form as: Proc Conf Assoc Comput Linguist Meet. 2018 Jul;2018:2225–2235.

Algorithm 1.

FEATURE EXTRACTION

1:  procedure FORCED ALIGNMENT
2:    Determine time interval of each word
3:    find wᵢ ↔ [Aᵢⱼ], j ∈ [1, L], i ∈ [1, N]
4:  end procedure
5:  procedure TEXT BRANCH
6:    Text Attention Module
7:    for i ∈ [1, N] do
8:      Tᵢ ← getEmbedded(wᵢ)
9:      t_hᵢ ← bi_GRU(Tᵢ)
10:     t_eᵢ ← getEnergies(t_hᵢ)
11:     t_αᵢ ← getDistribution(t_eᵢ)
12:   end for
13:   return t_hᵢ, t_αᵢ
14: end procedure
15: procedure AUDIO BRANCH
16:   for i ∈ [1, N] do
17:     Frame-Level Attention Module
18:     for j ∈ [1, L] do
19:       f_hᵢⱼ ← bi_GRU(Aᵢⱼ)
20:       f_eᵢⱼ ← getEnergies(f_hᵢⱼ)
21:       f_αᵢⱼ ← getDistribution(f_eᵢⱼ)
22:     end for
23:     f_Vᵢ ← weightedSum(f_αᵢⱼ, f_hᵢⱼ)
24:     Word-Level Attention Module
25:     w_hᵢ ← bi_GRU(f_Vᵢ)
26:     w_eᵢ ← getEnergies(w_hᵢ)
27:     w_αᵢ ← getDistribution(w_eᵢ)
28:   end for
29:   return w_hᵢ, w_αᵢ
30: end procedure
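The FORCED ALIGNMENT procedure states only that each word is mapped to its block of audio frames. The Python sketch below shows one way to perform that grouping. It assumes the word start and end times have already been produced by an external forced aligner, that acoustic frames are spaced every 10 ms (`hop_s`), and that words are zero-padded or truncated to a common length L. The function `group_frames_by_word` and its parameters are hypothetical, chosen for illustration, and do not come from the paper.

```python
import numpy as np


def group_frames_by_word(frames, intervals, hop_s=0.01, max_frames=None):
    """Group frame-level acoustic features by word (the FORCED ALIGNMENT step).

    frames:     (T, D) array, one row of features every hop_s seconds (assumed).
    intervals:  list of (start_s, end_s) word boundaries, assumed to come from
                an external forced aligner.
    max_frames: L, the fixed number of frames per word; defaults to the longest word.
    Returns A with shape (N, L, D), where A[i, j] is frame j of word i
    (zero-padded or truncated to L), and a boolean (N, L) mask of real frames.
    """
    groups = []
    for start_s, end_s in intervals:
        lo = int(round(start_s / hop_s))
        hi = max(lo + 1, int(round(end_s / hop_s)))  # request at least one frame per word
        groups.append(frames[lo:hi])
    L = max_frames if max_frames is not None else max(len(g) for g in groups)
    A = np.zeros((len(groups), L, frames.shape[1]), dtype=frames.dtype)
    mask = np.zeros((len(groups), L), dtype=bool)
    for i, g in enumerate(groups):
        g = g[:L]                      # truncate words longer than L frames
        A[i, :len(g)] = g              # copy real frames; the rest stays zero padding
        mask[i, :len(g)] = True
    return A, mask
```

For example, with the default `hop_s=0.01`, word intervals (0.0, 0.05) and (0.05, 0.20) map to 5 and 15 frames, so `A` has shape (2, 15, D) and the first word carries 10 padded frames.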
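A minimal NumPy sketch of the TEXT BRANCH follows. It is not the authors' implementation. It assumes the word embeddings Tᵢ are already available, so `getEmbedded` is not modeled. It uses one common GRU formulation for `bi_GRU`, an additive score e = vᵀ tanh(W h + b) for `getEnergies`, and a softmax for `getDistribution`. The paper's exact scoring function and dimensions (`d_h`, `d_a` here) may differ. The loop over i is realized as a single bi-GRU pass over the word sequence. All weights are random and untrained, so the code only illustrates shapes and data flow.

```python
import numpy as np

rng = np.random.default_rng(0)  # untrained placeholder weights, for shape-checking only


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(d_in, d_h):
    # Rows 0, 1, 2 hold the update-gate, reset-gate and candidate parameters.
    return {"W": rng.normal(0.0, 0.1, (3, d_h, d_in)),
            "U": rng.normal(0.0, 0.1, (3, d_h, d_h)),
            "b": np.zeros((3, d_h))}


def gru(xs, P):
    """Run one GRU direction over xs (T, d_in); return all hidden states (T, d_h)."""
    W, U, b = P["W"], P["U"], P["b"]
    h = np.zeros(b.shape[1])
    states = []
    for x in xs:
        z = sigmoid(W[0] @ x + U[0] @ h + b[0])             # update gate
        r = sigmoid(W[1] @ x + U[1] @ h + b[1])             # reset gate
        h_cand = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])  # candidate state
        h = (1.0 - z) * h + z * h_cand
        states.append(h)
    return np.stack(states)


def bi_gru(xs, P_fwd, P_bwd):
    """bi_GRU: forward and time-reversed backward states, concatenated (T, 2*d_h)."""
    fwd = gru(xs, P_fwd)
    bwd = gru(xs[::-1], P_bwd)[::-1]
    return np.concatenate([fwd, bwd], axis=-1)


def get_energies(h, W, b, v):
    """getEnergies, assumed additive form: e_i = v . tanh(W h_i + b)."""
    return np.tanh(h @ W.T + b) @ v


def get_distribution(e):
    """getDistribution: softmax over positions (max subtracted for stability)."""
    e = np.exp(e - e.max())
    return e / e.sum()


def text_branch(T, d_h=8, d_a=6):
    """TEXT BRANCH: T (N, d_emb) word embeddings -> t_h (N, 2*d_h), t_alpha (N,)."""
    d_emb = T.shape[1]
    t_h = bi_gru(T, init_gru(d_emb, d_h), init_gru(d_emb, d_h))
    W = rng.normal(0.0, 0.1, (d_a, 2 * d_h))
    b, v = np.zeros(d_a), rng.normal(0.0, 0.1, d_a)
    t_alpha = get_distribution(get_energies(t_h, W, b, v))
    return t_h, t_alpha
```

For example, `text_branch(np.random.default_rng(1).normal(size=(5, 10)))` returns `t_h` of shape (5, 16) and five positive attention weights that sum to 1.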
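The AUDIO BRANCH stacks two levels of attention. The NumPy sketch below illustrates that hierarchy under stated assumptions: untrained random weights, one common GRU formulation for `bi_GRU`, an additive score e = vᵀ tanh(W h + b) standing in for `getEnergies`, a softmax for `getDistribution`, and arbitrary dimensions. For each word it runs a bi-GRU over that word's frames and pools them with frame-level attention into a single vector f_Vᵢ (`weightedSum`). It then applies a word-level bi-GRU and attention over the sequence of f_Vᵢ vectors. The input `A` is assumed to hold each word's frames, grouped by forced alignment and zero-padded to L, with a boolean `mask` marking real frames. `A`, `mask`, `attend`, and `init_att` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)  # untrained placeholder weights, for shape-checking only


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(d_in, d_h):
    # Rows 0, 1, 2 hold the update-gate, reset-gate and candidate parameters.
    return {"W": rng.normal(0.0, 0.1, (3, d_h, d_in)),
            "U": rng.normal(0.0, 0.1, (3, d_h, d_h)),
            "b": np.zeros((3, d_h))}


def bi_gru(xs, P_fwd, P_bwd):
    """Concatenated forward/backward GRU states for xs (T, d_in) -> (T, 2*d_h)."""
    def run(seq, P):
        W, U, b = P["W"], P["U"], P["b"]
        h, states = np.zeros(b.shape[1]), []
        for x in seq:
            z = sigmoid(W[0] @ x + U[0] @ h + b[0])             # update gate
            r = sigmoid(W[1] @ x + U[1] @ h + b[1])             # reset gate
            h_cand = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])  # candidate state
            h = (1.0 - z) * h + z * h_cand
            states.append(h)
        return np.stack(states)
    return np.concatenate([run(xs, P_fwd), run(xs[::-1], P_bwd)[::-1]], axis=-1)


def init_att(d_in, d_a):
    # Hypothetical attention parameters (W, b, v) for e = v . tanh(W h + b).
    return rng.normal(0.0, 0.1, (d_a, d_in)), np.zeros(d_a), rng.normal(0.0, 0.1, d_a)


def attend(h, W, b, v):
    """getEnergies followed by getDistribution (softmax) over the rows of h."""
    e = np.tanh(h @ W.T + b) @ v
    e = np.exp(e - e.max())
    return e / e.sum()


def audio_branch(A, mask, d_h=8, d_a=6):
    """AUDIO BRANCH: A (N, L, D) word-grouped frames, mask (N, L) real-frame flags.

    Returns w_h (N, 2*d_h) and w_alpha (N,). Every word needs at least one real frame.
    """
    D = A.shape[2]
    frame_gru = (init_gru(D, d_h), init_gru(D, d_h))  # shared by all words
    frame_att = init_att(2 * d_h, d_a)
    f_V = []
    for i in range(A.shape[0]):
        f_h = bi_gru(A[i][mask[i]], *frame_gru)     # frame-level states, padding dropped
        f_alpha = attend(f_h, *frame_att)           # frame-level attention weights
        f_V.append(f_alpha @ f_h)                   # weightedSum -> one vector per word
    f_V = np.stack(f_V)                             # (N, 2*d_h)
    w_h = bi_gru(f_V, init_gru(2 * d_h, d_h), init_gru(2 * d_h, d_h))  # word level
    w_alpha = attend(w_h, *init_att(2 * d_h, d_a))
    return w_h, w_alpha
```

This sketch shares one frame-level bi-GRU and one set of frame-level attention weights across all words, which keeps the parameter count independent of the number of words N. Whether the original model shares them in exactly this way is an assumption.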