Randomly initialize basic embedding matrix E, attention parameters u_a, W_a, b_a, RNN parameter θ_r, softmax parameters W, b.
repeat
    Update E with GloVe objective function (see Section 2.4)
until convergence
repeat
    X ← random patient from dataset
    for visit V_t in X do
        for code c_i in V_t do
            Look up c_i's ancestors C′
            for code c_j in C′ do
                Calculate attention weight α_ij using Eq. (2)
            end for
            Obtain final representation g_i using Eq. (1)
        end for
        v_t ← tanh(∑_{i: c_i ∈ V_t} g_i)
        Make prediction using Eq. (4)
    end for
    Calculate prediction loss using Eq. (5)
    Update parameters according to the gradient of the prediction loss
until convergence
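
The inner loops of the procedure above (attention weights, final representations, visit vector) can be illustrated with a small pure-Python sketch. Eqs. (1) and (2) are not reproduced in this section, so the forms used here are assumptions: Eq. (2) is taken to be a softmax over scores u_a · tanh(W_a [e_i; e_j] + b_a), and Eq. (1) an attention-weighted sum of ancestor embeddings. The helper names, the toy code hierarchy, the dimensions, and the convention that each ancestor list includes the code itself are all hypothetical. The RNN, the prediction step, and the gradient update are omitted.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attention_weights(e_i, ancestor_embs, u_a, W_a, b_a):
    """Softmax-normalized compatibility of code i with each ancestor j.

    Assumed form of Eq. (2): score_ij = u_a . tanh(W_a [e_i; e_j] + b_a).
    """
    scores = []
    for e_j in ancestor_embs:
        pre = matvec(W_a, e_i + e_j)  # list concatenation gives [e_i; e_j]
        hidden = [math.tanh(p + b) for p, b in zip(pre, b_a)]
        scores.append(sum(u * h for u, h in zip(u_a, hidden)))
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def final_representation(alphas, ancestor_embs):
    """Assumed form of Eq. (1): g_i = sum_j alpha_ij * e_j."""
    dim = len(ancestor_embs[0])
    return [sum(a * e[k] for a, e in zip(alphas, ancestor_embs))
            for k in range(dim)]

def visit_vector(visit, ancestors, E, u_a, W_a, b_a):
    """v_t = tanh(sum of g_i over the codes c_i in the visit)."""
    dim = len(next(iter(E.values())))
    total = [0.0] * dim
    for c_i in visit:
        embs = [E[c_j] for c_j in ancestors[c_i]]
        alphas = attention_weights(E[c_i], embs, u_a, W_a, b_a)
        g_i = final_representation(alphas, embs)
        total = [t + g for t, g in zip(total, g_i)]
    return [math.tanh(t) for t in total]

# Toy setup (hypothetical): two leaf codes sharing a parent "A" and a root.
rng = random.Random(0)
dim, hidden = 4, 3
ancestors = {"A1": ["A1", "A", "root"], "A2": ["A2", "A", "root"]}
E = {c: [rng.uniform(-1, 1) for _ in range(dim)]
     for c in ["root", "A", "A1", "A2"]}
W_a = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(hidden)]
b_a = [0.0] * hidden
u_a = [rng.uniform(-1, 1) for _ in range(hidden)]

alphas = attention_weights(E["A1"], [E[c] for c in ancestors["A1"]],
                           u_a, W_a, b_a)
v_t = visit_vector(["A1", "A2"], ancestors, E, u_a, W_a, b_a)
```

Here `alphas` holds one weight per ancestor of `A1` and sums to 1, and `v_t` is the visit representation that would be passed on to the prediction step of Eq. (4).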