Algorithm 1.
Computes a probability score for a text document D under a masked language model M. Forward returns a matrix in which each row is a probability distribution over the vocabulary for the corresponding token position. Append adds a value to the end of a list.
procedure Masked-Prob(D, M)
    sents ← Sentence-Split(D)
    P ← Initialize empty list
    for i = 1 … |sents| do
        T ← Tokenize(sents[i])
        for j = 1 … 10 do
            A ← sample 15% from 1 … |T|
            T′ ← T
            for all a ∈ A do
                T′[a] ← [MASK]
            outputs ← Forward(M, T′)
            for all a ∈ A do
                prob ← outputs[a][T[a]]
                Append(P, prob)
    return mean(P)
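The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `sentence_split`, `tokenize`, and `make_uniform_model` are hypothetical stand-ins for Sentence-Split, Tokenize, and the model M; the `forward` callable plays the role of Forward and returns one token-to-probability mapping per position rather than a dense matrix; and the sketch masks at least one position per round, a detail the pseudocode leaves unspecified for short sentences.

```python
import random
import re
from statistics import mean
from typing import Callable, Dict, List, Optional

MASK = "[MASK]"

def sentence_split(doc: str) -> List[str]:
    # Hypothetical naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s]

def tokenize(sentence: str) -> List[str]:
    # Hypothetical tokenizer: words and single punctuation marks.
    # A real masked LM would use its own subword tokenizer.
    return re.findall(r"\w+|[^\w\s]", sentence)

def masked_prob(
    doc: str,
    forward: Callable[[List[str]], List[Dict[str, float]]],
    rounds: int = 10,
    mask_rate: float = 0.15,
    rng: Optional[random.Random] = None,
) -> float:
    rng = rng or random.Random()
    probs: List[float] = []
    for sent in sentence_split(doc):
        tokens = tokenize(sent)
        if not tokens:
            continue
        # Assumption: always mask at least one position, even in short sentences.
        k = max(1, round(mask_rate * len(tokens)))
        for _ in range(rounds):
            positions = rng.sample(range(len(tokens)), k)
            masked = list(tokens)
            for a in positions:
                masked[a] = MASK
            outputs = forward(masked)  # one distribution per token position
            for a in positions:
                # Probability the model assigns to the original token at a masked slot.
                probs.append(outputs[a].get(tokens[a], 0.0))
    if not probs:
        raise ValueError("document contains no tokens to score")
    return mean(probs)

def make_uniform_model(vocab: List[str]) -> Callable[[List[str]], List[Dict[str, float]]]:
    # Hypothetical toy model: a uniform distribution over vocab at every position.
    row = {tok: 1.0 / len(vocab) for tok in vocab}
    def forward(tokens: List[str]) -> List[Dict[str, float]]:
        return [row for _ in tokens]
    return forward

if __name__ == "__main__":
    toy = make_uniform_model(["a", "b", "c", "d", "."])
    print(masked_prob("a b c d. b a.", toy, rng=random.Random(0)))
```

With the uniform toy model every masked token receives probability 1/|vocab|, so the score is a quick sanity check on the loop structure. Swapping in a real model only requires a `forward` callable with the same shape.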