Figure 4.
Our model for encoding and scoring language samples composed of utterances. Each utterance is encoded by a Bidirectional LSTM network. Utterance representations consisting of the concatenations of the first and last tokens are averaged into a vector that represents the entire language sample. From this language sample vector, a score is computed for the entire language sample.