Author manuscript; available in PMC: 2017 Aug 1.

Published in final edited form as: J Biomed Inform. 2016 May 13;62:21–31. doi: 10.1016/j.jbi.2016.05.004

Table 3.

Feature representation of each utterance in machine learning pipeline.

Feature Type	Description	Purpose
Lexical features	One feature for each distinct word in the set of training interview transcripts. The value of each lexical feature is the number of times the corresponding word appears in the utterance.	To capture the vocabulary that is indicative of each label.
Contextual features	One feature for each codebook label. The value of the feature is 1 if the previous utterance in the dialog was annotated with the corresponding label, and 0 otherwise.	To capture dialog context, which changes the likelihood of observing particular speech acts. For example, if the previous speaker was requesting information, then the next speech act is more likely to provide the requested information.
Semantic features	One feature for each of the sixty-eight LIWC lexicons. The value of each semantic feature is the number of times a word from the corresponding lexicon appears in the utterance.	To capture psycholinguistic cues related to the thought processes, emotional states, intentions and motivations of the speaker.
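
The three feature types in Table 3 can be illustrated with a minimal Python sketch that concatenates them into one vector per utterance. This is not the authors' implementation. The whitespace tokenizer, the function names, the two example labels, and the two toy lexicons are all assumptions made for illustration; the actual pipeline uses the study's codebook labels and the sixty-eight LIWC lexicons, neither of which is reproduced here.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the table does not
    # specify how utterances are split into words.
    return text.lower().split()


def build_vocabulary(training_utterances):
    # One lexical feature per distinct word in the training transcripts.
    return sorted({w for u in training_utterances for w in tokenize(u)})


def featurize(utterance, prev_label, vocabulary, labels, lexicons):
    counts = Counter(tokenize(utterance))

    # Lexical: how many times each vocabulary word occurs in the utterance.
    lexical = [counts[w] for w in vocabulary]

    # Contextual: 1 for the label of the previous utterance, 0 elsewhere.
    # prev_label=None (first utterance in a dialog) yields all zeros.
    contextual = [1 if prev_label == label else 0 for label in labels]

    # Semantic: how many tokens in the utterance belong to each lexicon.
    semantic = [
        sum(counts[w] for w in lexicons[name]) for name in sorted(lexicons)
    ]

    return lexical + contextual + semantic


# Toy data (hypothetical; not taken from the study).
training = ["how often do you eat fruit", "i eat fruit every day"]
vocabulary = build_vocabulary(training)
labels = ["question", "answer"]
lexicons = {"food": {"eat", "fruit"}, "time": {"day", "often", "every"}}

vec = featurize("i eat fruit and fruit", "question", vocabulary, labels, lexicons)
print(vec)
```

With this toy data the vector has 9 lexical, 2 contextual, and 2 semantic entries. The word "and" does not occur in the training transcripts, so it contributes to no lexical feature.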