Table 1.
Comparison of example data sources for children's conversational agents.
| Data source | Example references | Advantages | Disadvantages |
|---|---|---|---|
| Books | Children's Book Test (Hill et al., 2016) | • Easy to collect • Large amount of data available • Available for social interactions and conversations at different times in history | • Content mostly written as narrative to be read, not as spoken conversation • Children's conversations interpreted by adult content creators • May lack nuanced contemporary speech patterns |
| Films | MovieQA (Tapaswi et al., 2016) | • Easy to collect • Available for social interactions and conversations at different times in history • Content intended as spoken conversation | • Can perpetuate negative stereotypes from the film industry • Children's conversations interpreted by adult content creators |
| Real-world interactions | Curiosity-evoking virtual agent (Paranjape et al., 2018) | • Available for social interactions and conversations at different times in history • Reflects children's actual speech in social interactions • Flexibility to include diverse representation • Control over tone and purpose • Content intended as spoken conversation | • Extremely time-intensive • Potentially costly • Young children's pronunciation poses challenges |