Fig. 1.
Diagram of the data collection, feature extraction, and analysis workflow for the current project. Activity records were downloaded from GitHub for eight free and open‐source software projects. From each activity record we extracted a series of features: basic metadata (i.e., the date it was created and the user who created it), communication context (i.e., whether someone was posting a pull request (PR), commenting on a PR, posting an issue, or commenting on an issue), community metrics (i.e., the project to which it belonged and whether the user was a member of the community), and language analysis (i.e., a sentiment score and a count of gratitude words). The data were then subjected to a series of analyses: comparing the overall behavior of groups, tracking communities' language use over time, and quantifying factors related to newcomer retention.
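
The feature-extraction step described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `ActivityRecord` fields, the `kind` labels, and the `GRATITUDE_WORDS` lexicon are all hypothetical names chosen for illustration. The sentiment score is left out because the caption does not name the tool used to compute it.

```python
from dataclasses import dataclass
from datetime import datetime
import re

# Hypothetical gratitude lexicon; the caption does not list the actual words.
GRATITUDE_WORDS = {"thanks", "thank", "thankful", "grateful", "appreciate", "appreciated"}


@dataclass
class ActivityRecord:
    """One downloaded activity record (field names are assumptions)."""
    project: str          # project the record belongs to
    user: str             # user who created the record
    created_at: datetime  # creation date
    kind: str             # assumed labels: "pr", "pr_comment", "issue", "issue_comment"
    is_member: bool       # whether the user is a community member
    text: str             # body of the post or comment


def count_gratitude_words(text: str) -> int:
    """Count tokens in the text that appear in the gratitude lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in GRATITUDE_WORDS)


def extract_features(record: ActivityRecord) -> dict:
    """Map one record to the feature groups named in the caption."""
    return {
        # basic metadata
        "created_at": record.created_at.isoformat(),
        "user": record.user,
        # communication context
        "context": record.kind,
        # community metrics
        "project": record.project,
        "is_member": record.is_member,
        # language analysis (sentiment score omitted; tool not specified)
        "gratitude_count": count_gratitude_words(record.text),
    }


example = ActivityRecord(
    project="example-project",
    user="alice",
    created_at=datetime(2020, 1, 2),
    kind="pr_comment",
    is_member=False,
    text="Thanks! I really appreciate the quick review, thank you.",
)
features = extract_features(example)
```

Keeping each feature group as a separate key in a flat dictionary makes it straightforward to load the extracted records into a table for the group comparisons and time-series analyses that follow.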