Table 1.
Recommendations for engaging community members in a community-based participatory research approach to natural language processing.
| Raw data | Data annotation | Model selection | Model training and testing | Deployment and validation |
Inform | Provide information about potential data sources; describe the data source origination and curation | Give a description of the annotation process and how it is used for natural language processing development | Provide an overview of models being considered in the project | Create tutorials and educational resources | Describe translating natural language processing models into real-world settings, with implications on the potential risks, benefits, and impacts |
Consult | Meet with community members to elicit feedback on data source selection; discuss any questions or concerns related to the data source(s) | Gather diverse views and thoughts on the annotation guidelines | Ask community members about their perspectives on the models being considered | Obtain feedback on the goals of the model (eg, interpretability) | Gather input on perceived feasibility, utility, outcomes, and deployment strategies |
Involve | Identify meaningful data sources; discuss assumptions or concerns of each source | Include community members in the development and refinement of annotation guidelines | Discuss models and alternatives | Engage community members in the model training process to ensure the model is training as intended | Include community members in discussing considerations for equity and potential failures |
Collaborate | Consider community members as partners when selecting data sources through ongoing and open discussions | Work together throughout the annotation process | Partner with community members during model selection and weigh model tradeoffs together | Jointly work with community members during model training with continuous discussions of goals and progress | Work together during the predeployment testing, refinement, and deployment phases with ongoing discussions around safety and efficacy |
Empower | Provide the opportunity for community members to vote on data source decisions | Promote shared decision-making | Support community members in voting to select models best suited to the task | Engage community members in setting priorities for model training and testing | Allow community members to set goals and make decisions around model deployment and validation |