2023 Aug 4;25:e48498. doi: 10.2196/48498

Table 1.

Recommendations for engaging community members in a community-based participatory research approach to natural language processing.


|  | Raw data | Data annotation | Model selection | Model training and testing | Deployment and validation |
|---|---|---|---|---|---|
| Inform | Provide information about potential data sources; describe the data source origination and curation | Give a description of the annotation process and how it is used for natural language processing development | Provide an overview of models being considered in the project | Create tutorials and educational resources | Describe translating natural language processing models into real-world settings, with implications on the potential risks, benefits, and impacts |
| Consult | Meet with community members to elicit feedback on data source selection; discuss any questions or concerns related to the data source(s) | Gather diverse views and thoughts on the annotation guidelines | Ask community members about their perspectives on the models being considered | Obtain feedback on the goals of the model (eg, interpretability) | Gather input on perceived feasibility, utility, outcomes, and deployment strategies |
| Involve | Identify meaningful data sources; discuss assumptions or concerns of each source | Include community members in the development and refinement of annotation guidelines | Discuss models and alternatives | Engage community members in the model training process to ensure the model is training as intended | Include community members in discussing considerations for equity and potential failures |
| Collaborate | Consider community members as partners when selecting data sources through ongoing and open discussions | Work together throughout the annotation process | Partner with community members during model selection and weigh model tradeoffs together | Jointly work with community members during model training with continuous discussions of goals and progress | Work together during the predeployment testing, refinement, and deployment phases with ongoing discussions around safety and efficacy |
| Empower | Provide the opportunity for community members to vote on data source decisions | Promote shared decision-making | Support community members in voting to select models best suited to the task | Engage community members in setting priorities for model training and testing | Allow community members to set goals and make decisions around model deployment and validation |