Science. Author manuscript; available in PMC: 2023 Oct 14.
Published in final edited form as: Science. 2022 Oct 13;378(6616):141–143. doi: 10.1126/science.abq6851

Data sharing and community-engaged research

Maya Sabatello, Daphne O Martschenko, Mildred K Cho, Kyle B Brothers
PMCID: PMC10155868  NIHMSID: NIHMS1848729  PMID: 36227983

Research funders, professional associations, and scientific journals have increasingly endorsed positions and established policies that recognize data as a public good (2). In the health research context, the promise of data sharing is to accelerate health research across borders and improve patient care (3). But data sharing raises ethical concerns for research participants, researchers, and marginalized communities, such as questions about data ownership, risk of reidentification (for example, from genomic datasets), data security, and appropriate consent processes (2). Data sharing also raises unique ethical challenges for community-engaged research (CEnR), a term encompassing diverse research collaborations with communities, from community consultation to community-based participatory research. We discuss key rationales and goals of data sharing in health research and of CEnR and highlight three areas of potential tension between these two movements: incentives and benefit sharing; group harm and power structures; and researcher engagement and responsibility sharing.

This issue is particularly timely and important in light of the updated Data Management and Sharing Policy of the U.S. National Institutes of Health (NIH), which will enter into force in January 2023 (“the Policy”) (1). The Policy “establishes the expectation for maximizing the appropriate sharing of scientific data generated from NIH-funded or conducted research,” with a preference for sharing through established repositories. The Policy’s shift from an aspirational culture of data sharing in scientific research to an established expectation, to be documented in data sharing plans submitted with grant proposals (which may be perceived as a prerequisite for funding), creates new financial and non-financial incentives for researchers and research institutions. Though the Policy galvanizes the growing trend of recognizing health data as a public good, this transformation in its current form risks producing an excessively narrow understanding of the public good.

TWO SCIENTIFIC MOVEMENTS

The data sharing and CEnR movements are both grounded in efforts to improve scientific research and health outcomes for marginalized populations. However, they differ in their locus of control and in their vision of how such efforts should be promoted and who should be entrusted to promote them.

The data sharing movement transfers scientific data from local/community ownership to global use, and from individual researchers to institutionally controlled environments and the larger community of researchers. It aims to create larger and more diverse datasets that are necessary for generating meaningful scientific findings (3), reducing biases, and enhancing scientific accuracy through data reuse, validation, and rigorous reproducibility (4). Data sharing further promotes the scientific enterprise by reducing redundancy in data collection and the burden on research participants, creating training resources for emerging scientists, and spurring new research with existing scientific data (4).

The CEnR movement promotes scientific progress through reliance on local rather than global knowledge, and through power sharing instead of researcher-led, top-down approaches to research. A fundamental tenet of CEnR is that researchers’ expressions of responsibility to the community are inseparable from the study and from the community’s willingness to contribute data to the scientific endeavor. CEnR exists in various contexts (e.g., biodiversity studies) but is particularly illuminating in health research, where it intersects with laws, policies, and other commitments to promote equity and non-discrimination in health. It empowers communities to work alongside researchers from the very beginning of the research process. Together, communities and researchers in CEnR set research priorities and identify, and proactively address, ways researchers might cause harm to the community (5). CEnR is geared toward equitable, justice-oriented partnerships, including a co-learning process between scientific researchers and communities, attention to sociopolitical contexts and power structures, and potential for identifying opportunities for benefit sharing (5, 6). It is believed to enhance researchers’ cultural humility, awareness of unconscious biases, and understanding of diverse health-related belief systems; it also serves as a tool to build rapport between researchers and participants and to increase the trustworthiness of research institutions (5).

The elements comprising CEnR center attention on community wellbeing, on the potential harms and benefits accruing to a community, and on communities (not individuals) as the holders of rights and moral status. In contrast, the common practice of Institutional Review Boards (IRBs) and informed consent forms is to emphasize individual research participants. Because CEnR aims to create trust relationships, it gives rise to ethical obligations beyond those of non-CEnR studies; to uphold the goals, processes, and outcomes of CEnR, these obligations must extend to secondary users of CEnR-collected health data.

INCENTIVES AND BENEFIT SHARING

The underpinnings of CEnR (i.e., high-trust relationships and the trustworthiness of researchers) are neither legally based nor transactional in nature. Trust relations between researchers and communities cannot be circumscribed in consent forms; they cannot be sold, bought, or transferred. However, data sharing plans commonly focus on individual (not community-level) consent and commodify data that were collected on the basis of community trust by allowing unrelated researchers to profit from them, regardless of community values and preferences. Some secondary data users may even have contributed to the health disparities experienced by marginalized populations, making them untrustworthy to these communities. Blind community members, for example, expressed low willingness to share health data with pharmaceutical companies (notwithstanding high interest in precision medicine research) because such companies commonly fail to ensure that drug labels are provided in Braille or large print as required by law (7), an inaccessibility that has negatively affected the health of blind people. The opportunity for secondary data users to use data that were collected through the trusting relationships of CEnR obviates the need for secondary researchers to earn the community’s trust and removes any incentive for them to alter their discriminatory practices.

Data sharing might also undermine efforts to protect marginalized communities from exploitation through the extension of research benefits. CEnR in resource-poor settings, for example, has called for ensuring systemic and global-level benefit sharing as an ethical requirement of just research (8). Benefit sharing fulfills basic underpinnings of CEnR, including long-term, mutually beneficial partnerships and transformative practices to redress past harms and existing power injustices (6). However, benefit sharing agreements that emerged in community-engaged studies are not immediately applicable to secondary data users; viable approaches for extending benefit sharing responsibilities to secondary data users are needed (8). Although adding such conditions to open access repositories might complicate secondary data uses, it could facilitate a shift toward socially responsible perceptions of the public good.

GROUP HARM, POWER STRUCTURES

The NIH Policy acknowledges that data sharing is likely to be varied and contextual. In respect and recognition of Tribal sovereignty, the Policy specifically notes the data protection interests of Tribal Nations and commits to develop, with community input, additional guidelines for researchers that are responsive to and highlight the unique values, preferences, and data sharing concerns of American Indian/Alaska Native (AI/AN) communities (1).

Secondary data uses raise distinctive concerns for other marginalized communities that are engaged in research. In addition to conventional harms, such as data uses in research that communities view as objectionable (for example, gene editing) (9), researchers have fused scientifically unfounded and socially demeaning constructs into publicly available data to analyze and interpret them in ways that harm the interests of marginalized racial and ethnic communities (10). Such secondary uses of data collected through community engagement add insult to injury: they circumvent a fundamental purpose of CEnR, which is to empower communities to work with researchers to set health research priorities, assure appropriate processes for their implementation, and prevent group harm, and thus abuse the trust underpinning CEnR. Yet whether secondary uses of data are acceptable, and how they will benefit communities at large, is not necessarily subject to IRB review. Without broad oversight of secondary data uses and the inclusion of communities in such oversight, data sharing will dilute responsibility for equitable research outcomes.

Secondary uses of data collected in CEnR also risk causing group harms, including dignitary harms. Researchers analyzing open data can reinterpret the data without attention to context or collapse the data into categories that deviate from community narratives and preferences. Aggregated data analyses are important for scientific findings to emerge (3). However, aggregate studies also objectify community members and remove their power to be actively involved in the research process, a key component of CEnR. Even without ill intent, researchers may reinforce misperceptions of the homogeneity of marginalized populations, as well as their invisibility, as has occurred, for example, in research with Asian American populations, who are often treated as a single racial category. Inappropriate aggregation of data could also mask health disparities or other phenomena in subpopulations and thus reduce the potential for the data to be used to benefit those subgroups (e.g., disproportionately high COVID-19 mortality rates among Chinese Americans and Filipino nurses) (11).

Moreover, in contrast to the rigidity of the informed consent forms that serve as the basis for data sharing (including in studies that engage communities), CEnR involves many additional informal agreements that evolve over time, reflect community-tailored expectations for respectful research collaborations, and are fundamental for ethical, socially just, and responsive research. For example, research engaging members of the Deaf community in the US has highlighted a preference for identity-first (“deaf individuals”) rather than person-first (“person with deafness or hearing loss”) language (12). This preference accentuates the shared history, culture, and experiences of being deaf, and challenges the medical model of disability, which views being deaf as a deficit that requires treatment. While researchers engaged in CEnR with the Deaf community would be, or would become, educated about such community preferences and be trusted to follow them, secondary data users are under no such obligation. Publications based on shared data thus risk perpetuating dignitary harms to communities (for example, by misnaming them or reinforcing harmful stereotypes), even when communities agreed to contribute data trusting that the primary researchers would not cause these harms.

RESEARCHER ENGAGEMENT, RESPONSIBILITY SHARING

An underlying assumption of data sharing is that data de-identification reduces common risks of research participation. Yet because secondary research using de-identified data is no longer considered human subjects research, the ethical obligations of secondary users to participants could be considerably weakened. For example, institutional policies on data sharing do not commonly include assurances that data uses are aimed at benefiting communities that participated in CEnR. Data de-identification may also have the unintended effect of dehumanizing research participants. Although this outcome may apply to various forms of data collection, it is particularly problematic in CEnR, where, by design, participants are not mere ‘research subjects’ but ‘research partners’ who are assured that their voices and preferences will be heeded. De-identification, in turn, breaks the chain of trust-related obligations that underpinned the data collection in CEnR and enables secondary users to morally disengage from the context and communities involved (13). In contrast, CEnR encourages transparency and accountability toward research participants and communities throughout and after the study’s life cycle. Ethical data sharing thus requires secondary data users to remain morally engaged while developing research questions, conducting analyses, and disseminating findings, by understanding both the data and the people and communities who donated them. Put differently: data sharing must be accompanied by responsibility sharing.

Here, researchers conducting CEnR may find themselves caught in a Catch-22. In CEnR, it is expected, and important, that some of the scientific researchers are also members of the studied community. As scientists, these researchers may recognize that failing to share data could reduce knowledge about, and potential health benefits to, marginalized populations. Yet as members of the studied community, they might share the community’s concerns over data sharing; even if not, they may be committed to a shared decision-making process (a common feature of CEnR) (5) and to upholding the ethical principle of solidarity (6). How to resolve such clashes of loyalties, especially when the community lacks legally recognized status, has not yet been adequately considered.

POLICY IMPLICATIONS

The NIH Policy allows for “justified limitations or exemptions” to assure “appropriate” data sharing (1). Yet it largely confines these exemptions to existing regulations, including consent, and to concerns about re-identification. Moreover, the Policy places the burden of proof on the researchers and communities requesting limitations on how data will be shared, without clarifying the evidence needed to secure such an exemption. Given the Policy’s preference for maximizing data sharing, there is a risk that exemption decisions will be biased toward dominant cultural views.

Assuring appropriate representation on decision-making bodies will be critical but raises questions about which populations are considered “a community” that might merit exemption. Inclusion of “recognized” marginalized communities might be insufficient. Asian Americans, for example, are not recognized by the NIH as an underrepresented community, despite substantial health disparities experienced by their subpopulations (11). The disability community, the largest but least recognized health disparities group in the US (14), might not be seen by some as fitting conventional notions of “a community,” leading some to question who would be a legitimate representative to speak on its behalf. Notwithstanding heterogeneity within communities, it will be important to ensure that when funders, scientific journals, and other stakeholders consider requests for exemptions from data sharing requirements, these processes take into account the lived, shared, and intersectional experiences of the relevant communities and of those contributing data, as provided in data sharing plans and celebrated in CEnR.

To address these tensions and ensure responsibility sharing to communities, the NIH could implement a mechanism for automatic exceptions to sharing data generated through CEnR when requested by communities. However, such an approach has faced implementation challenges even with legally recognized Tribal Nations and communities. Despite the NIH’s acknowledgement of their data protection interests and its explicit allowance of exemption, researchers, institutions, and journal editors may neither understand nor follow this allowance (15). Such an approach is therefore even less likely to be endorsed for non-sovereign, let alone currently unrecognized, communities. Other approaches could include training reviewers of data sharing plans, and the wider research community, to recognize community nuances, as well as measures to ensure that secondary data users understand, and commit to uphold, their own responsibility to the communities who contributed data. These efforts could include community consultation on data sharing plans; community review of individual data sharing requests, including a community veto over data sharing requests that raise concerns; raising the bar for standard “data use limitation statements” to assure benefit and responsibility sharing; requiring secondary data users to follow core formal and informal agreements reached with the community during the study, such as benefit sharing and culturally tailored, respectful communication of findings; and seeking feedback from the lead researchers in CEnR (who have first-hand knowledge of and expertise on the community) on draft publications and other work products before they are finalized. Fundamentally, it is critical to develop measures and processes to assure that data collected through trust relations in CEnR with marginalized communities are not used or disseminated in ways that harm, or re-stigmatize, the community.

Community and public trust, and the accountability of secondary data users, could also be enhanced by ensuring that all research databases under the Policy document who is using data and for what purposes. Further, researchers requesting data derived from CEnR could be asked to detail how their intended data uses would benefit communities and the public. Such transparency would allow the NIH, and other entities involved in disseminating scientific knowledge, to evaluate the extent to which the public good is being served by the data sharing policy.

Key issues are at stake: constructs of the public good, justice and social responsibility in secondary data uses, and fairness in data sharing decisions. Research and multi-stakeholder engagement are needed to explore viable ethical, social, and policy solutions to the dilemmas that arise. Without appropriate protections for communities engaged in research, a data sharing expectation is likely to negatively affect marginalized communities, efforts to diversify enrollment, and researchers conducting CEnR. The critical role of CEnR in inclusive and equitable research requires addressing the tensions between these two scientific movements, for the sake of both communities and science.

Acknowledgement

This work was partially supported by NHGRI/NIH’s Office of the Director (OD) grant R01HG010868 (MS), NHGRI grant T32HG008953 (MKC, DOM) and NHGRI grant U24 HG010733 (MKC).

REFERENCES AND NOTES
