Abstract
Over the past 20 years, the National Institutes of Health (NIH) has implemented several policies designed to improve sharing of research data, such as the NIH public access policy for publications, NIH genomic data sharing policy, and National Cancer Institute (NCI) Cancer Moonshot public access and data sharing policy. In January 2023, a new NIH data sharing policy went into effect, requiring researchers to submit a Data Management and Sharing Plan in proposals for NIH funding.1,2 These policies are based on the idea that sharing data is a key component of the scientific method, as it enables the creation of larger data repositories that can support research questions that may not be possible in individual studies,3,4 allows enhanced collaboration, and maximizes the federal investment in research. As data sharing is expanded, we must consider two important questions: to whom do the benefits of data sharing accrue, and to whom do they not accrue? In an era of growing efforts to engage diverse communities in research, we must consider the impact of data sharing for all research participants and the communities that they represent.
We examine the issue of data sharing through a community-engaged research lens, informed by a long-standing partnership between community-engaged researchers and a key community health organization5. We contend that without effective community engagement and rich contextual knowledge, biases resulting from data sharing can remain unchecked. We provide several recommendations that would allow better community engagement related to data sharing to ensure both community and researcher understanding of the issues involved and move toward shared benefits. By identifying good models for evaluating the impact of data sharing on communities that contribute data, and then using those models systematically, we will advance the consideration of the community perspective and increase the likelihood of benefits for all.
Keywords: Data sharing, community-engaged research
At the turn of the 20th century, in The Philadelphia Negro,6 W.E.B. Du Bois contextualized the city’s health disparities within a bigger picture of the city’s political history, geographic mobility, and heterogeneity. While describing structural issues that Black people in Philadelphia faced, Du Bois outlined harms that specific Black communities in Philadelphia experienced within contexts of migration and class. Thus, the health disparities he described were not just individual problems to be solved, but part of a web of effects of the dehumanization of Black communities in the US, tied to generations of social and economic exclusion, visible in geographic patterns but explained by sociopolitical ones. As such, he interpreted these health disparities as preventable inequities, fixable not only through self-help and mutual aid in Black communities, but also through restructuring the ways that social and economic opportunities were unjustly distributed along racial lines. Importantly, this contextualization was influenced by Du Bois’ time spent living in the communities he wrote about. This approach contrasts with what Du Bois bemoaned as “car window sociology,” which drew inferences based on the passing impressions and the prejudices that form when researchers fail to engage with the people whose lives they attempt to explain.7 Science without community engagement and contextual knowledge lacks opportunities for biases and assumptions to be challenged, which was a key factor in the pseudoscientific racism Du Bois countered with his own research methodology. We use this community-engaged research lens to examine the role of data sharing in community-engaged research, and to raise our concern that without effective community engagement and rich contextual knowledge, biases can remain unchecked and uncorrected in health research today.
Over the past 20 years, the National Institutes of Health (NIH) has implemented a number of policies designed to improve the sharing of research data. The first NIH data sharing requirements were established in 2003, and since then a number of other mandated policies designed to promote access to research data and resulting findings have been adopted. These policies include the NIH public access policy for publications,8 NIH genomic data sharing policy,9 and National Cancer Institute (NCI) Cancer Moonshot public access and data sharing policy,10 among others. In January 2023, a new NIH data sharing policy went into effect, requiring researchers to submit a Data Management and Sharing Plan in proposals for NIH funding.1,2 The policy requires consideration of: (1) how data management and sharing is addressed in the informed consent process; (2) limitations on subsequent use of data; and (3) whether access to de-identified data should be controlled. The NIH policy instructs investigators to follow the FAIR guiding principles, which require that data be Findable, that is, to have unique and persistent identifiers, include machine-readable metadata, and be discoverable in a searchable resource. Data are to be Accessible, or open, free, and universally accessible. Data are to be Interoperable, that is, be able to be integrated with other data for analysis, storage, and processing. Finally, data are to be Reusable, or to have well-described data and metadata, and to use standards to enable replication and combination with other datasets.1,2
Data sharing policies are predicated on the idea that sharing data is an important component of the scientific method, as it enables the creation of larger data repositories to support innovative research questions that may not be possible in individual studies.3,4 Data sharing allows data to be utilized for new hypotheses that may extend beyond original plans for the data. It has also been argued that data sharing represents an ethical obligation, as it may maximize the learning that comes from federal investment in research11 and from the contributions of volunteer research participants who assume risks for the benefit of scientific discovery. Other benefits include enhanced opportunity for collaboration, the ability for data from different fields to be merged and/or used by those in other fields, increased sample sizes, and support for reproducibility and transparency.
One important question that we must consider as data sharing is expanded is: to whom do benefits of data sharing accrue? In the short term, it is likely that the biggest beneficiaries of data sharing will be researchers, NIH, and industry. With time, the public will ideally benefit if data sharing realizes its potential to accelerate the efficiency of knowledge generation and translation to care. Of course, the actual benefits to the public will depend on whether new treatments or preventive strategies are widely accessible and whether any resulting cost savings are passed on to patients in the form of lower costs for evidence-based interventions. An equally important corollary question, and one that we explore here, is: to whom do benefits not accrue?
A View of Data Sharing in the Context of Community Engaged Partnerships
In the last few years, NIH has placed a significant emphasis on community-engaged research in many of its funding mechanisms (e.g., Rapid Acceleration of Diagnostic Testing in Underserved Populations (RADx-UP);12 Community Engagement Alliance (CEAL) Against COVID-19 Disparities13), which we consider to be important advancements. However, it is also important to consider how these efforts are implemented, and whether data sharing policies align with community-engaged research principles. In this paper we examine the issue of data sharing through a community-engaged research lens and develop a series of recommendations to help ensure that the communities that participate in the generation of data receive benefit from the discoveries and knowledge generated through data sharing. We write from the perspective of a community-engaged partnership, the Implementation Science Center for Cancer Control Equity (ISCCCE), that brings together a group of community-engaged researchers and the Massachusetts League of Community Health Centers, a primary care association that supports 52 community health centers (CHCs) across Massachusetts. The partnership goals are to: (1) create an implementation science (IS) ecosystem across Massachusetts that engages in cancer prevention and control; (2) address health inequities by race, ethnicity, income, and geography through use of robust IS approaches; and (3) create capacity for addressing health equity. In response to the COVID-19 pandemic, the partnership also developed a response to the NIH call for proposals through its RADx-UP initiative and received funding for a two-year project. A requirement of funding was to share all data via mechanisms created by the RADx-UP Coordination and Data Collection Center. This experience provided the team with substantial opportunity to reflect upon the requirement to share data through the lens of the community’s experience in research.
To shape the present inquiry, we held meetings with members of the study partnership and our RADx-UP Community Advisory Board to gather multiple perspectives on sharing community-level data. The experiences of the Massachusetts League of Community Health Centers (Mass League) informatics team were also discussed with the research team at bi-weekly all-team meetings throughout the two-year grant. The author team of this paper includes members of the Mass League who have dealt with this issue over a long period of time.
Community Engaged Research Principles and the Implications for Data Sharing
Nine best practice principles have been articulated to guide community-engaged research,14 providing evidence-based and practical guidance for engaging community partners in research. The principles place particular attention on the need to engage communities affected by health issues (see Table 1). Further, these principles convey the importance of actively engaging community partners in leadership activities that help shape and drive the research. In particular, the principles emphasize the importance of every aspect of the research process recognizing community assets, autonomy, and self-determination.
Table 1.
Community-Engaged Research Principles14
| 1. Be clear about the purposes or goals of the engagement effort and the populations and/or communities you want to engage |
| 2. Become knowledgeable about the community’s culture, economic conditions, social networks, political and power structures, norms and values, demographic trends, history, and experience with efforts by outside groups to engage it in various programs. Learn about the community’s perceptions of those initiating the engagement activities |
| 3. Go to the community, establish relationships, build trust, work with the formal and informal leadership, and seek commitment from community organizations and leaders to create processes for mobilizing the community |
| 4. Remember and accept that collective self-determination is the responsibility and right of all people in a community. No external entity should assume it can bestow on a community the power to act in its own self-interest (Community empowers itself) |
| 5. Partnering with the community is necessary to create change and improve health |
| 6. All aspects of community engagement must recognize and respect the diversity of the community. Awareness of the various cultures of a community and other factors affecting diversity must be paramount in planning, designing, and implementing approaches to engaging a community |
| 7. Community engagement can only be sustained by identifying and mobilizing community assets and strengths and by developing the community’s capacity and resources to make decisions and take action |
| 8. Organizations that wish to engage a community as well as individuals seeking to effect change must be prepared to release control of actions or interventions to the community and be flexible enough to meet its changing needs |
| 9. Community collaboration requires long-term commitment by the engaging organization and its partners |
If these principles are considered as standards for research, they will require careful consideration of data sharing in the specific research context, particularly if the long-term goal of data sharing is to improve health. Data sharing requirements imply that investigators, and NIH as their funder, have a “right” to the data collected from the community. In community-engaged research, the data generated by the partnership are not owned by researchers. They are a community asset, just as aggregate health system data are an asset of the generating system. Communities have deep and substantiated concerns about efforts by others to assume ownership of and/or to determine the future use of their data for a number of reasons. First, as noted by Ballantyne,15 community-derived data can be used to characterize groups so as to confer disadvantage, leading to stigma and discrimination. Second, use of data by groups not involved in its generation can lead to both disenfranchisement and disempowerment, in which the participating communities lose control and agency over secondary uses of their own data. Further, data sharing can be considered exploitative when those involved in generating the data do not have a voice in its use or sufficiently benefit from its use.16 For example, data from a community health setting may be used to develop a new treatment for a common disease, but that treatment costs $50,000 per year and thus is inaccessible to many in the community. One could argue that insufficient benefit was generated for the community, and inequities were entrenched. Current data sharing principles assume that data are value-neutral and free of historical and social context, when in fact data are constructed through the choice of the research framework, methods, and instruments that often reflect institutional power and concretize inequity. A solution to this issue is to ensure that groups and communities who are contributing shared data have a voice in its use.
Potentially competing forces are at play when data sharing is a requirement for community partners to participate in research. The effect of such a requirement is to say, “you share your data, or you don’t participate,” a message delivered to the very same historically disadvantaged communities that we strive to engage in research. In the context of poor accrual of diverse participants to clinical trials, such messages are problematic. For example, a recent study of the FDA’s 5-year plan to improve diversity found no evidence of improved representation of Black participants in studies supporting new drug approval applications; less than 20% of drugs had treatment benefits or side effects data reported for Black patients.17 There can also be dissonance between the intent of a research mechanism and the ways in which funding requirements are implemented. For example, the RADx-UP initiative, which was designed to accelerate COVID-19 testing among underserved populations, initially planned to require that participants provide their social security number so that testing data could be used to link other data sources about participants. After significant concerns were raised by community partners and investigators, this requirement was changed to a voluntary request to provide this information. However, many community partnerships, including ours, did not consider such a request to be appropriate to make of vulnerable populations.
The strong policies and practices of First Nations communities can inspire more fair and equitable data sharing processes that address key ethical concerns and the historical context. As a result of a long history of research participation that did not meet community standards, First Nations in Canada have implemented specific research principles, centered around ownership, control, access, and possession (OCAP®), to ensure their self-determination and sovereignty.18 For example, ownership can reflect a First Nations community’s collective relationship to cultural knowledge, data, and information rather than an individual’s relationship to data shared with a research institution.16,19 This collective notion of ownership is in turn translated to data stewardship, in which an institution that is accountable to a particular First Nations community serves as the caretaker (and not the owner) of the data, and thus does not have authority to use data in a way other than directed by the community. Although the historical context may differ, the learning is applicable: health research institutions (and funders) should reorient their relationships with research participants and the communities that they represent towards equity.
Decades of research have demonstrated that scientific discoveries rarely flow to historically disadvantaged communities and, further, that discoveries typically drive further inequities.20,21 Given that communities are not sufficiently involved in research, and because community-derived research is rarely published, conclusions from the literature perpetuate the status quo and fail to identify associations unknown and unappreciated by the research community. Thus, it is problematic for data to be shared broadly without expectation that it will be used in ways that could realistically benefit the communities from which it has been generated. This creates a dilemma for researchers, particularly community-engaged investigators, who must comply with data sharing requirements.
Beyond Individualized Consent in the Community Context
Dominant conceptions of research ethics, as outlined in the Belmont Report,22 emphasize the values of respect for persons, beneficence, and justice at an individual level. Ethical research practices take place through interactions with individuals via informed consent and in consideration of the distribution of benefits and risks among individuals. This individualized concept of ethics may be helpful in guiding interactions with individual research participants; however, it does not address the shared nature of many types of community health data. Relatives share genetic information. Coworkers, classmates, and neighbors share environmental health exposures. People who share facets of their social identity (e.g., race/ethnicity, gender, immigration status) also share information on the political and social determinants of health. When health data are viewed through this more communal lens, the limits of an individual-level notion of research ethics come into question. When can one person feasibly consent to sharing health information as if they were the sole contributor or data owner? How meaningful are individualized research ethics in the study of factors that shape health at a community level?
Researchers using community-based participatory research (CBPR) approaches have wrestled with such questions as they seek to involve, in every step of the research process, people from the communities who share their personal data. CBPR researchers have identified ethical considerations like the breadth of institutional trust and the cultural specificity of consent to research,23 which go beyond the scope of standard research ethics informed by the Belmont Report. In sharing power and considering rights at a community level, CBPR researchers have reformulated data ownership and consent to be a dynamic long-term process, shedding light on the role that relationship-building can play in research, beyond ethics of individual protection and consent.24 CBPR further represents an opportunity to challenge the status quo, in that it requires viewing data through a community lens. The ethical challenges of research highlighted by CBPR25 include potential imbalances in benefits and risk between community insiders and outsiders, as well as questioning who has the right to represent communities in research partnerships. Ultimately, CBPR presents a field of health research that can inform recommendations for data sharing that addresses the community contribution and role.
Power Dynamics of Consent to Research in the Context of Health Inequities
The ethics of research consent are a growing concern in an era of increased data sharing and open science.26 Data sharing presents an opportunity to expand scientific knowledge that can advance health equity by enabling more research. It can also present an opportunity to combine existing data sets to help provide more statistical power to studies in populations that may not have large raw numbers in any single study. Further, open science and data sharing present an opportunity to promote more rigorous, reproducible research in important topics of health equity.
At the same time, data sharing and open science highlight ethical pitfalls in consent to research for populations experiencing health inequities. Through de-identification, consent for research use of biosamples or clinical data is effectively irrevocable, even if the research participants would object to use by certain actors or for specific purposes. The Pima Indian Diabetes Dataset (PIDD) sheds light on this kind of conflict between the principles guiding data sharing and the values guiding participation in research. The Gila River Indian Community reservation is home to the Akimel O’otham (Pima) Tribe, whose history on the land can be traced back thousands of years. The Pima Tribe experienced a rapid and extreme rise in diabetes stemming from forced changes in environmental health, agricultural practices, and economic opportunities.27 Because of this high diabetes prevalence, the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) began studying diabetes in the Gila River Indian Community in the 1960s, with decades of research made possible by the collection of blood samples among reservation residents and promises of improved population health.28 However, health inequities ultimately remained while biomedical research carried on in a manner that the community found unacceptable.
In a 2009 ordinance concerning the process for research engagement, the Gila River Indian Community Council determined that medical research had been “conducted in ways that do not respect the human dignity of human subjects and that do not recognize the legitimate interest of the Community in the integrity and preservation of its culture”.27,29 Despite these objections, de-identified clinical data generated through these processes were still accessible online because NIH and the Johns Hopkins University researchers made the data available via the UC Irvine Machine Learning Repository. By 2017, the PIDD had been viewed nearly 260,000 times on the repository alone, fueling publications in a broad range of machine learning applications beyond diabetes and medicine.30–33 Without local contextual knowledge, this scale of research outputs may appear to be a success of data sharing. We would argue, however, that this situation highlights the ethical issues presented when data sharing principles fail to account for the reasons people might elect to participate in research: to help their own community and to support Indigenous sovereignty. Unfortunately, this is not an isolated example.34–37
Researchers in this example did not break any federal or institutional research ethics regulations. The data that they shared and reused for multiple studies were deidentified, and thus fell outside of the purview of human participant research protections.38 However, these ethics did not align with the distinct collective expectations and inherent rights that Indigenous Peoples hold, including the right to self-determination and collective consent,39 nor with their intentions of advancing their communal well-being through participation in research. As such, this case points to issues around the ethics of data sharing as a health equity issue in need of alternative models. Tribal Epidemiology Centers are among the institutions presenting alternative models that place, at their center, reciprocity, collective knowledge, and data sovereignty, as well as a focus on the interconnection between the rebuilding of Indigenous data and the rebuilding of Indigenous nations.40 While Tribal Epidemiology Centers play a particular role in the relationships between Tribal nations and the US federal government, they can act as a source of expertise and inspiration for other research institutions to reorient themselves toward data stewardship and dynamic consent, rather than data ownership and an effectively irrevocable consent process for de-identified data. We argue that similar issues apply to other communities which have experienced historical disadvantage due to social and economic exclusion.
The Limits of Data Sharing and Open Science Frameworks
In an effort to ensure maximal benefit from data sharing while addressing concerns about potential harmful impacts, a number of guiding frameworks have been established. The National Academies of Sciences, Engineering, and Medicine’s framework of Open Science by Design41 outlines key principles and practices to broaden access to scientific knowledge for every stage of research. While focused on accessibility and transparency, operationalizing these open science principles centers researchers with institutional access to resources, including the software, hardware, and time required to interact with data sets, reproduce statistical analyses, and interpret results. Freely sharing academic papers, preprints, and protocols relies on an assumption of readers’ technical knowledge and content expertise to critically appraise them, niche skills that typically require years of higher education to develop. In this way, values of accessibility and transparency without specific attention to community access and understanding may act as barriers to community empowerment and equity.
These issues are not unique to the context of the United States, but rather are manifestations of broader concerns. For example, the Global Alliance for Genomics & Health’s (GA4GH) Framework for Responsible Sharing of Genomic and Health-Related Data42 aims to protect and promote the interests of individuals through a series of principles for evaluating and practicing responsible research. This framework prioritizes individual human rights and does not address the issues of data ownership and consent in community contexts as outlined above. The GA4GH has also published a Framework for Involving and Engaging Participants, Patients, and Publics in Genomics Research and Health Implementation.43 In this framework, the GA4GH places study participants, patients, and members of the public at its center. It highlights several benefits of community and participant engagement, including trust in science, collaboration across fields, diversity of participants, relevance of genomic research to the public, improved quality/quantity of data, and identification of positive and negative impacts of sharing genomic information. The framework offers useful tools for reflecting on engagement practices and working towards fairer, more inclusive processes. However, this framework still places researchers as primary drivers of engagement and thus does not address concerns related to community empowerment. Separation of data sharing and engagement processes presents a barrier to community empowerment and equity in data sharing in the US context.44
The CARE Principles for Indigenous Data Governance, which have been put forward by Indigenous Data Sovereignty networks in Aotearoa New Zealand, Australia, and the United States, go considerably further in promoting equitable participation in data sharing and data re-use.45 The four core principles include collective benefit from the data and its governance; authority to control the data, recognizing rights and interests in data governance; responsibility for positive relationships, expanding capacity, and for Indigenous worldviews; and ethics for justice, beneficence, and future use. CARE principles for data sharing shift the focus from regulated consultation to value-based relationships based within Indigenous cultures and knowledge systems that benefit the populations whose data are shared.46,47
Research Partnerships with Community Health Centers—Implications for Data Sharing
The attention to increasing diversity in research participation has created significant interest among the research community in partnerships with community health centers (CHCs). While under-resourced, federally qualified CHCs are a major provider of health care services in the US,48,49 serving an estimated 30 million patients, including more than one in five uninsured people and one in five Medicaid recipients.50 The patient population of CHCs overall is very diverse in terms of race/ethnicity, and CHCs typically serve residents of the communities in which they are located. With federal support, CHCs in the US have robust electronic health records (EHRs) and data reporting capabilities. CHCs are therefore a rich and diverse source of patient-level data that have attracted interest among academic researchers. CHCs enter research partnerships with their own data and with their mission-driven focus to provide the best care to their community. Research partnerships can provide financial resources and essential knowledge generation that can improve patient lives.
However, many CHC leaders are wary about the impact of data sharing requirements on the communities they serve. From the CHC perspective, the decision to share data, even if required by a funding mechanism, reflects directly on the CHC and says something to patients about the CHC’s values and how it views them. For example, requesting that research participants voluntarily provide social security numbers in the context of research, as has been done in recent NIH initiatives,12 raises significant concerns for many patients as a signal that the CHC is inattentive to patient privacy, instrumentalizing their participation and threatening their trust. CHC leaders are also concerned about data use that may harm the populations they serve, such as bias in medical technology design and algorithms that are not race-neutral.51,52 CHCs’ experiences with data access requests have heightened concerns about data sharing: often all available data or the entire data dictionary are requested by researchers, rather than a curated list of variables derived from a specific set of research questions. This raises concerns about identifying potentially spurious relationships that are not hypothesis-driven and that have the potential to stigmatize and further harm the community. In addition, CHC leaders worry about inaccurate conclusions that might be drawn about quality of care delivered if they “hand over” their data without knowing that data users understand the nuance of important context. CHC leaders should question how data sharing will benefit their community: data have power and value. There should be at least equivalent benefit to the community as there is for the broader research community. Further, the costs of data sharing should be recognized and addressed. The NIH’s expectation of data sharing does not seem to consider the burden placed on CHCs to deliver that data. It is expected that researchers will provide the needed resources, but rarely is the CHC’s cost fully recovered.
Another major issue for CHCs is that state and federal agencies require different data elements to reflect similar measures. Data sharing would be greatly facilitated, and costs and burden significantly decreased, by harmonization of data elements across state and federal agencies. For example, when the recent focus on social needs assessment began, many CHCs built measures into the EHRs, specifically the PRAPARE measure which was developed for the CHC setting.53 But since that time, social needs measures have proliferated, and different agencies and accountable care organizations require different measures. It has created considerable duplication and added effort. The advantages of having a single measure are clear from the perspective of data sharing, but when it is the CHC that is left trying to harmonize across measures for several agencies, it is untenable for the CHC and diminishes the stated intention of further reuse that often depends upon interoperability.
Recommendations to Improve Community Benefit from Data Sharing Requirements
Based on our experience and the body of work reviewed above, we offer several recommendations that we believe could increase the benefit that both communities and science receive from sharing of data that is generated through community-engaged research.
- Develop Better Models that Support Community Engagement
- NIH should create a community-led task force to develop models of data sharing that draw on the outstanding work conducted by Tribal Epidemiology Centers40 and the CARE Principles,45 in order to reorient research institutions themselves toward data stewardship, dynamic consent, and approaches to engagement that ensure broad benefit to the contributing communities. This is particularly important for communities that have experienced historical disadvantage due to social and/or economic exclusion, as is reflected in Robert Wood Johnson Foundation’s recent data commission report that notes the importance of new approaches to ensure equity in data systems.54
- New models should include needed infrastructure support for community organizations that share data, rather than expecting them to do this with their own resources or to rely on researchers to provide sufficient support.
- Serious consideration should be given to using controlled access environments for community-level data. This would ensure that individual data are not distributed but would allow results to be downloaded. Such an environment should be created by repositories that can ensure, at a minimum, community access to results, and optimally community engagement in design, data governance, and data re-use.
- NIH and other funders should support community-led research. The recently released ComPASS program is a model for future and/or expanded mechanisms. It will also be important to develop training resources for community-based data scientists, epidemiologists, and health economists who can use and benefit from shared data.
- NIH and other funders should take a broader and more inclusive approach to creating common data elements—not just those used for research, but also those used in care delivery—through collaboration with federal and state agencies. The onus should not be on community partners to ask the same questions multiple ways or to attempt to harmonize different measures, but rather on the government to create data elements that are agreed upon and standardized.
- Develop Community-Informed, Simplified, and Collaborative Data Use Agreements (DUAs)
- Research institutions should develop plain language data use agreements (DUAs) in collaboration with community partners (e.g., Community Advisory Boards). Standard DUAs initiated through academic institutional sponsored programs offices typically lack language that aligns with community-engaged research principles and fail to recognize data providers as partners in the research.
- The burden associated with extensive legal language in DUAs could be reduced by engaging a trusted third party that can act as “data provider” for the individual healthcare organizations participating in research. The third-party intermediary would be responsible for executing a formal DUA with parties requesting data, while allowing for simplified data use approval agreements at the individual site level. The intermediary would outline key information, including plans for data use, storage, management, sharing, and destruction, and act as a clearinghouse to ensure fair access to and stewardship of contributed data.
- Ensure Community Understanding of Data Use and Outcomes
- Resources should be developed that ensure community understanding of, and provide support for, different levels of sharing, as suggested by CDC.55 Ideally, such resources should be co-created by community members, written in lay language, provided in multiple languages, and use different communication vehicles, such as video, audio, and simple graphics.
- Consent processes should be responsive to local cultural contexts and should move toward treating consent for data sharing as revocable, ongoing, and specific, even for de-identified data, at both the individual and community level.
- NIH and other funders should require that findings from all research conducted using a shared data resource are returned to that community in a manner that is understandable and, if possible, actionable.
- Ensure Researcher Understanding
- To ensure that users of shared data resources have sufficient context for interpreting their findings, funders should develop guidance on metadata to be provided in data repositories. Investigators using shared data resources could be encouraged or even required to provide consulting resources to allow for community representatives to participate in subsequent analyses of shared data, to provide context, to have input into the scientific question, and to ensure understanding of any findings.
- The FAIR data sharing principles should be expanded to include key principles of engaged research, which we describe as “FAIR-ER”.
- Track Data Use
- Funders should require that all proposed research involving a shared data resource is registered on a publicly available resource in advance of accessing the data. All papers published based on a shared data resource should include tracking information so that the data source is clear.
- Ensure Benefit Sharing
- Acknowledging that financial gain from research is generally derived from a multiplicity of sources and results, investigators who use community-generated data resources should be required to ensure that the community shares in the benefit. Any intellectual property patents filed should also be disclosed to the community. Benefit sharing could take a number of forms, such as providing derived medicines, diagnostics, treatments, or prevention measures to members of the participating communities at no or decreased cost; a percentage, scaled to the degree of contribution, of realized revenue or profit for a defined number of years; and training opportunities or workforce development grants. If there are practical implications of findings, the participating communities could be preferred sites for early adoption and receive assistance for doing so. Funders should develop guidance for resource sharing that acknowledges all of the data sources, including the communities involved in the original research study, and strategies to share those resources appropriately. The key is to develop ways for the financial benefit of data sharing to be established and shared with communities that participated in data generation. Providing the community with a voice in how they wish to receive benefits would most likely maximize the impact of these resources.
- Develop Models for Evaluating Impact
- Mechanisms should be established to determine models for evaluating community impact, including ways to gather community perspectives. Such an effort might consider whether different access models (e.g., open or controlled access) have differential benefits to those outside of the research community. For example, do controlled-access models lead to greater consideration of community-level impact of the research, increase the likelihood of return of results to data contributing communities, or increase the chances of community-relevant research questions being pursued?
Conclusions
Over several decades, the NIH has implemented a number of policies designed to improve the sharing of research data. As of January 2023, the NIH’s data sharing policy requires researchers to submit a Data Management and Sharing Plan in NIH grant proposals, and thus to be more thoughtful in their practices related to data sharing. NIH recommends use of the FAIR principles, which are intended to ensure that data are Findable, Accessible, Interoperable, and Reusable.1,2 We recommend that principles of Engaged Research, yielding what we term “FAIR-ER” data sharing, should be applied to the design, implementation, and evaluation of data-sharing policies. Given the historical context of prior experiences in the sharing of community-derived data, consideration of community-engaged research principles is foundational to ongoing efforts to establish a culture of data sharing in, and with, the community.
It is also important to address who benefits from data sharing, and who does not. There is potential for considerable good to come from data sharing, but also significant potential for harms that go beyond the common consideration of re-identification, including de-contextualization and misinterpretation of data and resulting findings, and disenfranchisement of participating communities. We provide several recommendations that would allow better community engagement related to data sharing, building upon model examples such as the Tribal Epidemiology Centers and NIH’s new efforts to support community-led research, to ensure better understanding of the complex issues by both community and researchers, and to move toward shared benefits. Without effective community engagement and rich contextual knowledge, biases resulting from data sharing can remain unchecked and uncorrected. By identifying good models for evaluating the impact of data sharing on communities that contribute data, and then applying those models systematically, we will honor the community perspective and increase the likelihood of benefits for all.
Highlights:
- Data sharing policies should consider to whom benefits do and do not accrue
- Community Engaged Research Principles would increase community benefit
- Funders should develop mechanisms to ensure community benefit from data sharing
- Funders should track impact of data sharing on community-relevant outcomes
Footnotes
Credit Author Statement
Karen M. Emmons: Conceptualization, Writing – original draft preparation; Samuel Mendez: Conceptualization, Writing – original draft preparation; Rebekka M. Lee: Conceptualization, Writing – original draft preparation; Diana Erani: Conceptualization, Writing – Reviewing and Editing; Lynette Mascioli: Conceptualization, Writing – Reviewing and Editing; Marlene Abreu: Conceptualization, Writing – Reviewing and Editing; Susan Adams: Conceptualization, Writing – Reviewing and Editing; James Daly: Writing – Reviewing and Editing; Barbara E. Bierer: Writing – Reviewing and Editing.
References
- 1.NIH. Supplemental information to the NIH policy for data management and sharing: Selecting a repository for data resulting from NIH-sponsored research. 2020b; https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013, 2022.
- 2.NIH. Final policy for data management and sharing. 2020a; https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.
- 3.Alter G, Gonzalez R. Responsible practices for data sharing. Am Psychol. 2018;73(2):146–156.
- 4.Jwa AS, Poldrack RA. The spectrum of data sharing policies in neuroimaging data repositories. Hum Brain Mapp. 2022;43(8):2707–2721.
- 5.Kruse GR, Pelton-Cairns L, Taveras EM, et al. Implementing expanded COVID-19 testing in Massachusetts community health centers through community partnerships: Protocol for an interrupted time series and stepped wedge study design. Contemp Clin Trials. 2022;118:106783.
- 6.Du Bois W. The Philadelphia Negro: A Social Study. Philadelphia PA: University of Pennsylvania Press; 1996.
- 7.Du Bois W. The Souls of Black Folk: Essays and Sketches. Boston: University of Massachusetts Press; 2018.
- 8.NIH. NIH Public Access Policy. https://www.nih.gov/health-information/nih-clinical-research-trials-you/what-is-nih-public-access-policy#:~:text=The%20Public%20Access%20Policy%20ensures,.gov%2Fpmc%2F). Accessed November 14, 2022.
- 9.NIH. NIH Genomic Data Sharing Policy. https://osp.od.nih.gov/scientific-sharing/genomic-data-sharing/. Accessed November 14, 2022.
- 10.NCI. NCI Cancer Moonshot Public Access and Data Sharing Policy. https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/funding/public-access-policy. Accessed November 14, 2022.
- 11.Brakewood B, Poldrack RA. The ethics of secondary data analysis: considering the application of Belmont principles to the sharing of neuroimaging data. Neuroimage. 2013;82:671–676.
- 12.NIH. Rapid Acceleration of Diagnostics. https://www.nih.gov/research-training/medical-research-initiatives/radx/radx-programs. Accessed November 14, 2022.
- 13.NIH. NIH Community Engagement Alliance (CEAL). https://covid19community.nih.gov/. Accessed November 14, 2022.
- 14.NCATS. Principles of Community Engagement. The National Center on Advancing Translational Science;2011.
- 15.Ballantyne A. How should we think about clinical data ownership? Journal of Medical Ethics. 2020;46(5):289–294.
- 16.Schnarch B. Ownership, Control, Access, and Possession (OCAP) or Self-Determination Applied to Research: A Critical Analysis of Contemporary First Nations Research and Some Options for First Nations Communities. J of Aboriginal Health. 2004;1(1):80–95.
- 17.Green AK, Trivedi N, Hsu JJ, Yu NL, Bach PB, Chimonas S. Despite The FDA’s Five-Year Plan, Black Patients Remain Inadequately Represented In Clinical Trials For Drugs. Health Aff (Millwood). 2022;41(3):368–374.
- 18.Mulder C, Debassige D, Gustafson M, Slater M, Eshkawkogan E, Walker JD. Research, Sovereignty and Action: Lessons from a First Nations-Led Study on Aging in Ontario. Healthc Q. 2022;24(Sp):93–97.
- 19.James RTR, Sahota P, et al. for the Kiana Group. Exploring pathways to trust: a tribal perspective on data sharing. Genetics in Medicine. 2014;16(11).
- 20.Tehranifar P, Goyal A, Phelan JC, et al. Age at cancer diagnosis, amenability to medical interventions, and racial/ethnic disparities in cancer mortality. Cancer Causes Control. 2016;27(4):553–560.
- 21.Tehranifar P, Neugut AI, Phelan JC, et al. Medical advances and racial/ethnic disparities in cancer survival. Cancer Epidemiol Biomarkers Prev. 2009;18(10):2701–2708.
- 22.THE BELMONT REPORT: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. 1979.
- 23.Shore N. Re-Conceptualizing the Belmont Report. Journal of Community Practice. 2006;14(4):5–26.
- 24.Banks S, Armstrong A, Carter K, et al. Everyday ethics in community-based participatory research. Contemporary Social Science. 2013;8(3):263–277.
- 25.Wilson E, Kenny A, Dickson-Swift V. Ethical Challenges in Community-Based Participatory Research: A Scoping Review. Qual Health Res. 2018;28(2):189–199.
- 26.Shore CHJ, Khandekar E, Wizemann T (Eds). Reflections on Sharing Clinical Trial Data: Challenges and a Way Forward: Proceedings of a Workshop. Washington DC: National Academies Press;2020.
- 27.https://www.gilariver.org/index.php/about/history. Accessed September 10, 2022.
- 28.NIDDK. The Pima Indians: Pathfinders for Health. Washington DC: National Institutes of Health;1996.
- 29.Ordinance GR - 05–09 Title 17 Chapter 9 of the Gila River Indian Community Code. In: Community. TGRI, ed.
- 30.Radin J. “Digital Natives”: How Medical and Indigenous Histories Matter for Big Data. Osiris. 2017;32(1):43–64.
- 31.Chang V, Bailey J, Xu QA, Sun Z. Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms. Neural Comput Appl. 2022:1–17.
- 32.García-Ordás MT, Benavides C, Benítez-Andrades JA, Alaiz-Moretón H, García-Rodríguez I. Diabetes detection using deep learning techniques with oversampling and feature augmentation. Comput Methods Programs Biomed. 2021;202:105968.
- 33.Kumar V, Lalotra GS, Sasikala P, et al. Addressing Binary Classification over Class Imbalanced Clinical Datasets Using Computationally Intelligent Techniques. Healthcare (Basel). 2022;10(7).
- 34.Sterling RL. Genetic research among the Havasupai--a cautionary tale. Virtual Mentor. 2011;13(2):113–117.
- 35.Mello MM, Wolf LE. The Havasupai Indian tribe case--lessons for research involving stored biologic samples. N Engl J Med. 2010;363(3):204–207.
- 36.After Havasupai litigation, Native Americans wary of genetic research. Am J Med Genet A. 2010;152a(7):fmix.
- 37.Garrison NA. Genomic Justice for Native Americans: Impact of the Havasupai Case on Genetic Research. Sci Technol Human Values. 2013;38(2):201–223.
- 38.Federal Policy for the Protection of Human Subjects (‘Common Rule’). https://www.hhs.gov/ohrp/regulations-and-policy/regulations/common-rule/index.html. Accessed November 1, 2022.
- 39.United Nations Declaration on the Rights of Indigenous People. https://www.un.org/development/desa/indigenous-peoples/declaration-on-the-rights-of-indigenous-peoples.html Accessed February 17, 2023.
- 40.Carroll SR, Rodriguez-Lonebear D, Martinez A. Indigenous Data Governance: Strategies from United States Native Nations. Data Sci J. 2019;18.
- 41.National Academies of Sciences, Engineering, and Medicine. Open Science by Design: Realizing a Vision for 21st Century Research. Washington DC: The National Academies Press;2018.
- 42.GA4GH. Framework for Responsible Sharing of Genomic and Health-Related Data. 2019.
- 43.GA4GH. Framework for Involving and Engaging Participants, Patients, and Publics in Genomics Research and Health Implementation. 2021.
- 44.Haring RC, Blanchard JW, Korchmaros JD, et al. Empowering Equitable Data Use Partnerships and Indigenous Data Sovereignties Amid Pandemic Genomics. Front Public Health. 2021;9:742467.
- 45.Carroll SGI, Figueroa-Rodriguez O, Holbrook J, Lovett R, Materechera S, Parsons M, Raseroka K, Rodriguez-Lonebear D, Rowe R, Sara R, Walker J, Anderson J, Hudson M. The CARE Principles for Indigenous Data Governance. Data Science Journal. 2020;19:1–12.
- 46.Carroll SR, Garba I, Plevel R, et al. Using Indigenous Standards to Implement the CARE Principles: Setting Expectations through Tribal Research Codes. Front Genet. 2022;13:823309.
- 47.Carroll SR, Herczog E, Hudson M, Russell K, Stall S. Operationalizing the CARE and FAIR Principles for Indigenous data futures. Sci Data. 2021;8(1):108.
- 48.Goldman LE, Chu PW, Tran H, Romano MJ, Stafford RS. Federally qualified health centers and private practice performance on ambulatory care measures. Am J Prev Med. 2012;43(2):142–149.
- 49.Laiteerapong N, Kirby J, Gao Y, et al. Health care utilization and receipt of preventive care for patients seen at federally funded health centers compared to other sites of primary care. Health Serv Res. 2014;49(5):1498–1518.
- 50.National Association of Community Health Centers. 2020; https://www.nachc.org/wp-content/uploads/2020/01/Chartbook-2020-Final.pdf. Accessed April 22, 2020.
- 51.Sjoding MW, Ansari S, Valley TS. Origins of Racial and Ethnic Bias in Pulmonary Technologies. Annu Rev Med. 2022.
- 52.Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med. 2022;28(1):31–38.
- 53.NACHC. The Protocol for Responding to and Assessing Patient Assets, Risks, and Experiences (PRAPARE). 2013.
- 54.Charting a Course for an Equity-Centered Data System. Princeton, NJ: Robert Wood Johnson Foundation; 2021.
- 55.CDC Foundation. Modernizing Health Data Analytics and Forecasting. Forecasting and Modeling Listening Sessions – Final Report with Technical Notes. August 27, 2021.
