Author manuscript; available in PMC: 2018 Jun 28.
Published in final edited form as: J Comp Eff Res. 2017 Aug 14;6(6):537–547. doi: 10.2217/cer-2017-0009

Stakeholders’ Views on Data Sharing in Multi-Center Studies

Kathleen M Mazor 1,2, Allison Richards 1, Mia Gallagher 3, David E Arterburn 4, Marsha A Raebel 5, W Benjamin Nowell 6, Jeffrey R Curtis 7, Andrea R Paolino 5, Sengwee Toh 3
PMCID: PMC6022827  NIHMSID: NIHMS976931  PMID: 28805448

Abstract

Aims

To understand stakeholders’ views on data sharing in multi-center comparative effectiveness research studies and the value of privacy-protecting methods.

Materials & Methods

Semi-structured interviews with five U.S. stakeholder groups.

Results

We completed 11 interviews, involving patients (n=15), researchers (n=10), IRB and regulatory staff (n=3), multi-center research governance experts (n=2), and healthcare system leaders (n=4). Perceptions of the benefits and value of research were the strongest influences towards data sharing; cost and security risks were primary influences against sharing. Privacy-protecting methods that share summary-level data were acknowledged as being appealing, but there were concerns about increased cost and potential loss of research validity.

Conclusion

Stakeholders were open to data sharing in multi-center studies that offer value and minimize security risks.

Keywords: comparative effectiveness research, data sharing, distributed research networks, electronic databases, multi-center studies, PCORnet, privacy-protecting methods

INTRODUCTION

Multi-center research networks support a wide range of patient-centered outcomes research, comparative effectiveness and safety research, and public health surveillance activities [1, 2]. They allow stakeholders to generate timely and actionable information, study treatment effect heterogeneity in large and diverse populations, and produce generalizable results. In the past, it has often been necessary to share highly granular and potentially identifiable patient-level information across healthcare systems to perform the desired statistical analysis. Even when organizations are willing to collaborate and share information, they must address issues surrounding patient privacy and confidentiality, data security, data control, and proprietary interest to meet federal, state, and institutional requirements. Meeting these requirements can result in real or perceived loss of efficiency associated with extensive, time-consuming negotiations and the administrative paperwork burden (e.g., Institutional Review Board [IRB] approvals, data use agreements).

The advent of several new analytic and data-sharing methods offers a more efficient way of tackling these requirements [3–9]. For certain analyses, these methods require only summary-level data, such as propensity scores or intermediate statistics from regression models, to produce results identical or highly comparable to those from pooled patient-level data analysis [3–9]. These newer methods are considered more “privacy-protecting” as they do not require exchange of potentially identifiable information. They have the potential to improve the efficiency of research through more streamlined security and privacy protection requirements, and could enhance stakeholders’ willingness and ability to collaborate in multi-center studies.
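As a hedged illustration of how summary-level data can reproduce a pooled patient-level result, the sketch below fits a simple linear regression in two ways: from site-level sums only, and from pooled patient rows. The function names, the two-site setup, and the data are hypothetical and are not taken from the study or from the cited methods papers; ordinary least squares with a single covariate is used only because its intermediate statistics are easy to show.

```python
# Minimal sketch (assumed example, not the study's method): each site shares
# five sums instead of patient-level rows.

def site_summary(x, y):
    """Intermediate statistics a site would share; no patient rows leave the site."""
    return {
        "n": len(x),
        "sx": sum(x),
        "sy": sum(y),
        "sxx": sum(v * v for v in x),
        "sxy": sum(a * b for a, b in zip(x, y)),
    }

def fit_from_summaries(summaries):
    """Least-squares slope and intercept computed from summed site statistics."""
    keys = ("n", "sx", "sy", "sxx", "sxy")
    n, sx, sy, sxx, sxy = (sum(s[k] for s in summaries) for k in keys)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

def pooled_fit(x, y):
    """Reference fit that needs all patient-level rows in one place."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Made-up data for two hypothetical sites (covariate x, outcome y).
site_a = ([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 7.1])
site_b = ([1.0, 2.0, 4.0], [3.0, 4.8, 9.1])

distributed = fit_from_summaries([site_summary(*site_a), site_summary(*site_b)])
pooled = pooled_fit(site_a[0] + site_b[0], site_a[1] + site_b[1])
```

In this simple case the two fits agree up to floating-point rounding, because the shared sums are sufficient statistics for the model. More complex models generally need other intermediate statistics and may only approximate the pooled result.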

Existing research suggests that patients and the public are concerned about the privacy of their electronic health information, but also value research that has the potential to improve care [10–12]. At the same time, most patients and members of the public are not familiar with how their data may be shared, and how their privacy is currently protected [10, 13, 14]. The new privacy-protecting methods are especially unfamiliar to the public, and relatively unfamiliar to most stakeholders involved in research. These methods may also lack the capability to address some stakeholders’ needs and preferences. Regardless of how robust or secure, methods are of limited value if not known to, understood by, and proven to be useful to stakeholders. The goal of this qualitative study was to explore and describe various stakeholders’ views on sharing of electronic health information in multi-center comparative effectiveness research studies and on privacy-protecting methods in particular.

MATERIALS & METHODS

Stakeholder groups interviewed

For the purposes of this study, we defined our stakeholders as individuals contributing data to multi-center studies, individuals responsible for stewardship of patient data and the requirements associated with engaging their institutions in data sharing, individuals involved in overseeing and conducting multi-center studies, or individuals involved in using the results of multi-center studies [15]. We identified and invited a purposive sample of stakeholders to participate in the study, including: (1) patients, (2) healthcare system leaders, (3) experts in the governance of multi-center studies, (4) researchers, and (5) experts who review or oversee compliance, confidentiality, and regulatory requirements of research studies.

We recruited patients from two existing groups: 1) a bariatric surgery patient advisory panel previously convened to advise on a research application, and 2) patients who participated in the Arthritis Partnership with Comparative Effectiveness Research (known as ArthritisPower), a Patient-Powered Research Network within the National Patient-Centered Clinical Research Network (PCORnet) [16]. We chose these two existing groups because this study was conducted in the context of a larger project that involves patients who have undergone or are considering bariatric procedures and patients with autoimmune diseases. We identified healthcare systems leaders, experts in the governance of multi-center studies, and experts in research compliance, confidentiality, and regulatory requirements from three delivery systems (Group Health Research Institute [now Kaiser Permanente Washington Health Research Institute], Kaiser Permanente Colorado, and Kaiser Permanente Northern California). We enrolled researchers from the attendees of the Patient-Centered Outcomes Research Institute (PCORI) Annual Meeting in 2015. In the following text, we refer to patient stakeholders as “patients” and to all other participants as “organizational stakeholders”.

Data-sharing and analytic methods of interest

We were interested in stakeholders’ views on various data-sharing and analytic methods used in multi-center studies, including pooled patient-level data analysis, patient-level or summary-level data analyses that leverage confounder summary scores (e.g., propensity scores), risk-set-based analysis, and meta-analysis of site-specific effect estimates [3–6]. Each method requires sharing specific information across sites and offers various degrees of analytic flexibility. See Appendix 1 for examples of information typically shared by a participating site in a multi-center study using these analytic methods. Detailed descriptions of the strengths and limitations of each method are available in other published articles [3–6].
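To make the last of these methods concrete, the following sketch combines site-specific effect estimates using inverse-variance (fixed-effect) weighting. This is one common way to meta-analyze such estimates and is an assumption here; the source does not state which meta-analytic model was presented to interviewees. The function name and the numbers are hypothetical.

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted average of site-specific estimates.

    Each site shares only its effect estimate (e.g., a log hazard ratio)
    and the standard error of that estimate.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Made-up log hazard ratios and standard errors from three hypothetical sites.
log_hr = [0.10, 0.25, 0.18]
se = [0.10, 0.20, 0.10]

pooled_log_hr, pooled_se = fixed_effect_meta(log_hr, se)
ci_low = pooled_log_hr - 1.96 * pooled_se
ci_high = pooled_log_hr + 1.96 * pooled_se
```

Under this approach each site contributes two numbers per analysis, which illustrates both the low granularity of what is shared and the limited analytic flexibility that some stakeholders raised as a concern.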

Interview process and content

Prior to each interview, we sent stakeholders a fact sheet that described the purpose of the study, potential risks of the interview (which were minimal), and their expected level of participation (Appendix 2). We conducted the interviews in person or via telephone, as a group or individually, based on the preference and availability of the stakeholders. One author (ST) conducted all interviews. At least one other member of the research team was also present for all interviews. Each interview began with a review and clarification of the fact sheet. The interviewer then described various data-sharing and analytic methods in multi-center studies using educational materials (see Appendix 3 for a version used for the interviews with the healthcare system leaders) tailored to the background of the interviewees. The presentations and interviews focused on data typically captured in electronic health records and administrative databases, rather than biospecimens or genetic data. The interviewer then asked the interviewees a series of questions based on the domains developed by the study team (Table 1). The specific interview questions varied depending on the interviewee’s role and familiarity with data sharing, and evolved over the course of the interview (see Appendix 4 for an interview guide used for the healthcare system leaders). We recorded all interviews with permission from the interviewees and professionally transcribed them for analysis. We did not collect any identifiable information about the interviewees during the interview.

Table 1.

Interview domains

Familiarity and experience with multi-center research and data sharing
Attitudes towards multi-center research and data sharing
Perspectives on privacy and data security
Perspectives on sharing aggregate versus individual-level data
Recommendations for improving processes around data sharing
Reactions to privacy-protecting analytic and data-sharing approaches

Note: Specific interview questions varied across interviews depending on stakeholders’ roles and responses.

Analysis

We used an integrated approach to the qualitative data analysis as described by Bradley et al. [17]. The interview domains provided an initial organizing framework, consistent with a directed content analysis approach [18]. However, we were attentive to unanticipated content as we reviewed the transcripts and applied the evolving coding scheme, integrating new codes and concepts as they emerged inductively, consistent with conventional content analysis [18]. One investigator (KMM) created an initial coding framework after observing four interviews. Two other investigators (ST, DA), who had participated in the interviews, provided feedback on the framework and suggested additional themes or subthemes. The first investigator (KMM) elaborated the framework through ongoing review of the transcripts as additional interviews were completed. Four team members (ST, DA, MR, and AR) each reviewed at least one transcript, with the coding framework at hand. These second readers checked the completeness of the framework and suggested new codes or modifications based on their review. The full qualitative team (KMM, ST, DA, MR, and AR) reviewed the final coding framework and reached consensus that it captured all relevant themes and subthemes expressed in the interviews (Appendix 5). One team member (AR) coded all transcripts; a second team member (KMM) reviewed the coded transcripts to confirm accuracy and to resolve any questions that emerged during the final coding. The team entered the transcripts and codes into the Statistical Package for the Social Sciences (version 22) in order to facilitate data management, manipulation, and reporting.

RESULTS

We interviewed 34 stakeholders between June 2015 and February 2016 (Table 2). The average interview duration was approximately 61 minutes (range: 36 to 109 minutes). The analysis identified three major themes that emerged inductively: 1) perceived benefits and value of research, 2) cost, and 3) perceived risks. Figure 1 provides a conceptual model of how these major themes relate to stakeholders’ willingness to share data in multi-center studies, which was a central focus of this study. Each of these major themes (perceived benefits and value, cost, and perceived risks) was influenced by the granularity of the information to be shared, as well as by other factors (e.g., perceptions of risk were also influenced by past experiences). We noted varying levels of stakeholder familiarity with privacy-protecting analytic and data-sharing methods, as well as differences in views on the usefulness of these methods; these findings are presented last.

Table 2.

Stakeholder groups interviewed

Stakeholder group | No. of participants | Interview type | Interview mode
Patients | | |
 Arthritis patient panel | 10 | Group | In person
 Bariatric patient panel 1 | 4 | Group | In person
 Bariatric patient panel 2 | 1 | Individual | Telephone
Healthcare systems leaders | | |
 Vice president for governmental external relations | 1 | Individual | Telephone
 Executive medical director | 1 | Individual | Telephone
 Medical director for quality | 1 | Individual | In person
 Consultant for research compliance and ethics | 1 | Individual | In person
Multi-center research governance experts | | |
 Multi-center research governance expert 1 | 1 | Individual | In person
 Multi-center research governance expert 2 | 1 | Individual | In person
Researchers | 10 | Group | In person
Compliance, confidentiality, and regulatory experts | 3 | Group | Telephone

Each row of this table represents a separate interview session, either group or individual.

Figure 1.

Major themes identified from the stakeholder interviews

A. Factors that influenced willingness to share data in multi-center studies

1. Perceived benefits and value of the research

Stakeholders’ perceptions of the purpose, benefit, and value of the research were the strongest influences towards data sharing. Both patients and organizational stakeholders referred to the need for research that would answer questions that they perceived to be important, relevant, and likely to improve care or outcomes for patients. They indicated they would be more likely to support data sharing in pursuit of those goals. As one patient said: “If it’s improving the general knowledge in service of people like me, that’s a good thing.” In contrast, patients were unwilling to share data if they perceived the request was motivated by financial gain or profit. Patients considered it both possible and highly objectionable that an entity might profit by selling their data.

Organizational stakeholders valued data sharing as a means of improving patient care, and of advancing understanding of treatment risks and side effects. They referred to the fact that multi-center data sharing necessarily results in larger data sets, and thus enhances the ability to study rare diseases and rare outcomes (i.e., increased statistical power). Organizational stakeholders also referred to improving generalizability of study findings. As one organizational stakeholder stated, “It seems like more data is better […] more generalizable, more scientific.” Another noted that data sharing allows healthcare systems to “provide richer data to the world.”

Patients’ comments indicated a desire for their data to be helpful, and to lead to valid and actionable findings with the potential to improve care for others. Patients referred to the need for “good science”, and recognized that not all studies achieve this. As one patient said, “I suppose it goes back to the risk/reward, […] we’re getting good science out of these studies. And if we’re not, I think that’s a bigger problem than the privacy issue.”

2. Costs

Cost was a factor identified as influencing organizational stakeholders away from data sharing. Organizational stakeholders’ comments implied that financial consequences and costs were important in their decision making, including decisions about their organization’s participation in research and data sharing. They noted that data sharing requires resources, most notably programmer or analyst time and expertise, which are often limited. As one organizational stakeholder noted, “everything’s an opportunity cost.” None of the stakeholders commented on how the costs of data sharing using privacy-protecting methods might be covered, though one organizational stakeholder commented that building on existing research networks, where foundational work such as the creation of shared data models has already been done, “lowers the burden” of data sharing. This stakeholder went on to say that in the short to near term the additional costs associated with developing and implementing privacy-protecting methods would be “an investment in methods development”, but also noted that if a project did not fully cover the costs of participation, then “we can’t do it”.

While patients did not refer to the cost of data sharing per se, some mentioned compensation, believing that they should be compensated for the use of their data, with compensation being broadly defined to include financial compensation, expressions of appreciation and recognition, and sharing of results. Patients also expressed concern that their data might be used for commercial purposes.

3. Perceived risks

Perceived risk was also identified as influencing stakeholders away from data sharing. The most prominent concern identified by organizational and patient stakeholders was loss of control of the data, with the associated risk of unauthorized use or disclosure. Interviewees expressed concerns that sensitive health information, including information about patients’ diagnoses and treatments, might be divulged to those who should not have access, and ultimately result in harm to patients. While patients were concerned about loss of confidentiality and unauthorized release of their information, few were explicit in identifying the downstream consequences of disclosure they were most worried about. One patient was somewhat specific, expressing a concern about the possible impact on employment, saying, “Twenty years ago, and you have HIV, you’re fired, […] Today, not as much, but, like, I think that’s a factor.” Another patient referred to “my insurance company or somebody’s going to use that against me”, while another said simply “it’s a stigma [...] it’s nobody’s business.”

Organizational stakeholders also alluded to the risk of data sharing resulting in damage to an organization’s reputation, or loss of competitive advantage. One organizational stakeholder referred to using the litmus test “if this were released and it ended up on the front page of the [newspaper name], what would that do? To our patients, to our reputation, et cetera.” Organizational stakeholders appeared concerned about the possibility that disclosed data could portray a given provider, clinic, or organization as a poor performer, referring to “issues around quality outcomes, competition,” in this context. One organizational stakeholder referred to concerns about “a data set and that gets in the wrong hands and you suddenly discover that, you know, this one clinic is horrible.” Another organizational stakeholder referred to the potentially competing interests of researchers within an organization, noting “you also have the researcher who might be trying to do, you know, kind of establish themselves in a particular topic area, and may feel some level of protectiveness over the data”. Overall, organizational stakeholders were acutely aware that harm could result from a data breach or loss of patient confidentiality secondary to data sharing, though none reported direct experience with such events. One organizational stakeholder referred to the widely publicized data breach at the Veterans Health Administration, saying “a data breach in V.A. research, as you may remember, completely shut down the V.A. research enterprise for a couple of years…It was horrible.”

It is noteworthy, however, that some patients and organizational stakeholders were not concerned about data sharing, and made explicit their belief that there was little risk of harm. One patient asked directly whether unauthorized disclosure was a problem with research data, saying “I guess I would want to know how rampant a problem it is,” later noting “The risk is much smaller than say, just me buying something with my credit card.”

Several factors influenced organizational and patient stakeholders’ perception of risk, as described in Figure 1:

(i) Safeguards

Organizational stakeholders identified a number of safeguards and strategies used to minimize risk of data breaches and to maximize data security. Some organizational stakeholders indicated that such safeguards are currently in place; others indicated that they would require that such safeguards be in place prior to data sharing. Approaches referenced included technological approaches (e.g., use of encryption, firewalls), policies and contractual practices (e.g., data use agreements), and oversight for ensuring compliance with agreed upon practices.

Some organizational stakeholders noted their organizations required that an internal researcher be involved in all studies involving data sharing to reduce the risk of inappropriate use. Involvement of an internal researcher was also sometimes necessary to ensure that the nuances of the data were taken into account in analyses and reporting. Some organizational stakeholders were apparently acutely aware of the complexities of operational data, and the potential for naïve users to make incorrect assumptions about the data which could in turn lead to erroneous and invalid results.

Organizational stakeholders also referred to restricting data access (again referring to the current implementation of such practices), and the need to obtain assurances about limits on access whenever data were shared. Patients also brought up the importance of restricting data access, oversight of such restrictions, and voiced specific questions about data security, for instance, wanting details on how the data would be transferred. Some patients expressed uncertainty about current practices; as one patient said, “I don’t know who has access to my information.”

(ii) Prior Experience

Stakeholders’ prior experience with data sharing influenced their views on the potential risks. Several organizational stakeholders referred to sharing data for research without problems or concerns. Successful experiences appeared to reduce the perception of risk, at least for data sharing in similar contexts. No interviewees reported direct personal or organizational experience of negative consequences of data sharing, though one organizational stakeholder referred to a “near miss”, i.e., an event where identifiable data was almost shared, but was detected and prevented. Organizational stakeholders also noted that if a data breach were to occur, it would be likely to have a major impact on the organization’s willingness to share data in the future.

Two patients mentioned personal experience working with data (one in a work setting, and one in an educational setting), and indicated that this experience had increased their comfort with data sharing. One patient noted “we would get data sets like this, and I mean, there was absolutely no way you could tell, you know, even what region the person was from [...] I can say as someone who has seen how it’s presented, you know, I feel safe.” Another patient also referred to being more comfortable when “everything’s just a number.” An organizational stakeholder also raised this issue, noting, “I don’t think the patients have a clear sense of when we go into a data warehouse and extract data, what that’s like, that they’re a string, with a random ID.”

(iii) Trust and Relationships

Both patients and organizational stakeholders referred to the need to trust the researchers or organization requesting data, both with respect to how the data would be used and in the users’ ability to ensure the data security. The degree of trust appeared to be influenced by familiarity, and whether there was an existing relationship with the organization or individuals involved. As one patient stated “…with [organizations], you know, there’s years of trust there, and so forth. So that comes down to the people, knowing the people that are behind the scenes, working with that information.” Organizational stakeholders were less willing to share with unfamiliar requestors. As one organizational stakeholder stated: “if we were approached by some other, new group we’ve never heard of, that our delivery systems or health insurers or whatever that we don’t know and they said, ‘Trust us,’ we would (laughs) have some trouble with that.”

B. Type and granularity of data shared

The type of data to be shared and the degree of aggregation also influenced stakeholders’ views on the value, costs, and risks of data sharing and their willingness to share. In all multi-center studies, the research question drives the analytic approach, which in turn dictates the type and granularity of information to be shared. Organizational stakeholders, especially those with oversight or regulatory responsibility, focused on whether the requested data elements were relevant to the research questions, and were unwilling to approve sharing of data elements that were not relevant. Organizational stakeholders were also reluctant to approve sharing of sensitive information such as HIV status, mental health status, or alcohol use, and referred to requests for medical record numbers as “red flags”. Patients were generally unwilling to share names, birth dates, social security numbers and financial information; it was implicit in organizational stakeholders’ comments that these would typically not be shared. Some patients wanted to specify which data elements would be shared, and the conditions under which these could be shared; others indicated they would want to be informed when their data was shared. In general, some research topics and data elements were considered more sensitive than others, and would receive greater scrutiny.

Both patients and organizational stakeholders made statements and asked questions about the relative advantages and disadvantages of summary- versus individual-level data. The risk reduction obtained by sharing summary-level data rather than individual-level data was attractive to some stakeholders. However, a repeated theme across several interviews was whether aggregating data resulted in a loss of information that would reduce the value or validity of the research. As one organizational stakeholder asked, “How much more generalizable knowledge can be obtained through – from the scientific perspective – in analyzing the patient-level data?” A patient asked a similar question, with a slightly different focus, saying, “Does this type of method, where you have less granular information, lead to a less actionable result?”; and later “To me, actionability of research outweighs my privacy anxiety, significantly.” Some questioned whether summary-level data would allow as complete and nuanced exploration of the research question as individual-level data.

Organizational stakeholders expressed concerns about the costs involved, noting that creating summary-level data files may require more technical and programming expertise and additional resources. Devoting resources to aggregating data files was seen as having opportunity costs as well, as programmers and analysts were viewed by some as a relatively scarce staff resource within their organization.

Some organizational stakeholders opined that summary-level-based approaches would be appropriate if the goal of the study was to answer a single, well-defined research question, but that these approaches would be less useful if the goal was to gain a nuanced understanding of a phenomenon. As one organizational stakeholder put it, “[…] if you get something that’s surprising, you’d want to know why and that means you have to unpack it […] you probably can’t do that because some of those problems are in the way the propensity score was constructed.”

Some organizational stakeholders indicated that summary-level data approaches would not influence their willingness to share data. As one put it, “If I’m not comfortable giving you the individual stuff, I’m not going to be comfortable giving you the propensity score.” This leader went on to say, “It seems a tradeoff and the question is, what do I gain for that tradeoff and do I think that that was already at risk? If I saw the data as at risk, I don’t know that I’d be wanting to participate.” However, individual-level data was not preferred by all organizational stakeholders: “Yes, you have more ability to do analysis on patient-level data, but it comes at a cost, right? Of security and privacy.” Another saw an advantage in planning and decision making needed to assemble data for aggregate approaches, suggesting that specifying the variables to be included and the analysis prior to data sharing would result in a more “honest and transparent” approach.

C. Familiarity with and views on the privacy-protecting analytic and data-sharing methods

Patient stakeholders were unfamiliar with privacy-protecting analytic and data-sharing methods; organizational stakeholders expressed limited understanding. Most interviewees had never heard of one or more of these newer methods, but some researchers had used some of the methods (e.g., propensity score-based methods) in their studies.

Stakeholders’ reactions to the privacy-protecting analytic and data-sharing methods, as we described them during the interviews, were mixed. Some interviewees did not perceive a need for these methods, and others did not view these methods as providing significantly greater privacy protection. Overall, organizational stakeholders considered current safeguards sufficient. However, as one interviewee noted, if someone “made a big mistake” those views might change, resulting in a greater need for privacy-protecting methods. Some were uncertain of the relative advantages of the newer privacy-protecting methods (i.e., the approaches which were the ultimate focus of this investigation).

Other organizational stakeholders felt that use of privacy-protecting methods was clearly preferable to sharing patient-level data. As one organizational stakeholder said, “I believe that the cultural resistances to patient-level data sharing are so deeply embedded in organizations that the best approach is privacy-protecting methods […] I think privacy-protecting methods allow us to patiently but persistently figure out better approaches to multi-site data.”

Some interviewees suggested that privacy-protecting methods would be more acceptable to specific stakeholder groups. For instance, researchers predicted that IRBs would find these methods more acceptable. This was confirmed by a comment from an organizational stakeholder with IRB experience who said, “From an IRB perspective it’s great. It’s definitely better, there’s no question.” Another organizational stakeholder predicted “Our patients are going to be viscerally more comfortable with it.”

Patients’ comments were more equivocal. One patient, apparently unconvinced of the need for or value of privacy-protecting methods, commented “It’s a lot of trouble simply for me to feel a little more secure. And for my vote, it’s insignificantly more secure.” Another patient appeared not to perceive a need for privacy-protecting methods personally, but thought other patients might: “You know, there’s information I’m willing to give and information I’m not willing, you know, to – as long as you let me know. I don’t care. But I can see that there’s going to be a lot of people who aren’t so open, and I think this method would probably make them much more comfortable.”

Discussion of ways to increase the acceptance of privacy-protecting methods identified recommendations for providing additional evidence of the value of the approach. Some organizational stakeholders wanted to see examples of the application of these methods, and demonstrations of the equivalence of results obtained when using these methods compared to standard approaches. As one organizational stakeholder put it “since these methods are opaque by design, I think the only way to overcome that is a series of studies that basically have access to both the full data set and the privacy protected methods and to show that across a wide array of questions, data structures, analytic techniques, that the results are identical.” Another stated “we have to have confidence as a reader of the literature, that they [privacy-protecting methods] actually are correct.” A patient made a similar recommendation, saying, “If you can say that you can get the same quality results from the summary, then I’d go with that. But my – I question whether or not that’s true.”

The possibility that proposals using privacy-protecting methods might make it through the institutional review process relatively quickly was noted as an advantage by some organizational stakeholders. Examples of instances where proposals using these methods resulted in more timely IRB approval would help to convince stakeholders of their value. Similarly, recognition of the resources needed to produce the summary-level datasets used in privacy-protecting analyses led to recommendations to find ways to make these methods cheaper, faster, and more efficient.

One organizational leader referred to the “downside” of privacy-protecting methods as “[…] the fact that all reputable researchers like to get the data under their fingernails. You like to get dirty with the data. And when you can’t do that, then, then you get apprehensive and you should. That’s an instinct that was trained into all of us in graduate school. And so not being able to see the raw data makes us viscerally uncomfortable.”

Patients’ questions and comments also highlighted the need to inform and educate patients about current practices and protections. As one patient stated “I think part of it comes down to, it’s just patients getting enough education about the process, and the outcomes that we’re looking for, to feel comfortable sharing that information, and to realize that you know what? Guess what? Maybe we, as a generation, need to go out on a limb a little bit here, but it’s that proper education of how this is going to be used, and proper thanks.”

DISCUSSION

Multi-center research studies that leverage various existing data resources have the potential to generate timely, actionable, and generalizable results. Our findings from these interviews elucidate the factors that influence stakeholders’ views on data sharing in multi-center research. Consistent with prior studies conducted in the United States, the United Kingdom, Canada, and elsewhere [10–12, 19, 20], our findings suggest that while stakeholders generally recognize the value of research and are motivated to contribute to better patient care and outcomes, many have reservations about data sharing for research. New analytic methods may reduce concerns about privacy and anonymity, but our findings suggest that other factors influence stakeholders’ views and must be considered.

Our findings extend what is known by providing insights into organizational stakeholders’ perspectives, which were generally consistent with patients’ views, perhaps because most organizational stakeholders have responsibility for protecting patient confidentiality and data security. While both patients and organizational stakeholders voiced questions and concerns, most were open to data sharing as long as the research was addressing questions that were important to patient care.

A particular focus of these interviews was to explore stakeholders’ reactions to the newer privacy-protecting analytic and data-sharing methods. Our experience in these interviews highlights the fact that these methods were not well understood by the stakeholders. The methods were also difficult to explain, especially to less technical audiences. As we did explain them, reactions were mixed. While stakeholders acknowledged that privacy-protecting methods enhanced privacy and reduced the risk of re-identification of patients, these benefits were weighed against the cost of preparing the data sets, and the perception that such approaches might limit the value of the research by reducing generalizability, validity, and the ability to explore nuances in the data. There are several ways to make multi-center studies more efficient, e.g., by standardizing the databases in advance so that the analytic code can be developed by the study team and executed with minimal modifications at other participating sites [1, 21–23]. Recent simulation and empirical studies have also shown that privacy-protecting methods produce results statistically equivalent to the results from pooled patient-level data analysis for certain study settings [5, 6, 8, 9]. The feedback from the stakeholders in this study highlights the need for better education and more research in these methods.
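To make the idea of summary-level sharing more concrete, the following is a minimal sketch and not the specific methods evaluated in the cited studies. It assumes the simplest possible setting, an ordinary least-squares fit with a single covariate, in which each site shares only five totals rather than patient-level rows. All names (`SiteSummary`, `summarize`, `fit_from_summaries`, `fit_pooled`) and the example values are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (covariate x, outcome y) for one patient


@dataclass
class SiteSummary:
    """Aggregate totals a site shares instead of patient-level rows."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def summarize(rows: Sequence[Row]) -> SiteSummary:
    """Runs locally at a site; only the five totals leave the site."""
    return SiteSummary(
        n=len(rows),
        sum_x=sum(x for x, _ in rows),
        sum_y=sum(y for _, y in rows),
        sum_xx=sum(x * x for x, _ in rows),
        sum_xy=sum(x * y for x, y in rows),
    )


def fit_from_summaries(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Least-squares (intercept, slope) computed from combined site totals."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sum_x for s in summaries)
    sy = sum(s.sum_y for s in summaries)
    sxx = sum(s.sum_xx for s in summaries)
    sxy = sum(s.sum_xy for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def fit_pooled(rows: Sequence[Row]) -> Tuple[float, float]:
    """Benchmark: the same fit computed directly from pooled patient-level rows."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Hypothetical data held separately at two sites.
    site_a = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
    site_b = [(4.0, 8.1), (5.0, 9.8)]
    print(fit_from_summaries([summarize(site_a), summarize(site_b)]))
    print(fit_pooled(site_a + site_b))
```

In this simple case the two fits agree up to floating-point rounding, because the shared totals carry all of the information the estimator needs. More complex analyses, such as those with many covariates or nonlinear models, generally require more elaborate approaches, and this sketch makes no claim about them.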

Trust emerged as an important influence in our interviews, for both patients and organizational stakeholders. The importance of trust to patients has been reported previously [11, 20, 24–26]. Our finding that organizational stakeholders also consider trust and relationships when deciding about data sharing in the context of multi-center studies is not surprising. The stakeholder interviews provided insights into ways to build and maintain trust, including familiarity with the data requestor and proper safeguarding of the data.

The patients participating in this study were already engaged with the research process in some way and thus potentially more open to data sharing than other patients, but most did not convey a solid understanding of existing practices and safeguards around the use of their personal health information. All of the patient stakeholders in this study were familiar with research wherein individuals choose whether or not to participate, provide informed consent, and know generally what information they are providing to the researchers. However, many patients were not familiar with studies where electronic health data might be de-identified and shared without documentation of patients’ permission. Patients with previous exposure to data analyses or reporting in the context of work or school appeared much less concerned about the risks associated with data sharing. Typical patients without exposure to research or data analysis processes likely have an even more limited understanding of how data may be used and what data sharing entails. This may influence their willingness to share their information for research. Our findings are consistent with prior studies that have documented that patients are poorly informed as to current practices, safeguards, and implications of data sharing [10–14, 20, 27, 28].

Further, our findings highlight the need to improve patients’ awareness and understanding of the risks and benefits of research in general. Future studies are needed to identify the best methods of educating the public about existing safeguards and data sharing practices, as well as the need for, and potential value of, comparative effectiveness research using real-world data. For organizational stakeholders, who are likely to weigh the potential value of proposed research against both the costs and the potential risks, additional research to determine the actual costs of data sharing using different methods, as well as further evidence as to the comparability of the findings obtained, may help these stakeholders as they consider the tradeoffs associated with each method.

A major strength of the study is the inclusion of a group of stakeholders with diverse backgrounds, who may be involved in multi-center studies. Their participation in these interviews offered a more comprehensive view on the complex issues around data sharing in multi-center studies. However, our findings should be interpreted in the context of the following limitations. This was a qualitative study with a relatively selected group of stakeholders; while participants brought different perspectives, we do not know the extent to which their views are representative. We conducted both group and individual interviews, which may have influenced our findings. The study was not designed to provide generalizable findings. Patients were selected because of their engagement with research; their views may not be representative of naïve patients. On the other hand, their feedback may be relevant to PCORnet, which includes a network of patients who are actively engaged in research activities. Finally, our focus was on sharing information currently available in electronic healthcare databases, such as diagnoses, pharmaceutical or surgical treatments, healthcare encounters, and laboratory test results. We did not explore issues related to sharing genetic or genomic data, and so cannot comment on whether stakeholders’ views on that topic would be similar or different.

CONCLUSION

In this study, we found that stakeholders are open to data sharing in multi-center studies if the research offers benefits and value to patient care, minimizes data security risks, and can be done at reasonable cost. The gains in privacy protection associated with the use of privacy-protecting analytic and data-sharing methods in multi-center studies were attractive to some stakeholders, but others were concerned about increased cost and potential loss of research validity when using these methods. Most stakeholders were not familiar with these newer methods and their validity, highlighting the need for better education and more research into these methods.

Supplementary Material

Appendices

Executive summary.

  • Data sharing is a fundamental step in multi-center studies, allowing stakeholders to generate timely and actionable information, study treatment effect heterogeneity, and produce generalizable results. However, data sharing entails costs and risks.

  • Newly developed privacy-protecting analytic and data-sharing methods offer an approach to sharing data and conducting multi-center research that eliminates the need to share identifiable patient-level information.

  • We conducted semi-structured group and individual interviews with diverse stakeholders to gather a variety of perspectives on data sharing. Interviews were audio-recorded and professionally transcribed. Using content coding followed by thematic coding, we sought to identify factors affecting stakeholders’ willingness to share data.

  • We completed a total of 11 stakeholder interviews, involving patients (n=15), researchers (n=10), IRB and regulatory staff (n=3), multi-center research governance experts (n=2), and healthcare system leaders (n=4).

  • Stakeholders’ perceptions of the benefits and value of the research were the strongest influence towards data sharing; perceived value was related to the relevance of the scientific question and the methodological rigor.

  • Influences against data sharing were primarily cost and data security risks; the latter could be mitigated by various safeguards (e.g., encryption, data use agreements, and oversight), successful data sharing experience, established relationships, and trust.

  • The risk reduction obtained by sharing aggregate data rather than individual-level data was acknowledged as being potentially more acceptable to some stakeholders, but some stakeholders expressed concerns about the increased cost and potential loss of research validity.

  • Our findings highlight the need for better education and more methodological research in privacy-protecting analytic and data-sharing methods.

References

  • *1. Curtis LH, Brown J, Platt R. Four health data networks illustrate the potential for a shared national multipurpose big-data network. Health Aff (Millwood) 2014;33(7):1178–1186. doi: 10.1377/hlthaff.2014.0121. An overview of recent efforts to use multiple databases for rapid evidence generation in the United States.
  • 2. Toh S, Platt R, Steiner JF, Brown JS. Comparative-effectiveness research in distributed health data networks. Clin Pharmacol Ther. 2011;90(6):883–887. doi: 10.1038/clpt.2011.236.
  • 3. Rassen JA, Avorn J, Schneeweiss S. Multivariate-adjusted pharmacoepidemiologic analyses of confidential information pooled from multiple health care utilization databases. Pharmacoepidemiol Drug Saf. 2010;19(8):848–857. doi: 10.1002/pds.1867.
  • **4. Toh S, Gagne JJ, Rassen JA, Fireman BH, Kulldorff M, Brown JS. Confounding adjustment in comparative effectiveness research conducted within distributed research networks. Med Care. 2013;51(8 Suppl 3):S4–10. doi: 10.1097/MLR.0b013e31829b1bb1. A useful overview of various data-sharing and analytic methods available in multi-center studies, focusing on patient privacy and analytic flexibility.
  • 5. Toh S, Reichman ME, Houstoun M, et al. Multivariable confounding adjustment in distributed data networks without sharing of patient-level data. Pharmacoepidemiol Drug Saf. 2013;22(11):1171–1177. doi: 10.1002/pds.3483.
  • **6. Toh S, Shetterly S, Powers JD, Arterburn D. Privacy-preserving analytic methods for multisite comparative effectiveness and patient-centered outcomes research. Med Care. 2014;52(7):664–668. doi: 10.1097/MLR.0000000000000147. A real-world data analysis that compared the statistical performance of several data-sharing and analytic methods in a multi-database study. The study showed that some privacy-protecting methods produce statistically equivalent or highly comparable results to the results from pooled patient-level data analysis (benchmark).
  • 7. Fireman B, Lee J, Lewis N, Bembom O, Van Der Laan M, Baxter R. Influenza vaccination and mortality: differentiating vaccine effects from bias. Am J Epidemiol. 2009;170(5):650–656. doi: 10.1093/aje/kwp173.
  • 8. Karr AF, Lin X, Sanil AP, Reiter JP. Secure regression on distributed databases. J Comput Graph Stat. 2005;14(2):263–279.
  • 9. Wu Y, Jiang X, Kim J, Ohno-Machado L. Grid Binary LOgistic REgression (GLORE): building shared models without sharing data. J Am Med Inform Assoc. 2012;19(5):758–764. doi: 10.1136/amiajnl-2012-000862.
  • *10. Hill EM, Turner EL, Martin RM, Donovan JL. “Let’s get the best quality research we can”: public awareness and acceptance of consent to use existing data in health research: a systematic review and qualitative study. BMC Med Res Methodol. 2013;13:72. doi: 10.1186/1471-2288-13-72. A useful systematic review that summarizes both qualitative and quantitative studies of the public’s views on the use of existing health data in research. Findings from 8 countries are represented. Results are complemented by focus group findings (primary data collection).
  • *11. Damschroder LJ, Pritts JL, Neblo MA, Kalarickal RJ, Creswell JW, Hayward RA. Patients, privacy and trust: patients’ willingness to allow researchers to access their medical records. Soc Sci Med. 2007;64(1):223–235. doi: 10.1016/j.socscimed.2006.08.045. Patients from Veterans Affairs facilities in the United States participated in small group deliberations (with access to experts) about sharing their medical records for research. Both qualitative and quantitative methods were used to assess participants’ views.
  • *12. Willison DJ, Schwartz L, Abelson J, et al. Alternatives to project-specific consent for access to personal information for health research: what is the opinion of the Canadian public? J Am Med Inform Assoc. 2007;14(6):706–712. doi: 10.1197/jamia.M2457. A national telephone survey of the Canadian public’s attitudes towards allowing access to medical information for research. Quantitative data on attitudes are presented conveying the complexity of the public’s views on this topic.
  • 13. Robling MR, Hood K, Houston H, Pill R, Fay J, Evans HM. Public attitudes towards the use of primary care patient record data in medical research without consent: a qualitative study. J Med Ethics. 2004;30(1):104–109. doi: 10.1136/jme.2003.005157.
  • 14. Whiddett R, Hunter I, Engelbrecht J, Handy J. Patients’ attitudes towards sharing their health information. Int J Med Inform. 2006;75(7):530–541. doi: 10.1016/j.ijmedinf.2005.08.009.
  • *15. Concannon TW, Meissner P, Grunbaum JA, et al. A new taxonomy for stakeholder engagement in patient-centered outcomes research. J Gen Intern Med. 2012;27(8):985–991. doi: 10.1007/s11606-012-2037-1. A new framework and taxonomy for engaging stakeholders in patient-centered outcomes research.
  • 16. Daugherty SE, Wahba S, Fleurence R. Patient-powered research networks: building capacity for conducting patient-centered clinical outcomes research. J Am Med Inform Assoc. 2014;21(4):583–586. doi: 10.1136/amiajnl-2014-002758.
  • 17. Bradley EH, Curry LA, Devers KJ. Qualitative data analysis for health services research: developing taxonomy, themes, and theory. Health Serv Res. 2007;42(4):1758–1772. doi: 10.1111/j.1475-6773.2006.00684.x.
  • 18. Hsieh HF, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15(9):1277–1288. doi: 10.1177/1049732305276687.
  • 19. Luchenski SA, Reed JE, Marston C, Papoutsi C, Majeed A, Bell D. Patient and public views on electronic health records and their uses in the United Kingdom: cross-sectional survey. J Med Internet Res. 2013;15(8):e160. doi: 10.2196/jmir.2701.
  • 20. Stevenson F, Lloyd N, Harrington L, Wallace P. Use of electronic patient records for research: views of patients and staff in general practice. Fam Pract. 2013;30(2):227–232. doi: 10.1093/fampra/cms069.
  • 21. Brown JS, Holmes JH, Shah K, Hall K, Lazarus R, Platt R. Distributed health data networks: a practical and preferred approach to multi-institutional evaluations of comparative effectiveness, safety, and quality of care. Med Care. 2010;48(6 Suppl):S45–51. doi: 10.1097/MLR.0b013e3181d9919f.
  • 22. Toh S, Platt R. Is size the next big thing in epidemiology? Epidemiology. 2013;24(3):349–351. doi: 10.1097/EDE.0b013e31828ac65e.
  • 23. Ross TR, Ng D, Brown JS, et al. The HMO Research Network Virtual Data Warehouse: A Public Data Model to Support Collaboration. EGEMS (Wash DC) 2014;2(1):1049. doi: 10.13063/2327-9214.1049.
  • 24. Kass NE, Natowicz MR, Hull SC, et al. The use of medical records in research: what do patients want? J Law Med Ethics. 2003;31(3):429–433. doi: 10.1111/j.1748-720x.2003.tb00105.x.
  • 25. Weitzman ER, Kaci L, Mandl KD. Sharing medical data for health research: the early personal health record experience. J Med Internet Res. 2010;12(2):e14. doi: 10.2196/jmir.1356.
  • 26. Paolino AR, Mcglynn EA, Lieu T, et al. Building a Governance Strategy for CER: The Patient Outcomes Research to Advance Learning (PORTAL) Network Experience. EGEMS (Wash DC) 2016;4(2):1216. doi: 10.13063/2327-9214.1216.
  • 27. Bell EA, Ohno-Machado L, Grando MA. Sharing my health data: a survey of data sharing preferences of healthy individuals. AMIA Annu Symp Proc. 2014;2014:1699–1708.
  • 28. Stone MA, Redsell SA, Ling JT, Hay AD. Sharing patient data: competing demands of privacy, trust and research in primary care. Br J Gen Pract. 2005;55(519):783–789.


Articles from Journal of comparative effectiveness research are provided here courtesy of Health Research Alliance manuscript submission
