NCI Think Tank Concerning the Identifiability of Biospecimens and “-Omic” Data

Carol J Weil; Leah E Mechanic; Tiffany Green; Christopher Kinsinger; Nicole C Lockhart; Stefanie A Nelson; Laura L Rodriguez; Laura D Buccini

doi:10.1038/gim.2013.40

Author manuscript; available in PMC: 2014 Dec 1.

Published in final edited form as: Genet Med. 2013 Apr 11;15(12):997–1003. doi: 10.1038/gim.2013.40

PMCID: PMC4097316 NIHMSID: NIHMS611739 PMID: 23579437

Abstract

On June 11 and 12, 2012, the National Cancer Institute (NCI) hosted a think tank concerning the identifiability of biospecimens and “omic” data in order to explore challenges surrounding this complex and multifaceted topic. The think tank brought together forty-six leaders from several fields, including cancer genomics, bioinformatics, human subject protection, patient advocacy, and commercial genetics. The first day involved presentations regarding the state of the science of re-identification; current and proposed regulatory frameworks for assessing identifiability; developments in law, industry and biotechnology; and the expectations of patients and research participants. The second day was spent by think tank participants in small break-out groups designed to address specific sub-topics under the umbrella issue of identifiability, including considerations for the development of best practices for data sharing and consent, and targeted opportunities for further empirical research. We describe the outcomes of this two-day meeting, including two complementary themes that emerged from moderated discussions following the presentations on Day 1, and ideas presented for further empirical research to discern the preferences and concerns of research participants about data sharing and individual identifiability.

Introduction

Prompted by mounting concerns about the advancing science of re-identification, the National Cancer Institute (NCI) convened a group of experts to explore the scientific, ethical, and human participant dimensions of the identifiability of biospecimens and multiple types of “omic” data. The “omic” research fields are stimulating discovery and progress in medicine by leveraging highly annotated datasets in the generation and evaluation of numerous hypotheses with impact across a wide range of phenotypes. In order to foster the most productive and efficient use of research data to generate and test new hypotheses not necessarily envisioned at the time of data collection, the NCI endorses widespread sharing of data across different areas of research. This support for broad data sharing is exemplified by the National Institutes of Health (NIH) data sharing policy for genome-wide association studies (GWAS) and the anticipated policy for genomic data sharing, which seeks to facilitate broad access by researchers to de-identified data in publicly supported or conducted studies, consistent with research participant consent¹,².

Provocative publications, however, have questioned the conventional wisdom that individuals included in datasets of aggregated molecular information are fully anonymous. These studies indicate that DNA variant data from pooled single-nucleotide polymorphism (SNP) disease studies ³, datasets of RNA expression levels in tissue samples ⁴, or other quantitative traits ⁵ can be linked directly to human research participants if a matched sample is available. Moreover, some investigators suggest that human beings can be uniquely identified from just 30 to 80 statistically independent SNPs⁶. Furthermore, a recent study demonstrated the ability to re-identify individuals without the availability of a matching sample. Researchers identified individual research participants by using publicly available whole genome research data to calculate short tandem repeats on the Y chromosome in order to query publicly accessible genealogy databases and infer surnames. The genomic research data was linked to metadata, which enabled the investigators to deduce personal identities⁷.
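
The claim that a few dozen independent SNPs can single out one person can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration under idealized assumptions that are not taken from the cited studies: every SNP is biallelic, in Hardy-Weinberg proportions, statistically independent of the others, genotyped without error, and compared between unrelated individuals. The function names and the population figure of 7 billion are hypothetical choices made for this example.

```python
def genotype_match_probability(maf: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions (an idealization for illustration).
    """
    p, q = 1.0 - maf, maf
    genotype_freqs = (p * p, 2 * p * q, q * q)  # AA, Aa, aa
    return sum(f * f for f in genotype_freqs)


def expected_coincidental_matches(n_snps: int, population: float,
                                  maf: float = 0.5) -> float:
    """Expected number of people in `population` who match a given profile
    by chance across `n_snps` independent SNPs with the same allele frequency."""
    return population * genotype_match_probability(maf) ** n_snps


def min_snps_for_uniqueness(population: float, maf: float = 0.5) -> int:
    """Smallest SNP count at which fewer than one chance match is expected."""
    n = 0
    while expected_coincidental_matches(n, population, maf) >= 1.0:
        n += 1
    return n


if __name__ == "__main__":
    world = 7e9  # assumed population size, for illustration only
    for n in (30, 80):
        print(n, expected_coincidental_matches(n, world))
    print(min_snps_for_uniqueness(world))
```

In this best case (minor allele frequency of 0.5) the threshold is 24 SNPs. Lower allele frequencies, correlated markers, and genotyping error all push the required number upward, which is broadly in line with the 30 to 80 range quoted above.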

Evolving recognition of the increasing challenges for de-identification of human tissue and associated data inspired a federal regulatory proposal by the U.S. Department of Health and Human Services (DHHS) to consider categorizing all biomedical research involving biospecimens, including collection, storage and secondary analysis of existing tissue, as research involving identifiable data⁸. DHHS sought public comment on potential advantages and disadvantages of considering biospecimens and associated data inherently identifiable. DHHS also solicited response as to whether specific data security protections, including encryption and periodic retrospective random audits, should extend to research using certain types of genomic data, such as genome-wide SNP analyses or whole genome sequences.

Ultimately, the ability to protect research participants from any unauthorized identification of their biospecimens or “omic” data is an issue of public trust that the research community must address forthrightly and transparently. The NCI must be proactive in contemplating the extent to which evolving technologies might undermine the protection of data that is currently deemed “de-identified” under the law and regulations. We wished to consider whether the burgeoning volume of “omic” data generated in biomedical research, coupled with evolving linkage capabilities posing potential informational risks to research participants and genetically linked populations, warranted a reexamination of informed consent disclosures or data sharing policies. This paper reflects a summary of the discussion from the NCI meeting and is not intended to represent an opinion of the NCI, NHGRI, NIH, or DHHS.

Setting

On June 11–12, 2012, we gathered a group of leaders in the research community with varied relevant backgrounds (Appendix 1, Supplementary Information) in order to convene a think tank on the science and policy of re-identifying human biospecimens and “omic” data. Our aim was to outline considerations for the protection of “omic” data and for the distribution to researchers of stored biospecimens and associated data within appropriate ethical and regulatory parameters. The agenda involved an initial day of presentations by leaders in medical research, bioinformatics, ethical and regulatory policy, consumer genetic testing, and patient advocacy. The presentations discussed current challenges posed by biotechnological advances and their effect on participant privacy and perceptions about data sharing. Moderator-led discussion followed each presentation, enabling think tank participants to share experiences and identify challenges and opportunities for empirical research. On the second day, participants convened in one of four pre-assigned breakout sessions (Appendix 2, Supplementary Information) composed of individuals with varying expertise, for in-depth discussion of specified sub-topics.

Major Findings

State of the Science

The purpose of Day 1 of the think tank was to report on the state of the science from a variety of expert perspectives, and specifically to address current challenges surrounding the identifiability of “omic” data and biospecimens. There were a total of nine state of the science presentations, including two keynotes. These talks provided background in multiple fields of expertise including cancer genomics and proteomics, bioinformatics and statistics, legal and regulatory guidance, and industry applications of gene technologies. A brief description and highlights of the presentations are provided (Table 1).

Table 1.

Highlights of Individual State of the Science Presentations

Laura Lyman Rodriguez, PhD-Keynote Speaker Office of Policy, Communication and Education, NHGRI/NIH	*Is it or Isn't It? Evolving Policy Considerations Regarding Genomic Data and Identifiability*
	• Noted that NHGRI sponsored a meeting in 2006 that highlighted various definitions for “identifiable”¹⁹
	• Reviewed past and current data-sharing policies
	• Commented on the impact and availability of recent high-throughput sequencing technologies
	• Emphasized the need for a proper balance between societal benefit and individual harm

Misha Angrist, PhD, MS, MFA-Keynote Speaker Duke Institute for Genome Sciences and Policy	*Can You See the Real Me? Human Patients and Human Research Participants*
	• Provided insights from experience as an IRB member and participant in the personal genome project (PGP)
	• Proposed that issues with identifiability and privacy are a direct result of the research community and those overseeing human participant protections losing sight of the intended audience
	• Proposed de-identification should not be used to relieve researchers from their obligation to interact with their research participants
	• Stressed the importance of the return of results for community engagement, transparency, and promotion of participant trust

Bradley Malin, PhD, MS Health Information Privacy Laboratory at Vanderbilt University	*“Omics” and the Changing Face of Identifiability*
	• Risk for re-identification is not unique to “omic” data; re-identification was demonstrated over a decade ago using demographic data
	• Need to recognize that even with a link (e.g. matching sample), re-identification, although possible, is not probable
	• Suggested privacy policy discussions should acknowledge that de-identification, while somewhat protective, is not a panacea and should focus on the context of risk, opportunities for harm associated with re-identification, and risk-mitigation strategies

P. Pearl O'Rourke, MD Partners Healthcare	*IRBs: Forced to Deal With “Identifiability” of Everything. A Daunting Mandate*
	• Identifiability of various datasets or specimens lies on a spectrum, but IRBs are forced to make a binary choice–“yes” or “no”–when determining whether a proposed study represents human subjects research
	• Geneticists do not necessarily agree on whether particular types of “omic” data are identifiable
	• Explored the potential impact of consent for all secondary research using biospecimens or genomic data

Leslie Biesecker, MD Genetic Disease Research Branch, NHGRI	*Hypothesis-Testing and Hypothesis-Generating Modes of Research*
	• Hypothesis-testing paradigms have been useful during decades of research using low-throughput technology
	• Proposed a paradigm shift toward hypothesis-generating research, allowing investigators to assemble cohorts, collect molecular data, and use data to identify new phenotypes and generate new hypotheses
	• Such research could facilitate the transition toward personalized medicine and prevention, but it requires continuous interaction with research participants and an engaged, iterative approach to informed consent

John Wilbanks Ewing Marion Kauffman Foundation	*Portable Legal Consent*
	• Data sharing can be a beneficial way for patients to affirm their autonomy and overcome feelings of vulnerability and disempowerment
	• Currently there are no avenues for willing individuals to donate their health information to research studies
	• Proposed the use of Portable Legal Consent, a transparent, digital informed consent process that explains the terms under which investigators can access personal data and the rights granted to researchers who access these data, and that gives participants a comprehensive explanation of the scope of their consent

Deborah Collyar Patient Advocates in Research	*Making Sense of Genomics While Protecting People*
	• From the patient perspective, emphasized the need for plain language and clear information
	• Consent documents should state clearly the risks associated with donation, address mistrust arising from past medical controversies or atrocities, and allow participants to decide whether they want information back
	• Many patients want the biomedical research community to overcome intellectual property issues and share data

Kenneth Chahine, PhD, JD Ancestry DNA	*The Future of Genomic and Health Data*
	• World is changing from one in which health data are locked away in physicians' offices and research databases to one where individuals share large amounts of information through social networking and mobile phone applications
	• Many Ancestry participants share their information to foster interaction
	• Individuals' ideas about privacy could change when they see a personal benefit from sharing

Jane Yakowitz Bambauer, JD University of Arizona	*Seeding the Data Commons: Legal Safe Harbors for Research Data*
	• Argued that the risk for re-identification is likely overstated and cautioned against implementing protections when the credible downstream risks are still unknown and when the value of information sharing is great
	• Proposed a three-step model that 1) includes a basic anonymization process, 2) holds data producers immune from privacy-related liability, and 3) ensures that malicious actors are held criminally liable for intentional re-identification and misuse of data

Break Out Sessions

Day 2 was devoted to breakout sessions with groups focused on specific questions posed by the think tank organizers (see Appendix 2). These deliberations framed suggestions for next steps, designed to move the science forward, while helping to address policy concerns and develop best practices around identifiability. Outcomes of the breakout group sessions are highlighted in Table 2.

Table 2.

Outcomes of Breakout Group Discussions

Group 1- What factors should be considered in the development of a Federal policy for access to publicly funded “omic” research data	Outcomes of Discussion: Proposed a pilot of a truly open-source dataset that would allow access to all types of data, including “omic” data. Participants willing to contribute their biospecimens and data would be informed that all data would be made public and that anyone would be able to access them. Participants would be informed about the risks and benefits associated with such data sharing.
	• The pilot would involve multiple datasets stratified by varying comfort levels, phenotypes, and/or potential for social stigma. However, in light of continued budgetary constraints, one focused pilot of an open dataset would be a good start
	• The pilot could assemble a large cohort of individuals with specific phenotypes and track it longitudinally, with broad consents to overcome issues of controlled access
	• Patient advocacy groups, public health departments, and epidemiologists could, among others, serve as partners in recruiting individuals for such a cohort

Group 2- What considerations enter into determining whether “omic” data is identifiable?	Outcomes of Discussion: There was consensus that all data are identifiable, but acknowledgment that re-identification requires a link to a matched sample and that some data are at greater risk for identifiability than others. Although some barriers are necessary to protect against abuses of data, many barriers restrict scientific discovery. Therefore the group proposed a three-tiered “Russian doll model”:
	• The open-access tier (Tier 1) allows anyone to download data contributed by participants who have been informed of identifiability and risk issues but permit wide access.
	• The general controlled access tier (Tier 2) allows investigators to apply for access to data that fall under Exemption 4 or data for which participants have consented to broad sharing.
	• The restricted tier (Tier 3), which allows no data sharing, includes data from specific population studies in which consent has been given for one research study only.

Group 3- What are appropriate ethical constraints to allowing researchers broad access to “omic” data?	Outcomes of Discussion: Participants are willing to consent to use of their data, but that willingness stems partly from a desire to be involved and to know how their data are helping. Formal mechanisms are needed to keep participants informed.
	• Transparency should occur throughout the process, not just when participants are consenting to contribute specimens or data.
	• The burden of contact and re-contact is not fully on the research community. Participants can be asked to maintain their correct contact information if they want to receive results.
	• Using identifiable data would facilitate respect for participants and the return of research results. However, such a focus also would limit some current practices and use of retrospective samples.
	• The group also discussed the need to include new technologies into current research processes, e.g. online consent that allows participants to choose the level of information they want to share

Group 4- *How can society minimize any risks and maximize any participant benefits of “omic” research?*	Outcomes of Discussion: The levels of risk vary across data types. Moreover, the risks for identifiability must be distinguished from the risk for harm—for example, identification of an adopted child's birth parents, risks to someone running for office, inability to obtain health or life insurance, risks to family members etc.
	• Real instances of misuse have related to issues of data security and, so far, re-identification has occurred primarily within the research setting. Therefore, the risks to research participants remain remote.
	• Efforts to address remote risks could minimize the benefits to research participants and the community. Accordingly, studies should minimally report aggregate data.
	• Streamlined data access committees should be used to control the release of data, and investigators should be required to accept accountability by registering for access to data.
	• In addition, training data are needed to develop and improve bioinformatics.
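
The three-tiered “Russian doll” model proposed by Group 2 can be sketched as a small access rule. This is a hypothetical illustration, not a scheme specified at the meeting: the names `Tier` and `may_access`, the two boolean flags, and the assumption that the original study team always retains access to its own data are all introduced here for the example.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Access tiers as described by Group 2 (numbering follows Table 2)."""
    OPEN = 1        # anyone may download; participants consented to wide access
    CONTROLLED = 2  # investigators must apply and be approved
    RESTRICTED = 3  # consent covers one research study only; no sharing


def may_access(tier: Tier, *, approved_by_committee: bool = False,
               is_original_study: bool = False) -> bool:
    """Return True if a requester may obtain data held at `tier`.

    Assumption for illustration: the original study team can always use
    its own data, whatever the tier.
    """
    if is_original_study:
        return True
    if tier == Tier.OPEN:
        return True
    if tier == Tier.CONTROLLED:
        return approved_by_committee
    return False  # Tier.RESTRICTED: no access beyond the original study


if __name__ == "__main__":
    print(may_access(Tier.OPEN))
    print(may_access(Tier.CONTROLLED, approved_by_committee=True))
    print(may_access(Tier.RESTRICTED, approved_by_committee=True))
```

Keeping the decision in one function makes the nesting explicit: each higher-numbered tier adds a condition on top of the one before it, and the restricted tier cannot be unlocked by committee approval alone.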

Two broad themes emerged from think tank discussions about the identifiability of human tissue and data. The first concerned the issue of whether additional institutional or legal measures were necessary to address either negligent or intentional violations of research participant privacy. The second concerned the extent to which greater understanding and expansion of data sharing options might provide an opportunity to increase engagement of participants in the research enterprise.

In regard to the first theme, some experts maintained that current governance strategies such as honest broker coding models and irreversible de-identification systems, while not immune to mishap, function well in many institutional biorepositories. According to this view, while it has indeed been demonstrated that with targeted effort a researcher may infer specific information about an individual from aggregated data (such as cohort participation in case or control group), this should not condemn current de-identification approaches and best practices. Although often highly publicized, incidents of re-identification from trace amounts of individual data within aggregated datasets are usually isolated and thus need not warrant rewriting of research consent disclosures about privacy, or constriction of researcher access to data. While misuse of genetic information by researchers is demonstrably feasible, it constitutes such an extreme breach of medical and professional ethics as to best be handled by targeted professional measures rather than broad policy strokes. For example, wrongdoers might potentially be banned from participation in federally funded research studies under federal procedures for handling scientific misconduct⁹. These procedures contemplate researcher debarment for practices seriously deviating from those that are commonly accepted within the scientific community for proposing, conducting, or reporting research. Fines and compensation to individuals whose data have been improperly accessed or misused could provide additional deterrents to abuse.¹⁰ The research engine as a whole, however, should not be hindered by the remote possibility of an errant investigator re-identifying individuals without authorization. Several discussants suggested that discrimination or other harm caused by unauthorized re-identification of research participants seems largely to be a theoretical problem at this time. Moreover, in order to identify an individual in a pooled dataset, a separate reference sample or at least some independent data about the individual is still generally required.

The second theme that emerged at the think tank takes a different, although not truly contradictory, approach to concerns regarding the risk of re-identification. This view is focused on the concept of balancing privacy risk with the potential benefits of broad data sharing. There is developing consensus that anonymity is more difficult to guarantee in the genomic era¹¹. Several meeting participants suggested that all “omic” data is theoretically identifiable. Moreover, the problem of re-identification is not unique to biomedical research and existing public databases make re-identification of individuals easier ¹². Given our waning ability to guarantee complete de-identification of data, some experts felt that medical researchers need to improve communication with participants regarding the remote potential for data security breaches, and counterbalance the risks by conveying the potential societal and individual benefits of data sharing. Such benefits include the medical utility of participant contributions of data, and the ability to return clinically relevant incidental findings or other research results under appropriate circumstances.

Discussion

As “omic” research advances and data are shared more broadly, communication of potential re-identification risks for participants is a challenge. Greater transparency is needed when informing participants about the limitations of data privacy, and efforts should be made to more clearly convey the complexity of this issue. While consent forms routinely disclose that researchers cannot guarantee privacy, it is not clear whether research participants understand the rationale behind this limitation. In a study that examined comprehension of consent documents, participants scored poorly on a question evaluating the understanding of confidentiality issues¹³. If researchers improve communication, participants may become more willing to embrace sharing of their submitted data. Some meeting participants suggested this is an opportunity to advance recruitment in medical research. A recent European study suggests that participants may be less concerned about the risks of data sharing and more interested in potential positive outcomes of such sharing¹⁴.

Much discussion at the think tank focused on the utility and empowerment of data sharing and its potential to stimulate collaborative engagement with research participants. Individuals facing illness or chronic disease may visit and join internet groups such as CancerCompass® and PatientsLikeMe®, where they can share medical outcomes and learn about the symptoms or treatment response of others. The growth of such electronic communication spaces indicates the importance attached by many to the benefits of sharing their personal medical data.

As society becomes more accustomed to sharing and receiving information through social media and mobile communications, people who commonly use these resources might be more willing to release their genomic information to researchers who have engendered their trust. Others who are less comfortable with, or have no access to, such commercial technologies may be less inclined to disclose their personal information broadly. Nonetheless, the increasing popularity of social media and mobile technologies may underscore a cultural shift toward greater openness about personal information extending beyond early adopters.

In a randomized trial of consent options for data sharing in genome research by McGuire et al ¹⁵, 86.2% of research participants chose to release their genetic and clinical information in either open access (through the internet) or scientific (restricted for medical research only) databases, even after a debriefing and opportunity to change their data release option. A follow-up study showed that when deciding whether protecting privacy or advancing research is more important, participants who somewhat or highly trusted researchers predominantly chose advancing research as having greater priority ¹⁶. Participants in these studies were recruited by physicians at hospitals where they or their family members were receiving treatment, which may have influenced willingness to share data. The studies nonetheless suggest that research participants will incur some privacy risk with their individual genomic data when they trust those conducting the research and perceive that their information will benefit others.

Discussions of individual privacy risk in medical research trigger additional ethical challenges relating to risks for family members and genetically similar populations. If a privacy breach results in the disclosure of information pertaining to an inherited mutation, for example, that information may harm not only the individual research participant but other genetically related people as well. Under such circumstances, an individual's choice to participate in research may result in group discrimination or stigmatization. A research participant belonging to a relatively homogeneous genetic population who permits broad sharing of individual research data may enable the discovery of incidental findings that impact the entire population. The severity of consequences for public trust and future research recruitment may, in such cases, be quite high even though the statistical risk of re-identification of the individual research participant is quite low.

NIH data sharing policy¹,² embraces the importance of data sharing as a tool for research progress, but we are only just beginning to explore how research participants understand and balance the risks and benefits of broad “omic” data sharing for translational medical research. While extending researchers' access to “omic” data imposes a degree of individual and group privacy risk, some research participants might prefer the informational benefits of data sharing, particularly when offered the return of clinically actionable research results similar to the proposed collaborative Informed Cohort model¹⁷. Others might prefer more conventional de-identification options. We would anticipate that when data sharing strategies are clearly communicated during the consent process, and research participants understand the potential risks and benefits of available data sharing options, they are likely to be more comfortable releasing their data for broad future research uses.

The NCI think tank concerning the identifiability of biospecimens and “-omic” data not only motivated dialogue across a range of expert perspectives, but identified opportunities for agency guidance, policy development, and empirical research. As an initial next step, more thought must be given to use of the term “identifiability” in research documents, as well as its application in regulations and guidance. Identifiability is often conflated with the term “privacy” in consent forms, though the meaning of these terms is not identical. Privacy is a multi-faceted concept with roots in Constitutional Law, encompassing broadly the right to be free from unwarranted invasion of personal liberties and the ability to make personal decisions affecting marriage, contraception and other intimate matters. Identifiability, on the other hand, refers simply to the ability to determine unique facts about somebody, or to figure out who they are. Consent forms often speak to the ability to protect privacy, but in reality investigators can only seek to protect identifiability.

Moreover, the term “identifiability” itself has multiple meanings. In the research context, the term refers to the technical capability to resolve individuals through linkage to matched samples or independent data. In the regulatory context, however, identifiability refers quite differently to whether specific investigators may readily ascertain a research participant's identity, or gain access to specified identifiers. Thus, “de-identified” data in the regulatory context may be scientifically identifiable, causing confusion when the term is used. Institutions that engage in “omic” research, including biorepositories and academic medical centers, should examine how their data protection policies reflect the different meanings of “identifiability” and ensure clarity in their policies and consent documents. Institutional guidelines should be developed to assist investigators and IRBs in determining whether specific “omic” data are or are not identifiable. Such guidelines should illustrate the connections among the various definitions, laws, and policies related to privacy and identifiability. Beyond institutional policy, clarification regarding identifiability at a broader level can be implemented through the research industry, NIH policy measures, endorsements from professional organizations, and changes in regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the DHHS Common Rule.

Additional important steps include the development of strategies for greater transparency in communicating data sharing options, and empirical investigation of participant preferences in balancing the risks and benefits of various data sharing models. The public seems to support broad research uses of biospecimens and collected medical data¹⁸, but the extent to which people understand the potential risks and benefits of different tiers of data sharing (i.e., controlled access v. public access) is not clear. In order to improve scientific literacy about biospecimen donation and possible modes of data sharing, a model suggested for consideration during think tank discussion is that of organ donation. Organ donation programs provide education and community outreach, with the goal of enabling individuals to make informed decisions, documented on their driver's licenses, about donating organs for transplant. Similarly, we could adopt a community based model to disclose the potential risks and benefits of donating biospecimens and associated data for medical research, and allow individuals to document their choices on driver's licenses.

Perhaps the most widely shared observation among think tank participants was the determination that additional empirical research about the perceived and actual risks and benefits of “-omic” data sharing is needed. In particular, we should explore the underpinnings of information altruism (i.e. desire of participants to assist medical research by sharing their personal data broadly) and its relative impact across disease and population communities. Developing research collaborations with private corporations that provide personal genomic information can aid in moving the science forward at a more rapid pace, as many of these companies have research arms and have expressed an interest in collaborating with academic institutions and NIH. Such collaborations would need to proceed cautiously, however, in order to prevent untoward research focus on populations who have both the resources and the inclination to pay for personal genetic information. In addition to research into the relative risks versus benefits of data sharing, further statistical analysis of the probability that someone could be re-identified via his or her “omic” data is needed. This analysis should take into account fiduciary responsibilities and institutional relationships at various stages, including the consent process, storage and stewardship of data, and methods for addressing potential downstream misuse (although at present misuse has not been documented as a widespread problem).

Another think tank suggestion was to conduct a pilot study in which participants contribute multiple types of “omic” or phenotypic data under a completely open access data model. Such a project could investigate the willingness of individuals to share data under this model, examine real-world consequences of truly open-access data, assess the feasibility of enrolling participants using such a model, describe the experiences of individuals, and gather empirical evidence about whether data would be shared more rapidly. Results from such research could guide the development of best practices for research institutions and biorepositories and inform research consent disclosures about informational risk.

In conclusion, the NCI think tank concerning the identifiability of biospecimens and “omic” data identified two main themes regarding how the research community should manage the difficulty of ensuring de-identification, and the remote risk that future re-identification will occur and cause harm. The first theme concerned whether additional institutional or legal measures are necessary to address a violation of research participant privacy. The second concerned the extent to which greater understanding and expansion of data sharing options might provide an opportunity to increase engagement of participants in the research enterprise. The NCI think tank also identified several opportunities for improved guidance, policy development and empirical research. The challenge of deciphering different uses of the term “identifiability” was discussed, suggesting that more should be done to clarify the meaning for both researchers and study participants. The ability to communicate the changing reality of re-identification risks is fundamental to the development of best practices for seeking the consent of research participants in biospecimen collection and storage protocols. Additionally, we must gather more empirical data about people's data sharing preferences, in order to facilitate increased secondary research uses of their biospecimens and data and thereby hasten the pace of medical progress. Addressing these priorities should help the research community and policy makers in developing a measured approach to the issues of re-identification in the coming era of high-throughput population-based genomic research and better define the balance between patient protections and advancing science.

Acknowledgements

Funds for the NCI think tank were provided by HSFB, EGRP, in DCCPS and the OBBR of NCI at the National Institutes of Health (NIH). The authors acknowledge the meeting-planning assistance of Dr. Elizabeth Gillanders (HSFB, EGRP, DCCPS, NCI) and Dr. Jim Vaught (Biorepositories and Biospecimen Research Branch, Division of Cancer Treatment and Diagnosis, NCI). The authors would like to thank all of the participants of the NCI Identifiability think tank listed in Appendix 1 for their intellectual contributions throughout the meeting.

Appendix 1: List of Participants

NCI Identifiability Think Tank Participants

Appendix 2: Breakout Sessions

NCI Identifiability Think Tank Breakout Sessions

The second day of our NCI think tank is devoted to in-depth discussion of subtopics under the umbrella issue of the identifiability of biospecimens and genomic data. The overarching goal of these breakout sessions is to identify the considerations that should inform policy regarding the use and sharing of “omic” data - including legislative, regulatory, contractual, institutional, and IRB policy - that will protect privacy while ensuring appropriate access to researchers. We will break into four smaller discussion groups as described below. Following small group discussion, each breakout will then present its findings to the collective think tank by a selected spokesperson from within the group. These presentations will be followed by broader discussion among all think tank participants to ensure the potential for maximum input and debate by all.

Breakout Sessions:

1) What factors should be considered in the development of a Federal policy for access to publicly funded “omic” research data?
Sub-questions:
1. Are different types and models of access (open versus controlled versus hybrid) appropriate for different types and levels of data (individual versus aggregate, GWAS versus WES versus WGS)?
2. Is there appropriate justification for treating “omic” data differently from other types of research data?
3. How should Federal policy take into account international data access and privacy standards?
4. What research data or analysis is still needed to address these questions?
2) What considerations enter into determining whether “omic” data is identifiable?
Sub-questions:
1. What distinguishes “identifiable” data from “de-identified” data?
2. Can data ever truly be “de-identified” or is that concept outdated in the genomics era?
3. What criteria or standards should be used to establish whether particular types of “omic” research produce identifiable or de-identified data?
4. What research data or analysis is still needed to address these questions?
3) What are appropriate ethical constraints to allowing researchers broad access to “omic” data?
Sub-questions:
1. What do we know about participant attitudes toward investigator access to their DNA and the privacy-utility tradeoff of limiting data access?
2. What do research participants and the public actually understand about use of DNA in research (e.g., growth of cell lines, induced pluripotent stem cells), and what should they be informed about before consenting to participate?
3. To what extent should the concepts of autonomy, beneficence, and justice limit access by researchers to an individual's “omic” data?
4. What research data or analysis is still needed to address these questions?
4) How can society minimize any risks and maximize any participant benefits of “omic” research?
Sub-questions:
1. What are the risks of various “omic” research technologies and the data they can produce?
2. How can “omic” studies be designed to maximize individual participant, family, and community benefits (through, for example, the return of individual or group population research results)?
3. What public or regulatory policies would promote appropriate balance of the risks and benefits of “omic” research and help to avoid unwanted disclosures of identity and future uses of DNA for undesired purposes?
4. What research data or analysis is still needed to address these questions?

Footnotes

Supplementary information is available at the Genetics in Medicine website.

References

1. National Institutes of Health. NOT-OD-07-088: Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS). 2007. http://grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html.
2. National Institutes of Health. NOT-HG-10-006: Notice on Development of Data Sharing Policy for Sequence and Related Genomic Data. 2009. http://grants.nih.gov/grants/guide/notice-files/NOT-HG-10-006.html.
3. Homer N, Szelinger S, Redman M, et al. Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays. PLoS Genet. 2008;4(8):e1000167. doi: 10.1371/journal.pgen.1000167.
4. Schadt EE, Woo S, Hao K. Bayesian method to predict individual SNP genotypes from gene expression data. Nat Genet. 2012;44(5):603–608. doi: 10.1038/ng.2248.
5. Im HK, Gamazon ER, Nicolae DL, Cox NJ. On Sharing Quantitative Trait GWAS Results in an Era of Multiple-omics Data and the Limits of Genomic Privacy. The American Journal of Human Genetics. 2012;90(4):591–598. doi: 10.1016/j.ajhg.2012.02.008.
6. Lin Z, Owen AB, Altman RB. Genomic Research and Human Subject Privacy. Science. 2004 Jul 9;305(5681):183. doi: 10.1126/science.1095019.
7. Gymrek M, McGuire AL, Golan D, Halperin E, Erlich Y. Identifying personal genomes by surname inference. Science. 2013 Jan 18;339(6117):321–324. doi: 10.1126/science.1229566.
8. Department of Health and Human Services. Human subjects research protections: Enhancing protections for research subjects and reducing burden, delay and ambiguity for investigators. Federal Register. 2011;76:44512–44531.
9. Department of Health and Human Services. Subpart A - Responsibility of PHS Awardee and Applicant Institutions for Dealing With and Reporting Possible Misconduct in Science. 42 CFR 50.102. Code of Federal Regulations. 2000.
10. Presidential Commission for the Study of Bioethical Issues. Privacy and progress in whole genome sequencing. Department of Health and Human Services, editor. Washington, D.C.: 2012. p. 75.
11. Schadt EE. The changing privacy landscape in the era of big data. Mol Syst Biol. 2012;8. doi: 10.1038/msb.2012.47.
12. Ohm P. Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review. 2009;57:1701.
13. Ormond KE, Cirino AL, Helenowski IB, Chisholm RL, Wolf WA. Assessing the understanding of biobank participants. American Journal of Medical Genetics Part A. 2009;149A(2):188–198. doi: 10.1002/ajmg.a.32635.
14. Hobbs A, Starkbaum J, Gottweis U, Wichmann HE, Gottweis H. The privacy-reciprocity connection in biobanking: comparing German with UK strategies. Public Health Genomics. 2012;15(5):272–284. doi: 10.1159/000336671.
15. McGuire AL, Oliver JM, Slashinski MJ, et al. To share or not to share: a randomized trial of consent for data sharing in genome research. Genet Med. 2011 Nov;13(11):948–955. doi: 10.1097/GIM.0b013e3182227589.
16. Oliver JM, Slashinski MJ, Wang T, Kelly PA, Hilsenbeck SG, McGuire AL. Balancing the Risks and Benefits of Genomic Data Sharing: Genome Research Participants' Perspectives. Public Health Genomics. 2012;15(2):106–114. doi: 10.1159/000334718.
17. Kohane IS, Mandl KD, Taylor PL, Holm IA, Nigrin DJ, Kunkel LM. Reestablishing the Researcher-Patient Compact. Science. 2007 May 11;316(5826):836–837. doi: 10.1126/science.1135489.
18. Trinidad SB, Fullerton SM, Ludman EJ, Jarvik GP, Larson EB, Burke W. Research Practice and Participant Preferences: The Growing Gulf. Science. 2011 Jan 21;331(6015):287–288. doi: 10.1126/science.1199000.
19. Lowrance WW, Collins FS. Ethics. Identifiability in genomic research. Science. 2007 Aug 3;317(5838):600–602. doi: 10.1126/science.1147699.



NCI Think Tank Concerning the Identifiability of Biospecimens and “-Omic” Data

Carol J Weil, J.D.

Leah E Mechanic, Ph.D., M.P.H.

Tiffany Green, M.H.S., M.P.H.

Christopher Kinsinger, Ph.D.

Nicole C Lockhart, Ph.D.

Stefanie A Nelson, Ph.D.

Laura L Rodriguez, Ph.D.

Laura D Buccini, Dr.PH, M.P.H
