Author manuscript; available in PMC: 2020 Jan 29.
Published in final edited form as: Ann Epidemiol. 2016 Jun 1;26(7):515–519. doi: 10.1016/j.annepidem.2016.05.007

“The Google of Healthcare”: Enabling the privatization of genetic bio/databanking

Kayte Spector-Bagdady 1
PMCID: PMC6988384  NIHMSID: NIHMS1067513  PMID: 27449572

Abstract

Purpose:

23andMe is back on the market as the first direct-to-consumer genetic testing company that “includes reports that meet Food and Drug Administration (FDA) standards….” But, whereas its front-end product is selling individual genetic tests online, its back-end business model is amassing one of the largest privately owned genetic databases in the world. What is the effect, however, of the private control of bio/databases on genetic epidemiology and public health research?

Methods:

The recent federal government notices of proposed rulemaking for: (1) revisions to regulations governing human subjects research and (2) whether certain direct-to-consumer genetic tests should require premarket FDA review, were reviewed and related to the 23andMe product, business model, and consumer agreements.

Results:

FDA regulatory action so far has focused on the return of consumer test reports but it should also consider the broader misuse of data and information not otherwise protected by human subjects research regulations.

Conclusions:

As the federal government revises its decades-old human subjects research structure, the Executive Office of the President (EOP) should consider a cohesive approach to regulating private genetic bio/databanks. This strategy should allow the FDA and other agencies to play a role in expanding current regulatory coverage.

Keywords: Medical device, Human experimentation, Genetic testing, Database, Biological specimen bank, Genetics, 23andMe, Myriad, Direct-to-Consumer

Introduction

23andMe is back on the market as the first direct-to-consumer (DTC) genetic testing company that “includes reports that meet Food and Drug Administration (FDA) standards for being clinically and scientifically valid [1].” Its current product includes 36 health-related carrier-status reports and consumers’ raw genetic data (in addition to ancestry and other nonmedical “wellness and trait” information) [1]. Forbes has reported that recent investors estimate its value at $1.1 billion [2].

But, that valuation is not on the basis of 23andMe’s $199 test kits. Whereas its front-end product is selling individual genetic tests online, its back-end business model is amassing one of the largest genetic bio/databanks in the world [3], [4]. Since 2007, 23andMe has offered an inexpensive product to consumers (personalized genetic analysis) to generate broader consumer data and then leveraged that data to generate profit, becoming—as board member Patrick Chung put it—“the Google of personalized health care [3].” And 23andMe recently surpassed its goal of 1 million consumers [5].

Although the focus of the governmental and academic debate surrounding DTC genetic testing has been on whether FDA regulation is enough to protect consumers receiving sensitive medical information without a clinician intermediary [6], the more important question moving forward will be how to manage increasingly large and valuable private bio/databanks. As the U.S. federal government, and in particular the Department of Health and Human Services (HHS), considers revisions to its regulations governing human subjects research to include de-identified human biospecimens and whether certain DTC genetic tests should require premarket FDA review, this article argues that the Executive Office of the President (EOP) should take into consideration potential enabling of the private genetic bio/databank market when contemplating the individual and public health effects of its administrative rulemaking.

The 23andMe bio/databank

When consumers purchase the 23andMe product, the company analyzes hundreds of thousands of their single nucleotide polymorphisms to produce genetic information [7]. In so doing, consumers contribute both their saliva specimen to 23andMe’s biobank and their genetic analysis to its databank. No matter whether 23andMe is returning ancestry, wellness, trait, or carrier-screening information to the consumer, the genetic data it is generating can be much more robust. Consumers are asked if they would like to have their biospecimen destroyed after their genetic data is analyzed; however, 23andMe’s Full Privacy Statement adds that it will only do so if “legal and regulatory requirements” do not require it to maintain biospecimens [8]—making it unclear whether and under what circumstances they are destroyed.

In addition to personalized genetic information, 23andMe consumers can also contribute “self-reported information,” which includes all information that the consumer explicitly provides, or that 23andMe can track while consumers are signed in to their 23andMe account (note that this means that 23andMe can get data from other websites the consumer is using as long as they are also signed in to 23andMe.com) [8]. This self-reported information includes answers to continuous pop-up surveys on the 23andMe website regarding trait, heritage, health, and family history information. 23andMe consumers answer almost two million questions each week to contribute to its database and it recently launched an “app” to make surveys accessible from mobile devices as well [9], [10].

According to the Full Privacy Statement, by virtue of “using our Services,” all 23andMe consumers agree (among other things) to allow 23andMe to:

  • Use individual-level genetic and self-reported information to “perform research and development activities;” and

  • Share aggregate genetic and self-reported information with third parties (including commercial entities) [8].

Of the ~1 million 23andMe consumers, over 800,000 also signed a Research Consent Document [9], [11]. If a consumer signs this research consent, there appear to be only two major differences from the Full Privacy Statement. Research participants additionally consent to:

  1. 23andMe providing their deidentified individual-level genetic and self-reported information to third parties (including commercial entities); and

  2. Enabling 23andMe researchers to receive federal funding for their work and/or publish it in peer-reviewed literature [8], [11].

Note that although “research” is typically defined as a “systematic investigation … designed to develop or contribute to generalizable knowledge” [12], 23andMe limits its definition of “research” in its Research Consent Document to systematic investigations “aimed at publication in peer-reviewed journals and other research funded by the federal government … ” [8], [11]. In effect, 23andMe’s definition of research turns on whether a third party will hold it accountable to research industry standards. Information that research participants agree to share also includes “any information you submitted prior to giving consent to research (emphasis added) [11].” In addition, if consumers have their sample stored, 23andMe might “use the results of further analysis of your sample [11].”

Lay consumers, or those not reading the Full Privacy Statement and Research Consent Document carefully and in conjunction, might assume that purchasing 23andMe means that they consent to personally receiving their genetic information and that they will only be involved in research if they sign the Research Consent. But, that does not appear to be the case. The major difference between 23andMe consumers and research participants is whether 23andMe can share aggregate or individual-level data with third parties such as commercial entities and how 23andMe can fund and publish the research to which all consumers have already consented via their purchase.

The established breadth of data and dynamic cohort 23andMe has created with these agreements have made it an attractive business partner. The company has access agreements with 30 pharmaceutical and biotech companies (including Alnylam Pharmaceuticals, Biogen, Genentech, Pfizer, and P&G Beauty) in addition to partnerships with academic and nonprofit organizations [9], [13]. As the owner of the most samples from participants with Parkinson’s disease, for example, 23andMe recently entered into a $60 million whole genome sequencing deal with Genentech. Anne Wojcicki, cofounder and CEO of 23andMe, was blunt: “we can do things much faster and more efficiently than any other research means in the world [14].”

Potential problems with the 23andMe cohort

Although some companies vie for the opportunity to collaborate with and gain access to 23andMe’s database, there are others who have voiced caution. A first concern is related to demographic bias. Private data sets are much more likely to be populated with educated, wealthy, white participants (a selection bias problem 23andMe itself has tried to address [15]). Such cohort disparities can skew research agendas in the future as researchers only have access to data from a limited portion of the population [4].

A second concern is the intended outcomes of such a private database. Although 23andMe has advertised its research agenda as creating a cohort to “produce revolutionary findings that will benefit us all,” its actual outcomes have been more limited. Some 23andMe consumers, for example, were surprised on May 28, 2012 when 23andMe announced it had filed for and received a patent on “polymorphisms associated with Parkinson’s disease.” Some consumers complained on the 23andMe website about a perceived lack of communication, or miscommunication, about the appropriate outcomes of its Parkinson’s research [16]. A similar issue (unrelated to 23andMe) was litigated in a 2003 Florida case in which over 100 families affected by the genetic disorder Canavan disease had donated money, blood, tissue samples, and health information to researchers at Miami Children’s Hospital to support their efforts to isolate the genetic variant associated with Canavan and so help other families. When the Miami researchers succeeded, they patented the diagnostic test. The families sued the researchers, but a Florida court found that (although they might have had a case for unjust enrichment) because the families had voluntarily “donated” their specimens, they could not prevent the patent and collection of related licensing fees [17].

Third are potential privacy issues. Although the current regulatory structure, discussed in the following section, largely bases its protections on whether data are identifiable or not, large-scale and whole genome sequencing have resulted in genetic data that, while perhaps not readily identifiable, are uniquely identifiable as belonging to only one possible individual [18]. For example, in 2013, Gymrek et al. reidentified the deidentified personal genomes of over 50 consumers of a genetic genealogy database [19]. Beyond outsider misuse of data, there is also the possibility of sponsor misuse; in 2010, for example, 23andMe mistakenly sent the wrong genetic test results to 96 customers [20].

Last (for the purposes of this article) is the issue of data access. Beginning with the Human Genome Project in 1990, policy makers and public health professionals have emphasized the importance of public access to genetic databases [21]. Although some argue that commercial interest and funding is critical to encourage innovation of therapies, others point out that it is only through open access that researchers can support and work with as much data as possible—as well as verify the results of such research [21], [22]. Genetic epidemiology can contribute to preventative public health measures by, for example, isolating environmental versus genetic risk factors. But, access to a large data set is required to do this research, with some hypothesizing that a genetic cohort would need at least 500,000 participants to be of value [23]. Research of isolated families at risk for genetic disease has met with less success than large-scale genome-wide association studies that require data across a large population [24].

Large private cohorts can disrupt equitable access to such cutting-edge diagnostic testing. Take Myriad, for example, which held patents on BRCA-related tests for breast and ovarian cancer from the late 1990s until the U.S. Supreme Court invalidated some of them in 2013 [25]. By the time of that ruling, the company had already used its market exclusivity to gather data from over 1 million participants, the largest sample of BRCA variants and associated health data on the market. Although competitors joined the market after the ruling, Myriad still maintains a monopoly on the most robust data on which to base the most accurate and reliable results [26].

President Obama’s new Precision Medicine Initiative biobank is founded on the concept of empowering participant and researcher access and use [27]. Although the goal of the Precision Medicine Initiative is individualized clinical care, the process requires public health research and analysis as “[u]ltimately, genetic knowledge will only be useful in the clinical arena if it can be placed in an epidemiological and medical or public health context [23].” The EOP should note that problems with skewed participant demographics, confusion about the appropriate outcomes of crowd-sourced research, privacy concerns, and private cohorts of genetic data can stagnate the advancement of such research—and possibly the field of personalized medicine itself.

Current law

Such potential private bio/databanks issues have caused scientists, scholars, and lawmakers to question how our existing regulatory structure (in large part created before the advent of DTC genetics and such banks) can and should limit private cohorts. Although there are many laws and regulations of relevance to such banks, this article will focus on the two currently under revision—the regulations governing human subjects research and the premarket FDA review of certain DTC genetic tests—and question their potential effects on the 23andMe bio/databank.

Human subjects research regulations

The “Common Rule,” so called for its adoption across 17 federal departments and agencies, regulates research with human subjects. It requires protections such as institutional review board (IRB) approval of research protocols and fully informed consent [28]. As a threshold matter, currently only interventions or interactions with individual research participants or their identifiable private health information are considered human subjects research [28]. It also only covers research conducted or funded by the numerous agencies that have adopted it [29], but its reach is often much broader. Privately funded researchers can find themselves legally obligated to abide by the Common Rule if they partner with researchers who have federal funding or who work at an institution that requires all human subjects research to follow the regulations, or, in some cases, if they wish to use their research in support of a new drug or device application. The 23andMe context thus raises two questions: (1) is 23andMe conducting human subjects research, and if so, (2) is that research covered by the regulations?

The first question was debated publicly in 2009 when 23andMe submitted an article to the peer-reviewed journal PLoS Genetics [10]. PLoS editors took 6 months to consider whether publication was appropriate because 23andMe had not followed the traditional requirements of informed consent and/or IRB review [30], [31]. 23andMe argued that it was not conducting human subjects research because the data to which the researchers had access were deidentified. The PLoS editors complained that “[o]n the face of it, this seems preposterous,” but in the end relented because “there was never any interpersonal contact between investigator and participant (that is, data and samples are provided without participants meeting any investigator), and the participant names are anonymous with respect to the data seen by the investigators [30].” PLoS Genetics decided that the 23andMe work warranted publication but noted in an accompanying editorial the “unfortunate loophole” that while “obtaining a consent form would still be desirable, there are no guidelines or policy with regard to how such a consent form should be developed and reviewed in an ethically responsible manner [30].”

The 23andMe Privacy Statement and Research Consent confirm that if consumers participate in research, their data are deidentified [8], [11]. But, 23andMe also states that it might provide consumers with additional reports that it runs on their data in the future [32], so the link between data and consumer identity must be maintained [33]. How arms-length must a reidentification key be held from researchers in order for the data to legitimately be considered deidentified?

Office for Human Research Protections guidance states that if specimens were not collected from participants specifically for research, and investigators are following an IRB-approved policy that prohibits the release of the reidentification key to them, the work is not human subjects research [34]. Other biobanks have proceduralized this guidance through a third-party “Honest Broker” model, in which the Honest Broker serves as a trusted intermediary between the researchers working with deidentified biospecimens and the entity that holds the identifying information [35].
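The separation that the Honest Broker model relies on can be illustrated with a minimal sketch. Everything here is hypothetical (the class name, the record fields, and the placeholder genotype string are assumptions for illustration) and does not describe how 23andMe or any particular biobank actually operates; it only shows the idea that investigators receive coded records while the reidentification key stays with the intermediary.

```python
import secrets


class HonestBroker:
    """Hypothetical intermediary that alone holds the reidentification key."""

    def __init__(self):
        # Maps a random study code to the participant's identity.
        # In this sketch the mapping is never handed to investigators.
        self._key = {}

    def deidentify(self, records):
        """Return coded copies of the records with the identity removed."""
        released = []
        for record in records:
            code = secrets.token_hex(8)  # random code, not derived from identity
            self._key[code] = record["name"]
            released.append({"code": code, "genotype": record["genotype"]})
        return released

    def reidentify(self, code, requester_is_investigator):
        """Look up an identity, refusing any request from an investigator."""
        if requester_is_investigator:
            raise PermissionError("key may not be released to investigators")
        return self._key[code]
```

Under this arrangement, whether data count as deidentified depends less on the data themselves than on who can call the equivalent of `reidentify`, which is the question the text raises for a company whose investigators are its own employees.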

It is unclear who might act as the Honest Broker between 23andMe and its own employee-investigators (which, for the PLoS Genetics article, included both its CEO and Senior Director of Research). When 23andMe received a National Institutes of Health (federal) grant in 2014 [36] and/or when it partnered with academic institutions such as the University of Chicago [13], it maintained that its research, albeit federally funded, was still not subject to the Common Rule because of the researcher use of deidentified data [37]. It is hard to verify whether its deidentification structure meets the standard delineated above.

23andMe announced in June 2010 that it had partnered with an accredited independent review company to approve its research protocols and revise its informed consent form [30]. However, it adamantly maintained that “[b]ecause the 23andMe protocol excludes individual identifying information … and our analysts do not interact directly with customers during data collection, our research technically does not require IRB review. This would be true even if the federal government funded our work [37].”

Whether 23andMe is subject to human subjects research regulations matters. Currently, if 23andMe violates its consent agreement (as happened, e.g., when Fitbit published users’ sexual activity statistics online in 2011 [38]), this would be considered a breach of contract, not a violation of research regulations. Wronged consumers would have to do more than just establish that the action violated their consent; they would also have to establish that they were actually damaged by that breach in the tort liability context. And, from a legal standpoint, it is hard to set a price on dignitary harms and individual damages when data are misused.

Medical device regulations

The other major area of governance for 23andMe is FDA regulations. FDA is tasked with assuring the U.S. public of the “safety, effectiveness, quality, and security” of medical interventions, such as genetic testing [39]. It regulates a genetic test as a “medical device [40]” on the basis of risk—the greater the perceived risk of the device for a patient or consumer, the more protections FDA requires [41]. For example, a Class I device (such as a toothbrush) need only follow “general controls,” such as registration with FDA; a Class II device (such as a mercury thermometer) requires premarket notification to FDA of intent to sell; and a Class III device (such as a pacemaker) is subject to the most stringent premarket approval process [42], [43], [44].

When 23andMe first entered the market in 2007, it did not follow any of these requirements and received a public FDA reprimand in 2010. In 2012, 23andMe became the first DTC genetic testing company to file premarket paperwork with FDA; however, it also concurrently launched a national advertising campaign for its as yet unauthorized product. This earned 23andMe a formal FDA Warning Letter (essentially a cease and desist order), and it was forced to pull its health-related tests off the market [45]. Despite the fact that 23andMe still offered its ancestry screening (and as such continued to collect user-generated genetic data), its product sales dipped by half during that period [46]. But, 23andMe and FDA continued to work together toward premarket clearance, and in February 2015 FDA announced it had cleared 23andMe’s Bloom Syndrome test (an autosomal recessive carrier screen) [47]. FDA is currently tolerating 23andMe’s marketing of the rest of its autosomal carrier-screen panel without the premarket authorization that the current regulations require.

Thus, the focus of FDA regulatory action so far has been on the consumer-facing product of test reports. FDA does not currently involve itself in the consumer data 23andMe generates, collects, analyzes, and stores.

Proposed law

Two recent federal regulatory notices are particularly relevant at this juncture.

First, in September 2015, the highly anticipated Notice of Proposed Rulemaking (NPRM) of revisions to the Common Rule was released. These revisions focus on attempting to make informed consent more meaningful in the human subjects research context—broadening the scope of regulatory jurisdiction in some cases, and limiting it in others [48]. Of particular importance to this discussion is the NPRM proposal to expand the definition of a “human subject.” As discussed previously, currently research with a biospecimen only falls under the regulations if it is associated with identifiable information [12]. However, the NPRM proposes that all research with human biospecimens (as opposed to just those that are identifiable) fall under its jurisdiction. This would include secondary research on biospecimens stored in a biobank. A major justification for this proposal is that “New methods, more powerful computers, and easy access to large administrative data sets … have meant that some types of data that formerly were treated as nonidentified can now be reidentified through combining large amounts of information from multiple sources,” and the NPRM cites to the Gymrek et al. reidentification work discussed previously. However, research with deidentified data (ironically, such as those used in the study by Gymrek et al.) would remain beyond the revised Common Rule’s purview [48].

Second, in October 2015, FDA released its proposed revisions to exempt autosomal recessive carrier screens from its Class II premarket review [49]. An autosomal recessive disease generally requires two copies of the abnormal variation to be present in a person’s genetic makeup for that person to experience symptoms of disease. Autosomal recessive carrier screening allows prospective parents without disease manifestation to find out whether they and their biological partner are both carriers of the variant, in which case each child would have a 25% chance of being affected by the disease.
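The 25% figure follows from basic Mendelian inheritance: each carrier parent passes on the variant with probability 1/2, independently of the other. As a minimal sketch (the allele labels "A" and "a" and the function name are illustrative assumptions, not drawn from the FDA notice), the cross between two carriers can be enumerated directly:

```python
from fractions import Fraction
from itertools import product


def offspring_genotype_probs(parent1, parent2):
    """Enumerate the equally likely allele pairings from two parents."""
    counts = {}
    for a, b in product(parent1, parent2):
        # Sort so that "Aa" and "aA" are counted as the same genotype.
        genotype = "".join(sorted((a, b)))
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


# "A" = typical allele, "a" = recessive variant; both parents are carriers.
probs = offspring_genotype_probs("Aa", "Aa")
print(probs["aa"])  # affected: 1/4
print(probs["Aa"])  # unaffected carrier: 1/2
```

The same enumeration shows why the screen is informative only for couples: if just one parent is a carrier, no pairing yields two copies of the variant.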

FDA’s recent NPRM proposes to exempt all DTC autosomal recessive carrier screening (e.g., 23andMe’s current health-related offering) from premarket authorization. FDA sets forth its risk analysis in detail, including consideration of the device’s characteristics for safe and effective use. Generally, FDA’s risk evaluation for approval considers whether the device will have the effect promised by the labeling if used correctly. In the autosomal recessive carrier-screen context, FDA “considered the risks of both false-positive and false-negative results … ” and found that “given the unique characteristics of an autosomal recessive carrier screening gene mutation detection system, including that both a mother and father must be carriers to have a 25 percent chance that their child would have the disorder … special controls [e.g., a ‘warning statement accurately disclosing the genetic coverage of the test in lay terms … ’] reasonably assure that a legally marketed device of this type will have the characteristics necessary for its safe and effective performance without the need for premarket notification [49].” However, FDA’s current risk assessment of the 23andMe service is based entirely on the data and information that are returned to the consumer.

Discussion

Both these rulemaking notices are important in their own right, but the EOP would be well advised to contemplate the effect of their intersection. If the Common Rule is codified as proposed, hospitals and clinics will have to spend significant resources setting up consent and tracking structures to convert clinical biospecimens to deidentified data for research. The Common Rule NPRM itself anticipates the need for as many as 21 million secondary-use consent forms to be filled out, at 10 minutes each, to seek participant consent and track related information in just the first year of implementation [48]. This conversion may be too costly for some institutions even to attempt. Weill Cornell Medicine in New York City recently estimated that it would cost as much as $4 million annually to comply with the revisions [50].
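To give a sense of scale, the NPRM’s own figures imply a substantial aggregate time burden. The short calculation below uses only the 21 million forms and 10 minutes per form cited above; the conversion to full-time work years assumes 2,080 working hours per year, which is an illustrative assumption and not a figure from the NPRM.

```python
forms = 21_000_000       # secondary-use consent forms anticipated in year one
minutes_per_form = 10    # time per form, per the NPRM estimate

total_hours = forms * minutes_per_form / 60
print(f"{total_hours:,.0f} hours")  # 3,500,000 hours

# Assumption for illustration only: 2,080 working hours in a full-time year.
hours_per_work_year = 2_080
work_years = total_hours / hours_per_work_year
print(f"about {work_years:,.0f} full-time work years")  # about 1,683
```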

On the other hand, 23andMe can transform consumer biospecimens and related health information into deidentified research data (outside the purview of the Common Rule) and then sell access to other private or public entities that would otherwise have to follow the more complex regulatory requirements. This would buttress one of the very results that the Precision Medicine Initiative is hoping to avoid—reliance on private bio/databanks for advances in medicine on a fee-for-data basis.

In addition, while FDA assessed the risk of 23andMe’s carrier screens on the basis of its return of results to the consumer, the more profound risk is the generation, collection, storage, and sale of broader genetic analysis and sensitive health information of its consumers to third-party purchasers. The individual risk posed by such deidentified genetic data is acknowledged, but not resolved, by the revisions to the Common Rule [50].

23andMe’s broad data generation is disclosed in its Privacy Statement and through 23andMe’s online offer to consumers to download their “raw chip data” [8], [51]. Therefore, its use of this data should be considered part of 23andMe’s “labeling” [52]. As such, it can and should be taken into consideration by FDA when evaluating the product’s safe and effective use.

Thus, from both an individual protection and public health policy perspective, when FDA evaluates 23andMe’s personal genome product, it should attempt to consider not only the potential misuse of information provided to the consumer, but also the misuse of much broader data and information generated and distributed by the manufacturer.

Conclusion

As the federal government revises the decades-old human subjects research structure, it is necessary for the EOP to consider a cohesive approach to regulating private genetic bio/databanks generally. This strategy should allow FDA and other agencies to play a role in expanding current regulatory coverage for private companies molding themselves in the Google business model image: offering an inexpensive low-risk health-related product for the ulterior purpose of private consumer health data collection. As we increasingly rely on bio/databanks, evaluating them independently, as assets as vulnerable and valuable as the individual data that make them up, will be critical both to ensuring that federal funding continues to be the gold standard research resource and to ensuring that as much research as possible is covered by federal protection. Enhancing the security of federally funded research but enabling private access to biospecimens could drive more research into the private sector and result in less, not more, protection for human subjects.

Acknowledgments

This publication was supported in part by NCI and NHGRI grant 4UM1HG006508-04. The author would like to thank Kata Chillag, Edward B. Goldman, Valerie Gutmann Koch, Paul A. Lombardo, J. Scott Roberts, and Patricia J. Zettler for their review and feedback on this article.

References
