Author manuscript; available in PMC: 2013 Nov 1.
Published in final edited form as: Genet Med. 2012 Nov 29;15(5):325–331. doi: 10.1038/gim.2012.147

Mapping the Incidentalome: Estimating Incidental Findings Generated Through Clinical Pharmacogenomics Testing

Matthew J Westbrook 1, M Frances Wright 1, Sara L Van Driest 2, Tracy L McGregor 2,3, Joshua C Denny 4, Rebecca L Zuvich 3, Ellen Wright Clayton 1,2, Kyle B Brothers 1,2,5
PMCID: PMC3648626  NIHMSID: NIHMS440090  PMID: 23196672

Abstract

Purpose

Greater clinical validity and economic feasibility are driving the more widespread use of multiplex genetic technologies in routine clinical care, especially for pharmacogenomics applications. Empirical data on the numbers and types of incidental findings generated through such testing are needed so that policies and practices for their clinical use can be developed. Of particular importance are disparities in findings relevant to different ancestry groups.

Methods

The Pharmacogenomic Resource for Enhanced Decisions in Care and Treatment (PREDICT) is an institutional program to implement prospective clinical genotyping of 34 pharmacogenomic-related genes to guide drug selection and dosing. We curated 5566 journal articles to quantify and characterize the incidental, non-pharmacogenomic genotype-phenotype associations that could be generated through this clinical genotyping project.

Results

We identified 372 putative incidental genotype-phenotype associations that might be revealed in patients undergoing clinical genotyping for pharmacogenomic purposes. Of these, 287 associations were supported by at least one study demonstrating an odds ratio ≥2.0 or ≤0.5. Numbers of potentially relevant findings varied widely by ancestry group.

Conclusions

Rigorous policies for the clinical management of incidental findings are needed, since the sheer number of significant findings could overwhelm healthcare institutions, providers, and patients.

Keywords: Incidental Findings, Incidentalome, Pharmacogenomics, Disease Susceptibility, Genomics

INTRODUCTION

Multiplex technologies, which can detect an array of desired laboratory results pertinent to a particular clinical issue, often yield other ancillary or incidental findings not related to the original motivation for testing. Kohane et al coined the term incidentalome in 2006 to refer to the potentially voluminous collection of ancillary findings that can be generated through multiplex genetic testing technologies. They predicted that this set of ancillary findings could pose a threat to the implementation of genomic medicine because the number of findings generated, particularly through whole genome sequencing, could raise a number of challenges.1 These authors and others have raised concerns that due to the size of the incidentalome, follow-up testing to characterize incidentally identified risks could become very expensive,1,2 especially given the inevitability of false-positive results.1

Even more significant than the challenge of cost, perhaps, is the challenge of scale. The incidental findings generated by a single-gene genetic test are limited in number and can be dealt with relatively effectively and efficiently by a clinical geneticist, a genetic counselor, or another knowledgeable clinician. But the number of results generated by a high-throughput assay such as a SNP chip covering a range of genes, or even a whole genome sequence, could become unwieldy for providers to evaluate and validate. Methods for handling incidental findings on this larger scale must therefore be part of any discussion of when genetic results are ready for clinical application. This is especially important given that the time already required for clinicians to provide routine preventive care is substantial.3

Policies designed to address the challenge of scale will need to be informed by the number and clinical relevance of incidental findings potentially generated using specific multiplex genetic technologies. All things being equal, stronger associations are more likely to be clinically relevant, and most experts agree that findings returned clinically should be clinically “actionable.” The clinical relevance of a particular association for a specific patient additionally depends on the patient’s gender, age, past medical history, and health behaviors. Finally, ancestral origin influences the strength of genetic associations, so the clinical relevance of an association could vary among ancestral groups. For all of these reasons, it will be important to examine whether the number of associations varies among groups of patients.

In this manuscript we provide an empirical assessment of the incidentalome, taking a clinical pharmacogenomics project as a test case. Vanderbilt University Medical Center recently designed and built a gene-agnostic and pharmaceutical-agnostic infrastructure to support the integration of pharmacogenomic variants into routine clinical care.4 This project, named the Pharmacogenomic Resource for Enhanced Decisions in Care and Treatment (PREDICT), currently utilizes Illumina’s VeraCode ADME Core Panel (Illumina, Inc., San Diego, CA) as its genotyping platform. This single nucleotide polymorphism (SNP) based platform interrogates 184 SNPs in 34 genes selected due to their importance in pharmacogenomics. They thus represent especially well-studied targets. In addition to their importance in pharmacogenomics, however, many have implications for other components of clinical care. Since this panel covers only a small fraction of human genes, and thus only a small slice of the incidentalome, it provides a manageable case study to examine the incidentalome-related challenges that will arise as multiplex genetic testing technologies become more widespread.

In this paper we report on the first attempt to “map” the subset of the incidentalome that could be generated by a multi-gene SNP test intended for pharmacogenomic use, using the VeraCode ADME Core Panel as a case study. We conducted a systematic review of published articles to quantify and characterize the total number of ancillary findings associated with the genes included in this panel in order to provide the data necessary to inform practices on reporting incidental findings. These practices should ensure that the aims of efficiency and efficacy envisioned for Personalized Medicine can be attained, while managing the effect of an incidentalome that could be overwhelming or distracting for patients and providers.

MATERIALS & METHODS

Initial Article Database

We performed a comprehensive literature review of all articles available in PubMed as of June 22, 2011 referencing at least one of the 34 genes included in the VeraCode ADME Core Panel (Figure 1). We used the Genopedia tool, an online database of genomics-related articles designed and maintained by the Human Genome Epidemiology Network (HuGENet), to generate our initial panel of articles.5

Figure 1. Genes Included in Illumina’s VeraCode ADME Core Panel

Inclusion and Exclusion Criteria

A genotype-phenotype association was included in our database if the gene being studied was one of the 34 genes included in the VeraCode ADME Core Panel and the phenotype was a medical condition or characteristic with clinical significance. Excluded phenotypes were (1) those not usually evaluated in the clinical setting (e.g. DNA damage, chromosomal aberrations), and (2) complications of diseases or therapies (e.g. transplant rejection or survival after treatment). Pharmacogenomic associations and associations with non-pharmacological treatment outcomes were also excluded, since these are the primary purpose for the genetic test in the PREDICT program, not “incidental” findings. Genome-wide association studies and manuscripts for which no abstract or text in English was available were also excluded. An association was considered statistically significant if the 95% confidence interval for at least one reported odds ratio (OR) did not cross 1.0 or another valid statistic indicated significance at the 95% confidence level.
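The significance criterion above can be sketched in a few lines. This is a minimal illustration only, covering the confidence-interval rule and not the "another valid statistic" alternative; the function name `is_significant` is hypothetical and does not come from the study.

```python
def is_significant(ci_low: float, ci_high: float) -> bool:
    """Return True when a 95% confidence interval for an odds ratio excludes 1.0.

    Hypothetical helper: it encodes only the CI rule stated in the text.
    """
    if not 0 < ci_low <= ci_high:
        raise ValueError("expected 0 < ci_low <= ci_high")
    # The interval "does not cross 1.0" if it lies entirely above or below 1.0.
    return ci_low > 1.0 or ci_high < 1.0


print(is_significant(1.2, 3.4))  # interval entirely above 1.0
print(is_significant(0.8, 1.9))  # interval crosses 1.0
print(is_significant(0.2, 0.7))  # interval entirely below 1.0
```

Treating an interval that touches 1.0 exactly as nonsignificant is a design choice of this sketch, not something the text specifies.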

Database Curation

Our initial database of published articles included reports on potential genotype-phenotype associations, as well as other studies referencing our genes of interest. We first excluded all articles that were not focused on identifying an association between at least one gene of interest and a clinically relevant phenotype. Next, we determined for each gene of interest referenced whether the study produced a statistically significant (positive) finding or a statistically nonsignificant (negative) finding.

We applied our inclusion and exclusion criteria in two stages. First, we used a computerized algorithm to identify articles that were excluded because they did not report on relevant genotype-phenotype associations (Figure S1A). Second, we hand-curated remaining articles to record positive and negative genotype-phenotype associations and to identify additional articles that should be excluded (Figure S1B).

Computerized Curation

Computerized curation proceeded in two stages. First, we utilized MedEx, a tool originally designed to extract medical information from full-text clinical narratives, to identify articles whose titles refer explicitly to medications.6 These articles were classified as pharmacogenomic in focus and were excluded from this study. Second, additional search criteria were used to identify articles meeting other exclusion criteria. For example, articles whose titles contained the text “DNA damage” were excluded because this indicated a focus on a biomarker not usually evaluated in the clinical setting (Figure S1).
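A two-stage title screen of this kind might look like the sketch below. It is an assumption-laden illustration: MedEx is not called (no MedEx API is assumed), and the medication list and the second exclusion phrase are hypothetical stand-ins. Only the "DNA damage" phrase is taken from the text.

```python
import re

# Hypothetical stand-in for MedEx output: a tiny set of medication names.
MEDICATION_TERMS = {"warfarin", "clopidogrel", "tamoxifen"}

# "dna damage" is the example given in the text; the second phrase is illustrative.
EXCLUSION_PHRASES = ("dna damage", "chromosomal aberration")


def screen_title(title: str) -> str:
    """Classify an article title as excluded or passed on to hand curation."""
    lowered = title.lower()
    words = set(re.findall(r"[a-z0-9]+", lowered))
    # Stage 1: titles naming a medication are treated as pharmacogenomic.
    if words & MEDICATION_TERMS:
        return "exclude: pharmacogenomic"
    # Stage 2: titles matching other exclusion phrases are dropped.
    if any(phrase in lowered for phrase in EXCLUSION_PHRASES):
        return "exclude: non-clinical phenotype"
    return "hand curation"


print(screen_title("CYP2C9 genotype and warfarin dose requirements"))
print(screen_title("GSTM1 null genotype and DNA damage in exposed workers"))
print(screen_title("GSTM1 polymorphism and lung cancer risk"))
```

Anything the filter does not positively exclude falls through to hand curation, which mirrors the stated goal of not excluding articles incorrectly.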

The computerized algorithm was refined to minimize the number of articles that would require hand curation while also minimizing the number of articles incorrectly excluded from review. Random samples of 100 included and excluded articles were reviewed to determine whether articles were mislabeled as qualified or disqualified by the computerized algorithm. The algorithm was then revised iteratively until 100% specificity was attained.

Hand Curation

Five authors then hand-curated remaining articles. Articles were assigned to reviewers at random, and one author (KBB) reviewed a subset of each reviewer’s inclusions and exclusions to ensure accuracy and consistency. Articles reporting incomplete or ambiguous results were subjected to a secondary review. If an article reported on the association between a gene and more than one phenotype, each genotype-phenotype association was recorded and analyzed separately. Records were managed through the online research database tool REDCap.7

For putative genotype-phenotype associations that qualified for inclusion, we recorded the phenotype tested and the population studied. Phenotypes were grouped according to clinical and pathophysiological relationships. For example, angina, acute coronary syndrome, and myocardial infarction were grouped as one phenotype; aerodigestive tract cancers of the head and neck were grouped as another (Figure S2). Populations were grouped at the level of nation-state (Brazil, China, Finland) or region (Eastern Europe), except where the article was explicit that research participants came from a diversity of ethnic or national origins. To facilitate an analysis focused on the largest US ancestral origin groups, findings for European Americans were combined with those from Western and Central Europe, and findings for African Americans were combined with those from Western and Sub-Saharan Africa (Table S1).

For statistically significant findings, we recorded the OR and 95% confidence interval from the strongest association, in terms of magnitude of the odds ratio, reported. We also recorded whether the study examined other factors that could influence the clinical significance of the association including interactions with health behaviors, environmental exposures, occupational exposures, and other gene markers.
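One way to pick the "strongest" odds ratio when both risk-increasing and protective effects are reported is to compare distances from 1.0 on the log scale, so that an OR of 0.5 counts as equal in magnitude to an OR of 2.0. The text does not state how magnitude was compared, so the log-scale rule and the name `strongest_or` are assumptions of this sketch.

```python
import math
from typing import Iterable


def strongest_or(odds_ratios: Iterable[float]) -> float:
    """Return the odds ratio farthest from 1.0 on the log scale.

    Assumption: magnitude is measured as |ln(OR)|, which treats
    OR = 0.5 and OR = 2.0 as equally strong.
    """
    return max(odds_ratios, key=lambda value: abs(math.log(value)))


# |ln 0.4| is about 0.92 and |ln 2.2| is about 0.79, so 0.4 is selected.
print(strongest_or([1.8, 0.4, 2.2]))
```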

Analysis of Clinical Relevance

In order to account for some of the factors that influence the clinical relevance of incidental genotype-phenotype associations, we constructed tables of genotype-phenotype associations relevant to two hypothetical patients living in the US. We did not define or apply criteria for clinical actionability, but instead focused on validity, for which criteria are less stringent. Specifically, we defined an association as “strong” if at least one publication reported an OR for that association to be ≥2.0 or ≤0.5.8–10 Associations were considered “replicated” if more than one publication reported a positive finding and no publications reported a negative finding. Finally, a finding was considered “clinically relevant” for a hypothetical patient if the association had been demonstrated to have a strong correlation in at least one study conducted with participants from that patient’s ancestral group. Our two case studies involved tallying the number of findings meeting this criterion for a healthy female of European ancestry and for an otherwise identical patient of African ancestry.
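The three definitions above ("strong", "replicated", "clinically relevant") can be expressed as simple predicates over the studies recorded for one genotype-phenotype association. The `Study` record and its field names are hypothetical; the study's actual data were held in REDCap, whose schema is not described.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Study:
    positive: bool               # statistically significant finding?
    odds_ratio: Optional[float]  # strongest OR reported, if any
    ancestry: str                # hypothetical ancestry-group label


def strong_or(odds_ratio: Optional[float]) -> bool:
    """OR threshold from the text: >= 2.0 or <= 0.5."""
    return odds_ratio is not None and (odds_ratio >= 2.0 or odds_ratio <= 0.5)


def is_strong(studies: List[Study]) -> bool:
    """At least one positive publication reports a strong OR."""
    return any(s.positive and strong_or(s.odds_ratio) for s in studies)


def is_replicated(studies: List[Study]) -> bool:
    """More than one positive publication and no negative publication."""
    positives = sum(1 for s in studies if s.positive)
    return positives > 1 and positives == len(studies)


def is_clinically_relevant(studies: List[Study], ancestry: str) -> bool:
    """A strong positive finding exists in the patient's ancestral group."""
    return any(
        s.positive and strong_or(s.odds_ratio) and s.ancestry == ancestry
        for s in studies
    )


example = [Study(True, 2.4, "European"), Study(True, 1.6, "European")]
print(is_strong(example), is_replicated(example))
print(is_clinically_relevant(example, "European"),
      is_clinically_relevant(example, "African"))
```

In this made-up example the association is strong and replicated, and it is clinically relevant for a patient of European ancestry but not for a patient of African ancestry, because no study in the second group exists.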

RESULTS

Excluded Articles

In all, we reviewed 5566 unique articles. A small number of articles (94) could not be evaluated because an abstract was not available through PubMed and the full text of the article could not be obtained in English.

In total, 3850 articles were excluded: 2391 by the computerized filter and 1459 through hand curation. Examples of excluded articles are systematic reviews, articles reporting on the frequency of genetic variants in populations, and articles reporting novel gene mutations. Among excluded articles, 2277 were found to report only pharmacogenomic findings and 166 were found to report only associations with complications of diseases or treatments.

Genotype-Phenotype Associations

After exclusions, 1715 studies were found to have tested associations between at least one gene of interest and a qualifying phenotype. These studies included single-gene studies, small candidate gene studies, and large pathway-based studies, with an average of 2.0 (SD 1.4) genes of interest examined per article. 26 of the 34 genes included on the VeraCode ADME Core Panel were found to have at least one qualified genotype-phenotype association.

Altogether, we examined 806 putative genotype-phenotype associations, of which 434 had been tested but never supported by a statistically significant finding. 91 putative associations were supported by only one positive study with no published attempts at replication, and 14 had been replicated in at least 2 studies with no published negative findings. There was mixed evidence on most putative genotype-phenotype associations; 267 associations were found to be statistically significant in at least one study and statistically nonsignificant in at least one study (Table 1).

Table 1.

Putative genotype-phenotype associations

| Evidence Category | All Associations | Strong Correlation* |
| --- | --- | --- |
| At Least 1 Positive Finding | 372 | 287 |
| Replicated (≥2 positive findings, 0 negative findings) | 14 | 12 |
| No Replication Attempts (1 positive finding, 0 negative findings) | 91 | 66 |
| Mixed Evidence (≥1 positive finding, ≥1 negative finding) | 267 | 209 |

*At least one positive finding demonstrated OR ≥ 2.0 or ≤ 0.5
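The evidence categories in Table 1 follow directly from the counts of positive and negative publications for each association. The sketch below encodes that rule and checks that the reported category counts add up; the function name and dictionary layout are hypothetical, while the numbers are those reported in the text and in Table 1.

```python
def evidence_category(n_positive: int, n_negative: int) -> str:
    """Assign a Table 1 evidence category from publication counts."""
    if n_positive < 0 or n_negative < 0 or n_positive + n_negative == 0:
        raise ValueError("need at least one study and non-negative counts")
    if n_positive == 0:
        return "Never supported"
    if n_negative > 0:
        return "Mixed Evidence"
    return "Replicated" if n_positive >= 2 else "No Replication Attempts"


# Counts as reported in Table 1: (all associations, strong correlation).
TABLE1 = {
    "Replicated": (14, 12),
    "No Replication Attempts": (91, 66),
    "Mixed Evidence": (267, 209),
}

total_all = sum(all_count for all_count, _ in TABLE1.values())
total_strong = sum(strong for _, strong in TABLE1.values())

print(total_all, total_strong)   # category counts summed
print(806 - 434)                 # examined minus never supported
```

Both routes give 372 associations with at least one positive finding, and the strong-correlation column sums to 287, matching the first row of the table.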

On average, each gene carried statistically significant associations with 10.9 phenotypes, strong associations with 8.4 phenotypes, and strong associations that had been replicated with 0.4 phenotypes (Table 2). The median number of articles examining each genotype-phenotype association was 2. The most studied genotype-phenotype association was a possible association between GSTM1 and lung cancer. This association was examined in 83 articles, 28% of which reported statistically significant findings.

Table 2.

Statistically-significant genotype-phenotype associations by gene

| Gene | Associated Phenotypes: All | Strong Correlation* | Strong Correlation, Replicated** |
| --- | --- | --- | --- |
| ABCB1 | 12 | 9 | 1 |
| ABCC2 | 3 | 2 | 0 |
| ABCG2 | 3 | 2 | 0 |
| CYP1A1 | 40 | 31 | 2 |
| CYP1A2 | 16 | 10 | 1 |
| CYP2A6 | 6 | 6 | 1 |
| CYP2B6 | 2 | 2 | 0 |
| CYP2C19 | 8 | 6 | 0 |
| CYP2C8 | 1 | 0 | 0 |
| CYP2C9 | 7 | 6 | 0 |
| CYP2D6 | 9 | 7 | 0 |
| CYP2E1 | 24 | 18 | 0 |
| CYP3A4 | 7 | 7 | 0 |
| CYP3A5 | 7 | 4 | 0 |
| DPYD | 0 | 0 | 0 |
| GSTM1 | 60 | 49 | 3 |
| GSTP1 | 42 | 31 | 2 |
| GSTT1 | 55 | 45 | 0 |
| NAT1 | 7 | 3 | 1 |
| NAT2 | 34 | 31 | 0 |
| SLC15A2 | 0 | 0 | 0 |
| SLC22A1 | 0 | 0 | 0 |
| SLC22A2 | 0 | 0 | 0 |
| SLC22A6 | 0 | 0 | 0 |
| SLCO1B1 | 3 | 1 | 0 |
| SLCO1B3 | 0 | 0 | 0 |
| SLCO2B1 | 0 | 0 | 0 |
| SULT1A1 | 11 | 9 | 1 |
| TPMT | 0 | 0 | 0 |
| UGT1A1 | 7 | 4 | 0 |
| UGT2B15 | 1 | 1 | 0 |
| UGT2B17 | 3 | 2 | 0 |
| UGT2B7 | 1 | 1 | 0 |
| VKORC1 | 3 | 0 | 0 |

*At least one study indicated at least one gene variant conferred risk with OR ≥ 2.0 or ≤ 0.5

**Findings replicated in more than one publication, at least one of which demonstrated a correlation at the OR ≥ 2.0 or ≤ 0.5 level, with no negative findings
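The per-gene averages quoted in the text can be recomputed from Table 2. The sketch below transcribes the table (gene, all, strong, strong and replicated) and derives the column totals, the means across all 34 genes, and the number of genes with at least one association; variable names are illustrative.

```python
# Table 2 transcribed as: gene, all, strong, strong-and-replicated.
TABLE2 = """
ABCB1 12 9 1
ABCC2 3 2 0
ABCG2 3 2 0
CYP1A1 40 31 2
CYP1A2 16 10 1
CYP2A6 6 6 1
CYP2B6 2 2 0
CYP2C19 8 6 0
CYP2C8 1 0 0
CYP2C9 7 6 0
CYP2D6 9 7 0
CYP2E1 24 18 0
CYP3A4 7 7 0
CYP3A5 7 4 0
DPYD 0 0 0
GSTM1 60 49 3
GSTP1 42 31 2
GSTT1 55 45 0
NAT1 7 3 1
NAT2 34 31 0
SLC15A2 0 0 0
SLC22A1 0 0 0
SLC22A2 0 0 0
SLC22A6 0 0 0
SLCO1B1 3 1 0
SLCO1B3 0 0 0
SLCO2B1 0 0 0
SULT1A1 11 9 1
TPMT 0 0 0
UGT1A1 7 4 0
UGT2B15 1 1 0
UGT2B17 3 2 0
UGT2B7 1 1 0
VKORC1 3 0 0
"""

counts = {}
for line in TABLE2.strip().splitlines():
    gene, *values = line.split()
    counts[gene] = tuple(int(v) for v in values)

n_genes = len(counts)
# Sum each of the three columns across genes.
totals = tuple(sum(column) for column in zip(*counts.values()))
means = tuple(f"{total / n_genes:.1f}" for total in totals)
genes_with_association = sum(1 for all_count, _, _ in counts.values() if all_count > 0)

print(n_genes, totals)          # 34 (372, 287, 12)
print(means)                    # ('10.9', '8.4', '0.4')
print(genes_with_association)   # 26
```

The column totals agree with Table 1 (372 associations, 287 strong), and the means reproduce the 10.9, 8.4, and 0.4 phenotypes per gene reported in the text.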

Ancestral Origin

158 associations were reported in studies relevant to European Americans while 14 associations were reported in studies relevant to African Americans. We identified only one study conducted in a US population explicitly described as white Hispanic. 72 associations were identified in participants living in the US from multiple ancestry groups (Table S1).

Clinical Relevance

287 genotype-phenotype associations were supported by at least one study demonstrating a strong correlation (Table 1). 103 of these associations were supported by more than one such study. The subset of these strong genotype-phenotype associations that could be identified in a healthy female patient of European ancestry are shown in Table 3. In all, genotyping on the VeraCode ADME Core Panel could identify 100 clinically relevant genotype-phenotype associations in a female patient of European ancestry, of which 39 have been replicated. By comparison, the same genotyping could identify 9 clinically relevant genotype-phenotype associations in a female patient of African ancestry, of which only 1 has been replicated (Table 4).

Table 3.

Selected strong (OR ≥ 2.0 or ≤ 0.5) genotype-phenotype associations relevant to a female patient of European ancestry

| Gene | Selected* Phenotypes with Strong (OR ≥ 2.0 or ≤ 0.5) Association |
| --- | --- |
| ABCB1 | Breast Cancer, Colorectal Cancer, Inflammatory Bowel Disease |
| ABCG2 | Gout |
| CYP1A1 | Breast Cancer, Colorectal Cancer, Coronary Artery Disease, Endometrial Cancer, Head and Neck Cancer, Leukemia, Lung Cancer, Ovarian Cancer, Psoriasis, Type 2 Diabetes |
| CYP1A2 | Colorectal Cancer, Endometrial Cancer, Hypertension, Ovarian Cancer |
| CYP2C9 | Mood Disorder |
| CYP2D6 | Colorectal Cancer, Lung Cancer, Parkinson's Disease, Scleroderma |
| CYP2E1 | Colorectal Cancer, Gastric Cancer, Head and Neck Cancer, Lung Cancer, Scleroderma |
| CYP3A4 | Lung Cancer |
| CYP3A5 | Hypertension |
| GSTM1 | Alcoholic Pancreatitis, Asthma, Basal Cell or Squamous Cell Carcinoma, Breast Cancer, Esophageal Cancer, Head and Neck Cancer, Hypertension, Lung Cancer, Ovarian Cancer, Renal Cancer, Rheumatoid Arthritis, Urinary Tract Cancer |
| GSTP1 | Breast Cancer, Colorectal Cancer, Gastric Cancer, Hodgkin's Lymphoma, Lung Cancer, Parkinson's Disease, Urinary Tract Cancer |
| GSTT1 | Asthma, Breast Cancer, Colorectal Cancer, Coronary Artery Disease, Esophageal Cancer, Gastric Cancer, Head and Neck Cancer, Leukemia, Lung Cancer, Melanoma, Non-Hodgkin's Lymphoma, Renal Cancer, Urinary Tract Cancer |
| NAT1 | Asthma, Lung Cancer |
| NAT2 | Breast Cancer, Cervical Cancer, Endometriosis, Gastric Cancer, Head and Neck Cancer, Lung Cancer, Ovarian Cancer, Urinary Tract Cancer |
| SULT1A1 | Breast Cancer, Colorectal Cancer, Endometrial Cancer, Gastric Cancer, Head and Neck Cancer, Urinary Tract Cancer |
| UGT1A1 | Endometrial Cancer |

*Relatively minor or rare phenotypes omitted

Table 4.

Strong (OR ≥ 2.0 or ≤ 0.5) genotype-phenotype associations relevant to a female patient of African ancestry

| Gene | Phenotypes with Strong (OR ≥ 2.0 or ≤ 0.5) Association |
| --- | --- |
| ABCG2 | Coronary Artery Disease |
| CYP1A1 | Breast Cancer |
| GSTM1 | Asthma, Liver Cancer, Lung Cancer |
| GSTP1 | Asthma, Esophageal Cancer, Lung Cancer |
| GSTT1 | Breast Cancer |
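Table 4 distinguishes between gene-phenotype associations and distinct phenotypes, since several genes share a phenotype. A short tally, with the table transcribed into a dictionary whose name is illustrative, makes the two counts explicit.

```python
# Table 4 transcribed: gene -> phenotypes with a strong association.
TABLE4 = {
    "ABCG2": ["Coronary Artery Disease"],
    "CYP1A1": ["Breast Cancer"],
    "GSTM1": ["Asthma", "Liver Cancer", "Lung Cancer"],
    "GSTP1": ["Asthma", "Esophageal Cancer", "Lung Cancer"],
    "GSTT1": ["Breast Cancer"],
}

# Each (gene, phenotype) pair is one association.
n_associations = sum(len(phenotypes) for phenotypes in TABLE4.values())

# A phenotype counts once even when several genes are associated with it.
distinct_phenotypes = {p for phenotypes in TABLE4.values() for p in phenotypes}

print(n_associations)            # 9
print(len(distinct_phenotypes))  # 6
```

The nine associations match the count given for a female patient of African ancestry in the Clinical Relevance section, and they cover six distinct phenotypes.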

DISCUSSION

Only two previous studies have sought to characterize the incidental findings that could be generated through multiplex genetic tests. In 2002, Hirschhorn et al. reported all gene-disease associations that had been identified to that point, excluding genes of known monogenic disorders and associations with HLA markers and blood group antigens.11 Multiplex genetic tests were not being used clinically at that time, so the results from this study were not framed in terms of incidental findings. However, had genome-wide SNP chips been in clinical use at that time, polymorphisms in 268 genes would have generated incidental findings across 133 common diseases and traits. Only 166 genotype-phenotype associations had been examined in at least three studies, and only 6 of those were reproduced in 75% or more of the relevant studies. At that time, the portion of the incidentalome that had been characterized was quite small.

In 2007, Henrikson et al. reviewed 555 article abstracts looking for associations between genetic variants relevant to pharmacogenomics and at least one condition.12 They found that among 42 pharmacogenomic variants, only 22 (52.4%) had been found to be associated with a disease in more than one study, and only 7 (16.7%) had been associated with multiple conditions in two or more studies. We studied a group of 34 genes that overlapped significantly with those studied by Henrikson et al. We found that 20 (58.8%) had been associated with at least one disease in more than one study, and 14 (41.2%) had been associated with multiple conditions in two or more studies.

The study by Henrikson et al. confirmed that replicated ancillary findings are generated through pharmacogenetic tests, but it was not comprehensive enough to quantify the number of results that may be generated. In addition, the relevant science has advanced in the past 5 years. Our study provides an updated and more comprehensive account of the number of genotype-phenotype associations generated through pharmacogenomics testing.

The Incidentalome at the System Level

Our study indicates at least two ways the number of potential genotype-phenotype associations will be important to institutions implementing clinical genotyping. First, the process of identifying the full set of relevant incidental findings was very time intensive. Even with informatics tools, classifying genotype-phenotype associations required a significant amount of time, care, and specialized knowledge. We estimate that, even with the use of a computerized algorithm that excluded approximately 40% of articles, our hand curation of articles required 800 person-hours. Our review did not even consider sample size and power, quality of study design, or the possibility of translating results across different genotyping technologies, nor did we directly evaluate clinical actionability. Given that these more detailed evaluations will take even more time, effort, and expertise, our experience highlights the importance of efforts such as the Human Genome Epidemiology (HuGE) network and the EGAPP initiative to combine efforts across institutions to evaluate the quality of data on genotype-phenotype associations and their readiness for use in clinical practice.13–15

Second, our findings make it clear that institutions seeking to translate incidental genotype-phenotype associations into clinical care will need to develop robust informatics systems for delivering this information to providers and patients. In our panel of 34 relatively well-studied genes, the mean number of strong associations (OR ≥2.0 or ≤0.5) generated for each gene was 8.4 phenotypes. Given the pleiotropy of these genes, reporting genotype information alone to providers will not be sufficient. Patients and providers will need more sophisticated reports organizing and synthesizing data on the relevant disease risks and the quality of the data supporting such assessments.

The Incidentalome at the Patient Level

Even if patients and providers are provided with interpreted reports on genotype-phenotype associations, there will still be need for contextualization and prioritization at the patient level. While the size of the incidentalome is large, the number of clinically significant genotype-phenotype associations identified in each patient will vary. Some patients will face an overwhelming number of incidental findings. For example, patients with certain variants in GSTM1 could carry significantly elevated risks for over 40 different phenotypes. Other patients may carry no high-risk variants in the 34 genes probed in such a panel.

One solution to this challenge would be to prioritize results. Healthcare institutions, following the lead of direct-to-consumer genomic testing companies, may choose to make all results available to patients through tools such as a web portal. This approach is consistent with the commonly reported (but not unanimous) patient preference to have access to “everything” there is to know from their genetic testing.16–18 But for patients with large numbers of incidental findings, clinic visits will need to focus on only the 5 or 10 most significant results, or the results the provider judges to be most important given the patient’s current medical situation. As long as patients are able to access “everything” through another mechanism, clinical efforts to focus on only the most important results may be well received by patients.

Such an approach may raise concerns with healthcare providers who fear liability for failure to address all potential results.19–21 This study demonstrates that clinical approaches that treat all genotype-phenotype associations as laboratory “results” in need of clinical attention will be unworkable. Multiplex genetic technologies have the potential in some patients to generate too many incidental findings for their providers to address them all meaningfully.22 Standards of care related to addressing lab findings will need to be reframed to limit the responsibility of providers to only those results that meet appropriate standards of relevance, utility, and quality. The necessity, and even wisdom, of such an approach is supported by efforts in other fields to prioritize clinical time and attention.23

Prioritizing Results

Berg et al. have proposed a “binning system” by which genetic variants can be “triaged” in the clinical, diagnostic setting according to specific reporting criteria.24 These authors identify three “bins” into which genotype-phenotype associations may be categorized. Bin 1 would contain clinically valid results that also carry clinical utility according to current literature. Bin 2 would contain clinically valid results that are not considered to be actionable. This bin is further stratified into bins 2A, 2B, and 2C. Bin 2A would hold results that are unlikely to cause patients distress (such as risks for common diseases) while Bin 2B and 2C would hold results that patients are more likely to find distressing (such as risks for Alzheimer’s Disease or Huntington’s Disease). Bin 3 would contain results with unknown clinical implications, and would thus hold the majority of incidental findings. The authors argue that such a binning system, when used appropriately, would lead to relatively few results falling into the “Clinical Utility” bin (Bin 1), and would thus allow patients and providers to focus on those results that are most likely to be useful.
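The triage logic summarized above can be sketched as a small decision function. This is only an illustration of the description in this paragraph, not Berg et al.'s published criteria: the three boolean inputs are hypothetical, results lacking established clinical validity are assumed to fall into Bin 3, and Bins 2B and 2C are not distinguished because the text does not say how they differ.

```python
def assign_bin(clinically_valid: bool, actionable: bool,
               likely_distressing: bool) -> str:
    """Assign a finding to a bin, following the summary in the text.

    Assumptions of this sketch: findings without established clinical
    validity go to Bin 3, and Bins 2B and 2C are reported together.
    """
    if not clinically_valid:
        return "Bin 3"       # unknown clinical implications
    if actionable:
        return "Bin 1"       # clinically valid and clinically useful
    if likely_distressing:
        return "Bin 2B/2C"   # valid, not actionable, more distressing
    return "Bin 2A"          # valid, not actionable, unlikely to distress


print(assign_bin(True, True, False))
print(assign_bin(True, False, False))
print(assign_bin(True, False, True))
print(assign_bin(False, False, False))
```

Checking actionability before distress keeps Bin 1 defined purely by validity and utility, so the number of results reaching providers depends only on how strictly those two criteria are set.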

Our study supports the assessment that such an approach to prioritizing results will be needed if incidental findings are to be incorporated into clinical care. However, it also highlights that the “devil is in the details.” The valid results generated by the VeraCode ADME core panel could range from 12 associations meeting very strict criteria (replicated findings with at least one study showing a “strong” correlation, and no negative findings) to 105 associations meeting less stringent criteria (findings demonstrated in at least one positive study, and no negative findings). Criteria to identify which of those are also clinically useful should be balanced to ensure that the number of results in Bin 1 remains manageable.

Our work also indicates that national and international collaborations such as the HuGE network and the EGAPP initiative will be important in addressing the daunting task of identifying the proper bin for specific findings. Currently available online databases, although useful for a range of applications, do not provide the information needed for clinical applications. For example, the Genopedia interface for the HuGE Navigator (used to generate the initial dataset for our study) greatly overestimates the relevant genotype-phenotype associations. It catalogs the gene names and MeSH terms that are referenced together in publications,25 but it has not been curated to identify articles that demonstrate clinically relevant associations. On the other hand, the NHGRI Catalog of Genome-Wide Association (GWA) Studies underestimates the number of valid genotype-phenotype associations, since it catalogs only studies utilizing genome-wide methodologies.26,27 For example, of the 287 strong genotype-phenotype associations we identified in this study, only 6 were found in the NHGRI GWA catalog (data not shown).

Disparities in the Incidentalome

This study also demonstrates that the racial and ethnic disparities in genomic science observed in GWA studies are recapitulated among case-control studies.28–30 We identified 45 phenotypes whose risk could be assessed at a clinically relevant level among European-American women, but only 6 phenotypes that could be assessed among African-American women. If we accept that genotype-phenotype associations need to be replicated within an ancestral group before they are implemented in medical care for members of that group, then it is clear that a great deal more scientific work will be required before the benefits made possible through genome-based Personalized Medicine can be provided equitably across racial and ethnic groups.

Limitations

The primary limitation of this study is that we did not assess quality of study design, adequacy of sample size, or power of each study. We also did not differentiate between different variants within genes; different variants within genes may carry different risks. In addition, we did not assess the clinical actionability of identified genotype-phenotype associations. Because of these factors, our estimate of the number of genotype-phenotype associations relevant to the medical care of patients is likely an over-estimate. However, the associations that could be eliminated using more strict criteria are likely to be replaced over time by novel associations and new confirmatory findings, both of which are being reported at an increasing rate.

The complex relationships among race, ethnicity, and genetic ancestry also posed significant challenges. The vast majority of studies we reviewed treated place of residence or self-identified race/ethnicity as an analogue for genetic ancestry. While current data support the generalization that genetic ancestry and self- or observer-reported race/ethnicity are correlated,31–34 the appropriate methods for operationalizing genetic ancestry in the clinical application of genetic test findings remain unresolved. In particular, we have speculated that findings generated in Western or Central Europe will be relevant to the health of European Americans while findings generated in West and Sub-Saharan Africa will be relevant to African Americans. This is an assumption that will require more careful analysis prior to clinical application of findings, perhaps on a study-by-study basis.

CONCLUSIONS

As this quantitative literature review has shown, the sheer number of potential incidental findings generated through whole genome sequencing is likely to pose an information management challenge, both for informatics systems and for health care providers and patients. Managing and categorizing all of the genotype-phenotype associations generated through clinical genotyping is likely to overwhelm the resources of individual institutions. Collaborations through national or international networks will be required. Likewise, the amount of time a health care provider would need to address the number of findings generated through such testing for some patients is likely to exceed the practical limitations of most clinic settings. That such a large number of findings can be generated through a relatively small panel of 34 relatively well-studied genes implies that the “incidentalome” generated through whole genome sequencing will raise even more significant challenges. The development of relatively stringent policies for “binning” results is an important pre-requisite for the effective use of incidental genomic findings in clinical care.

Supplementary Material

Fig. S1
Fig. S2
Table S1

ACKNOWLEDGEMENTS

The authors would like to acknowledge the contributions of Dan Roden, Jill Pulley, Jonathan Haines, Dana Crawford, Jonathan Schildcrout, Hua Xu, Jennifer Madison, Erica Bowton, Melissa Basford, Kendrick Newton, and Denise Lillard to this work. These studies were funded by the Vanderbilt Genome-Electronic Records Project, NIH/NHGRI grant 1U01HG006378-01, and the Vanderbilt Institute for Clinical and Translational Research (VICTR), NCATS/NIH grant UL1TR000011.

Footnotes

Conflict of Interest Notification Page

All authors report that they have no commercial association that might pose or create a conflict of interest with information presented in this manuscript.

References

  • 1. Kohane IS, Masys DR, Altman RB. The Incidentalome. JAMA. 2006;296(2):212–215. doi: 10.1001/jama.296.2.212.
  • 2. McGuire AL, Burke W. An Unwelcome Side Effect of Direct-to-Consumer Personal Genome Testing. JAMA. 2008;300(22):2669–2671. doi: 10.1001/jama.2008.803.
  • 3. Yarnall KSH, Pollak KI, Østbye T, Krause KM, Michener JL. Primary Care: Is There Enough Time for Prevention? American Journal of Public Health. 2003;93(4):635–641. doi: 10.2105/ajph.93.4.635.
  • 4. Pulley J, Denny J, Peterson J, et al. Operational implementation of prospective genotyping for personalized medicine: The design of the Vanderbilt PREDICT project. Clinical Pharmacology & Therapeutics. 2012. doi: 10.1038/clpt.2011.371.
  • 5. UK Biobank Ethics and Governance Framework. 2007. http://www.ukbiobank.ac.uk/docs/EGF20082.pdf. Accessed June 2, 2008.
  • 6. Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx: a medication information extraction system for clinical narratives. Journal of the American Medical Informatics Association. 2010;17(1):19. doi: 10.1197/jamia.M3378.
  • 7. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap): a metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics. 2009;42(2):377–381. doi: 10.1016/j.jbi.2008.08.010.
  • 8. Ioannidis JPA, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nat Genet. 2001;29(3):306–309. doi: 10.1038/ng749.
  • 9. Burke W, Psaty BM. Personalized medicine in the era of genomics. JAMA. 2007;298(14):1682. doi: 10.1001/jama.298.14.1682.
  • 10. Wright CF, MacArthur DG. Direct-to-Consumer Genetic Testing. In: Best DH, Swensen JJ, editors. Molecular Genetics and Personalized Medicine. New York: Springer; 2012. pp. 215–236.
  • 11. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. A comprehensive review of genetic association studies. Genet Med. 2002;4(2):45–61. doi: 10.1097/00125817-200203000-00002.
  • 12. Henrikson NB, Burke W, Veenstra DL. Ancillary risk information and pharmacogenetic tests: social and policy implications. Pharmacogenomics J. 2007;8(2):85–89. doi: 10.1038/sj.tpj.6500457.
  • 13. Human Genome Epidemiology Network (HuGENet). 2011. http://www.cdc.gov/genomics/hugenet/default.htm. Accessed May 21, 2012.
  • 14. Evaluation of Genomic Applications in Practice and Prevention (EGAPP). 2012. http://www.egappreviews.org/. Accessed May 21, 2012.
  • 15. Khoury MJ. Human Genome Epidemiology (HuGE): translating advances in human genetics into population-based data for medicine and public health. Genet Med. 1999;1:71–73. doi: 10.1097/00125817-199903000-00002.
  • 16. Halverson CME, Ross LF. Attitudes of African-American parents about biobank participation and return of results for themselves and their children. Journal of Medical Ethics. 2012 May 9. doi: 10.1136/medethics-2012-100600.
  • 17. Murphy J, Scott J, Kaufman D, Geller G, LeRoy L, Hudson K. Public Expectations for Return of Results from Large-Cohort Genetic Research. The American Journal of Bioethics. 2008;8(11):36–43. doi: 10.1080/15265160802513093.
  • 18. Ruiz-Canela M, Valle-Mansilla JI, Sulmasy DP. What Research Participants Want to Know About Genetic Research Results: The Impact of “Genetic Exceptionalism”. Journal of Empirical Research on Human Research Ethics: An International Journal. 2011;6(3):39–46. doi: 10.1525/jer.2011.6.3.39.
  • 19. Bosch JRt, Grody WW. Keeping up with the next generation: massively parallel sequencing in clinical diagnostics. The Journal of Molecular Diagnostics. 2008;10(6):484. doi: 10.2353/jmoldx.2008.080027.
  • 20. Biesecker LG, Shianna KV, Mullikin JC. Exome sequencing: the expert view. Genome Biology. 2011;12(9):128. doi: 10.1186/gb-2011-12-9-128.
  • 21. Marchant GE, Campos-Outcalt DE, Lindor RA. Physician liability: the next big thing for personalized medicine? Personalized Medicine. 2011;8(4):457–467. doi: 10.2217/pme.11.33.
  • 22. Ormond KE, Wheeler MT, Hudgins L, et al. Challenges in the clinical application of whole-genome sequencing. Lancet. 2010;375(9727):1749. doi: 10.1016/S0140-6736(10)60599-5.
  • 23. Maciosek MV, Coffield AB, Edwards NM, Flottemesch TJ, Goodman MJ, Solberg LI. Priorities Among Effective Clinical Preventive Services: Results of a Systematic Review and Analysis. American Journal of Preventive Medicine. 2006;31(1):52–61. doi: 10.1016/j.amepre.2006.03.012.
  • 24. Berg JS, Khoury MJ, Evans JP. Deploying whole genome sequencing in clinical practice and public health: meeting the challenge one bin at a time. Genetics in Medicine. 2011;13(6):499. doi: 10.1097/GIM.0b013e318220aaba.
  • 25. HuGE Navigator - Genopedia - Search. 2012. http://www.hugenavigator.net/HuGENavigator/startPagePedia.do. Accessed June 22, 2011.
  • 26. Hindorff LA, Sethupathy P, Junkins HA, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences. 2009;106(23):9362. doi: 10.1073/pnas.0903103106.
  • 27. Hindorff LA, MacArthur J, Wise A, et al. A Catalog of Published Genome-Wide Association Studies. http://www.genome.gov/gwastudies. Accessed September 11, 2012.
  • 28. Collins FS, Manolio TA. Merging and emerging cohorts: Necessary but not sufficient. Nature. 2007;445(7125):259. doi: 10.1038/445259a.
  • 29. Fullerton SM. The Input-Output Problem: Whose DNA Do We Study, and Why Does It Matter? In: Burke W, editor. Achieving Justice in Genomic Translation: Rethinking the Pathway to Benefit. Oxford; New York: Oxford University Press; 2011.
  • 30. Need AC, Goldstein DB. Next generation disparities in human genomics: concerns and remedies. Trends in Genetics. 2009;25(11):489–494. doi: 10.1016/j.tig.2009.09.012.
  • 31. Dumitrescu L, Ritchie MD, Brown-Gentry K, et al. Assessing the accuracy of observer-reported ancestry in a biorepository linked to electronic medical records. Genetics in Medicine. 2010;12(10):648–650. doi: 10.1097/GIM.0b013e3181efe2df.
  • 32. Burchard EG, Ziv E, Coyle N, et al. The Importance of Race and Ethnic Background in Biomedical Research and Clinical Practice. New England Journal of Medicine. 2003;348(12):1170–1175. doi: 10.1056/NEJMsb025007.
  • 33. Sinha M, Larkin EK, Elston RC, Redline S. Self-Reported Race and Genetic Admixture. New England Journal of Medicine. 2006;354(4):421–422. doi: 10.1056/NEJMc052515.
  • 34. Tang H, Quertermous T, Rodriguez B, et al. Genetic Structure, Self-Identified Race/Ethnicity, and Confounding in Case-Control Association Studies. The American Journal of Human Genetics. 2005;76(2):268–275. doi: 10.1086/427888.
