The PhenX Toolkit: Get the Most From Your Measures

Carol M Hamilton; Lisa C Strader; Joseph G Pratt; Deborah Maiese; Tabitha Hendershot; Richard K Kwok; Jane A Hammond; Wayne Huggins; Dean Jackman; Huaqin Pan; Destiney S Nettles; Terri H Beaty; Lindsay A Farrer; Peter Kraft; Mary L Marazita; Jose M Ordovas; Carlos N Pato; Margaret R Spitz; Diane Wagener; Michelle Williams; Heather A Junkins; William R Harlan; Erin M Ramos; Jonathan Haines

doi:10.1093/aje/kwr193

Am J Epidemiol. 2011 Jul 11;174(3):253–260. doi: 10.1093/aje/kwr193

PMCID: PMC3141081 PMID: 21749974

Abstract

The potential for genome-wide association studies to relate phenotypes to specific genetic variation is greatly increased when data can be combined or compared across multiple studies. To facilitate replication and validation across studies, RTI International (Research Triangle Park, North Carolina) and the National Human Genome Research Institute (Bethesda, Maryland) are collaborating on the consensus measures for Phenotypes and eXposures (PhenX) project. The goal of PhenX is to identify 15 high-priority, well-established, and broadly applicable measures for each of 21 research domains. PhenX measures are selected by working groups of domain experts using a consensus process that includes input from the scientific community. The selected measures are then made freely available to the scientific community via the PhenX Toolkit. Thus, the PhenX Toolkit provides the research community with a core set of high-quality, well-established, low-burden measures intended for use in large-scale genomic studies. PhenX measures will have the most impact when included at the experimental design stage. The PhenX Toolkit also includes links to standards and resources in an effort to facilitate data harmonization to legacy data. Broad acceptance and use of PhenX measures will promote cross-study comparisons to increase statistical power for identifying and replicating variants associated with complex diseases and with gene-gene and gene-environment interactions.

Keywords: environmental exposure, epidemiologic methods, genetic research, genetics, genome-wide association study, meta-analysis as topic, phenotype, research design

Genetics and the etiology of common complex diseases

The incorporation of genomics data into population-based studies has led to the emergence of genome-wide association studies and a revolution in the way that scientists think about genetics and the etiology of common, complex diseases (1). Because of the rapid progress in genomic technology, investigators can now analyze hundreds of thousands of genetic polymorphisms (2, 3) against an array of disease phenotypes to identify associations. Genome-wide association studies have the potential to complement research focused on biochemical pathways and/or regulatory cascades and thus inspire new hypotheses (4). Increased understanding of disease etiology and mechanisms will facilitate development of interventions, such as novel prophylactic or therapeutic agents.

Although recent reports from genome-wide association studies have identified a large number of associations between chromosomal loci and complex human diseases (5), to date, most of these studies have had few measures in common (6–8). It is important to compare findings across studies to validate results and to detect relatively weak statistical associations that are commonly found when multiple genetic polymorphisms make small contributions to common disorders. Moreover, environmental exposures can have important ramifications: the ambient environment, personal behaviors, and treatments can all influence susceptibility to disease as well as its presentation and progression. Several groups of investigators have successfully expanded study populations by incorporating extracted metadata from complementary studies (9, 10). For some diseases, such as diabetes and Crohn’s disease, pooling of multiple genome-wide association studies by meta-analysis has led to the discovery of new gene associations (11–13). However, standard measures could greatly simplify the task of combining studies and validating findings. Over time, the use of standard measures should make it possible to build larger populations for cross-study analysis, thus providing increased statistical power and the ability to detect moderate associations and gene-gene and gene-environment interactions.

The consensus measures for Phenotypes and eXposures (PhenX) Toolkit is designed to provide a core set of well-established, low-burden, high-quality measures for use in large-scale genomic studies. In this report, we describe the rationale and development of the PhenX Toolkit and highlight collaborations and harmonization efforts.

The case for standard measures

There are compelling reasons to promote the use of standard (common) measures for genome-wide association studies and other large-scale genomic research efforts.

Many common complex diseases share similar underlying risk factors, such as smoking and dietary intake (14), and seemingly disparate studies may collect this information using different methods. Standard phenotypic and environmental exposure measures would facilitate combining data from a wide range of studies.
Initial findings from genome-wide association studies need to be replicated in follow-on studies (15). Standard measures would allow for direct comparisons of data collected in studies of different populations to validate initial findings.
Increased statistical power could be achieved by combining studies, and this would facilitate the identification and verification of relatively weak associations and of more complex relations, such as gene-gene and gene-environment interactions (16).
A readily accessible source of standard measures and associated protocols will help investigators expand studies to include measures that are outside of their primary research focus.
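
The statistical-power argument above can be made concrete with a small calculation. The following is a minimal sketch, assuming a fixed-effect (inverse-variance) meta-analysis, which is one common way to pool per-study estimates; it is not a method prescribed by the PhenX project. The effect sizes and standard errors are hypothetical, and the function name pool_fixed_effect is introduced here only for illustration. The sketch presumes that the phenotype was measured comparably in each study, which is the condition that standard measures are meant to secure.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study effect estimates."""
    # Each study is weighted by the reciprocal of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical log odds ratios for one variant, estimated in three studies
# that all collected the phenotype with the same standard protocol.
betas = [0.10, 0.14, 0.12]
ses = [0.06, 0.06, 0.06]

pooled_beta, pooled_se = pool_fixed_effect(betas, ses)

# Test statistic for the first study alone versus the pooled estimate.
z_single = betas[0] / ses[0]
z_pooled = pooled_beta / pooled_se
```

With three studies of equal precision, the pooled standard error is the single-study standard error divided by the square root of 3, so an association that is weak in any one study yields a larger test statistic once the studies are combined.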

RESULTS

PhenX—the project

The PhenX project is led by RTI International (Research Triangle Park, North Carolina) and is funded by the National Human Genome Research Institute (Bethesda, Maryland) at the National Institutes of Health (NIH). The goal of PhenX is to identify and catalog 15 high-quality, low-burden, well-established measures and accompanying standard protocols for each of 21 research domains. The PhenX measures are available to the scientific community via a Web-based toolkit (https://www.phenxtoolkit.org/).

PhenX—the process

The PhenX Steering Committee is composed of 12 scientists with a range of expertise in epidemiology, biostatistics, and genomics research who provide guidance throughout the project. The Steering Committee originally selected and defined 20 research domains that are the focus of the project (Table 1). Through collaboration with the Office of Behavioral and Social Science Research at the NIH, an additional Social Environments domain was added to the PhenX project. A PhenX domain is a field of research with a unifying theme and easily enumerated quantitative and qualitative measures. Domains include Demographics, Anthropometrics, organ systems (e.g., Neurology, Gastrointestinal), complex diseases (e.g., Cancer, Cardiovascular), and lifestyle factors (e.g., Alcohol, Tobacco and Other Substances; Physical Activity and Physical Fitness). Liaisons from the NIH institutes and centers participate in PhenX activities, including nominating Steering Committee and working group members, and are invited to participate in all Steering Committee and working group meetings. The liaisons exchange relevant information with their institutes and centers, help ensure that PhenX is coordinated with related NIH initiatives, and provide additional content expertise.

Table 1.

Research domains delineated in the PhenX Project^a

Alcohol, Tobacco and Other Substances

Anthropometrics

Cancer

Cardiovascular

Demographics

Diabetes

Environmental Exposures

Gastrointestinal

Infectious Diseases and Immunity

Neurology

Nutrition and Dietary Supplements

Ocular

Oral Health

Physical Activity and Physical Fitness

Psychiatric

Psychosocial

Reproductive Health

Respiratory

Skin, Bone, Muscle and Joint

Social Environments

Speech and Hearing

Abbreviation: PhenX, consensus measures for Phenotypes and eXposures.

^a Developed by RTI International (Research Triangle Park, North Carolina) and the National Human Genome Research Institute (Bethesda, Maryland).

To address each PhenX research domain, a working group of domain experts is assembled. The working groups are composed of 6–9 domain experts from academic and government institutions. Working group members are carefully selected to include a balance of members with domain expertise and experience in epidemiology and genomics research. Each working group member commits to participating in the consensus process, which typically takes 7–9 months and includes 1 in-person meeting and 4–6 conference calls. The working group chairs play a key role, leading the working group throughout the consensus-based process. Working group participants are recognized on the project portal (https://www.phenx.org/). The working groups convene and use a consensus-based process to select 15 measures to be included in the PhenX Toolkit. Limiting the number of measures to 15 per domain ensures that the Toolkit includes only the highest-priority, well-established measures and also keeps the Toolkit a manageable size.

The PhenX Toolkit is designed primarily for investigators who wish to expand their disease-specific studies into other areas and who are unlikely to have sufficient resources to add more than a few measures from additional research domains that are outside their primary focus. The Toolkit provides a variety of measures; it is up to investigators to decide which PhenX measures (and how many) they want to incorporate into their overall study design. The overall process of selecting PhenX measures is outlined in Figure 1.

Figure 1. — Consensus process used in the consensus measures for Phenotypes and eXposures (PhenX) project. SC, Steering Committee; WG, Working Group.

A major concern in any genomics-based study is ensuring accurate assessment of the phenotypes and exposures of interest. If the data used for the analyses do not reliably and accurately reflect the phenotypes or exposures, then the associations will not be valid. It is expected that investigators will almost certainly use multiple, more detailed, and potentially higher burden measures to assess their primary research interest but will use PhenX measures to expand their study to include measures from other research domains.

The Steering Committee developed the following criteria to guide the working groups in their deliberations:

Working groups should select well-established measures. Well-established measures may have been used over time and/or come from highly regarded sources. The purpose of PhenX is not to develop new measures but to recommend measures that have been used successfully.
The working group should select measures and accompanying protocols for acquiring the measures that can be used by scientists who do not have expertise in that particular domain.
Measures are assessed with regard to burden for both study participants and investigators. The working groups are advised to select measures of relatively low burden to both investigators and participants. The working groups are asked to consider the time required to administer the protocol, the equipment and skills required, and the overall cost and complexity of data collection and analysis. A limited number of higher-burden measures may be included.
Working groups should select measures that are expected to be relevant for at least the next few years.
Working groups are encouraged to select protocols that are broadly applicable to a variety of populations. A single PhenX measure may have multiple context-dependent protocols to accommodate life-stage and gender variations. The working groups recognize and discuss the potential for cultural differences to affect the measures and take that into consideration when selecting specific protocols.
Although the primary focus of PhenX is research in US populations, international measures and standards may also be considered. Scientists from other countries are encouraged to use the PhenX Toolkit.

The working groups review and discuss many measures pertinent to their respective domains and select preliminary measures (up to 25) for outreach to the broader scientific community (Figure 1). This outreach effort seeks to engage additional experts from the scientific community to review and comment on these preliminary measures. The working groups then consider this input in their final deliberations. Deciding on the measures to be included in the PhenX Toolkit is a difficult task, and each working group has to balance the criteria for selection put forth by the Steering Committee. If a measure is highly burdensome or is too cutting-edge to be suitable for the Toolkit but the working group thinks it is highly relevant to the research domain, then the working group may decide to include it in the Supplemental Information section of the Toolkit. The supplemental information may include gold-standard, high-burden measures and/or preliminary measures that were ultimately not selected for inclusion in the Toolkit. Other information that the working group agrees may be of value to the user may also be included. Thus, the supplemental information gives the PhenX Toolkit user additional access to the expertise and guidance of the working groups.

PhenX—the Toolkit

The PhenX Toolkit presents the measures and protocols selected by the working groups (https://www.phenxtoolkit.org/). Users can search or browse the Toolkit, selecting measures of interest by adding them to a cart. From the cart, the user can request reports that provide the information needed to collect data on the measures. The Toolkit provides a description of the measure, detailed protocols associated with the measure, and other related information: for example, rationale for selection, equipment and training required, and references. The Steering Committee envisioned the first few domains as building blocks for the entire PhenX Toolkit. Thus, Demographics, Anthropometrics, and Alcohol, Tobacco and Other Substances were selected as the first 3 domains to be addressed by working groups. With this approach, subsequent working groups, such as the Cancer or Diabetes working group, can review measures already in the Toolkit and then decide whether they are sufficient for their research domain. Setup of the initial 21 domains was completed near the end of 2010, and the selected measures are all available in the PhenX Toolkit.

Because the PhenX Toolkit includes detailed protocols for obtaining data on the measures, Toolkit users can review and assess whether or not a specific protocol is suitable for their study. The expectation is that researchers who visit the Toolkit site will be able to identify some PhenX measures that are suitable for their study population and their available resources. Figure 2 presents a screen shot of the home page of the PhenX Toolkit; a general summary of the PhenX Toolkit is shown in Table 2 (17).

Figure 2. — Home page of the consensus measures for Phenotypes and eXposures (PhenX) Toolkit.

Table 2.

Defining the PhenX Toolkit^a

What the PhenX Toolkit Is	What the PhenX Toolkit Is Not
A catalog of recommended measures for inclusion in new studies or when expanding existing studies	Not a new set of standards
A database that allows researchers to browse, search, and select measures	Not a new ontology of phenotypes
Cross-referenced to Cancer Biomedical Informatics Grid common data elements	Not a data repository
Freely available to the scientific community	Not a biobank
	Not restrictive
	Not a proprietary resource

Abbreviation: PhenX, consensus measures for Phenotypes and eXposures.

^a Developed by RTI International (Research Triangle Park, North Carolina) and the National Human Genome Research Institute (Bethesda, Maryland).

Visitors to the PhenX Toolkit site can browse by research domain or search using keywords. PhenX Toolkit users can select measures and save them in a cart. Users can easily add to or remove measures from their cart as they decide which PhenX measures would be most helpful for their study. The PhenX Toolkit provides a brief description of each measure, its purpose and rationale for inclusion, standardized protocols for collecting data on the measure, supporting information, and references. The PhenX Toolkit describes the requirements for each measure, including details about the personnel and equipment needed to collect data on the measures. Users can request a report that provides the details of their selections, thus facilitating incorporation of these measures into the study design. In addition, the Toolkit alerts users if additional measures (essential data) are needed to interpret a selected measure. For example, if Toolkit users select “blood pressure,” the users are prompted to also add “current age,” “gender,” “race,” and “ethnicity” to their cart (i.e., a specific collection of measures). After following a simple registration process, registered Toolkit users can save multiple carts and can share their carts with other registered users via a Toolkit network. This allows investigators who are planning different studies (or expanding an existing study) to work together to include a common set of PhenX measures for future analyses. A data collection form that will help investigators collect the data associated with PhenX measures is currently in development. The data collection form will also make it easy for investigators to integrate PhenX measures into their primary study design.
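
The essential-data prompt described above can be pictured as a simple dependency lookup. The following is a minimal sketch under stated assumptions, not the Toolkit's actual implementation: the Cart class, the ESSENTIAL_MEASURES table, and the method names are hypothetical, and only the blood pressure entry is taken from the example in the text.

```python
# Hypothetical table of companion ("essential") measures. Only the blood
# pressure entry reflects the example given in the text.
ESSENTIAL_MEASURES = {
    "blood pressure": ["current age", "gender", "race", "ethnicity"],
}

class Cart:
    """A collection of selected measures that reports missing essential data."""

    def __init__(self):
        self.measures = []

    def add(self, measure):
        """Add a measure and return the essential measures still missing."""
        if measure not in self.measures:
            self.measures.append(measure)
        return self.missing_essentials()

    def missing_essentials(self):
        """List essential measures required by the cart but not yet in it."""
        missing = []
        for selected in self.measures:
            for needed in ESSENTIAL_MEASURES.get(selected, []):
                if needed not in self.measures and needed not in missing:
                    missing.append(needed)
        return missing

cart = Cart()
prompt = cart.add("blood pressure")  # the user would be prompted to add these

# Accepting the prompt leaves nothing outstanding.
for measure in prompt:
    cart.add(measure)
```

Keeping the dependency table separate from the cart means that the same check can run whenever a measure is added or removed, which is one plausible way to keep the prompt consistent as users revise their selections.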

DISCUSSION

The use of standard measures for analysis

Once an individual has been genotyped, that genotype can potentially be related to any trait, not just the primary phenotype in the original study (14). Because many of the target (primary) phenotypes of research studies are complex conditions or disorders, data are commonly collected on multiple risk factors and comorbid conditions. This opens the door to cross-study analyses of not only the primary phenotypes but also secondary phenotypes (18–21).

Although reports have clearly demonstrated the value of integrating data across related studies and even across disciplines (22), most genome-wide association studies to date have focused on a specific disease or trait. The PhenX Toolkit is designed to aid investigators who are interested in expanding their study to include measures that are outside of their primary area of expertise. For example, an investigator who is planning a neurology study may choose PhenX measures in the Nutrition and Dietary Supplements, Cancer, and Respiratory domains in addition to PhenX Neurology measures. It is also worth noting that some conditions or diseases may be associated with the same phenotype, such as obesity with cardiovascular disease and diabetes. Perhaps even more important, expanding genomics-based studies to include phenotypes outside of the primary research interest is essential to understanding pleiotropic genetic effects (23). Thus, as investigators extend their studies to incorporate PhenX measures, new relations between seemingly unrelated disciplines are likely to be uncovered. Figure 3 illustrates the incorporation of PhenX measures into individual studies and the resulting ability to combine data from multiple studies.

Figure 3. — Benefits of using measures from the consensus measures for Phenotypes and eXposures (PhenX) Toolkit. CVD, cardiovascular disease.

To achieve data interoperability, the adoption of standard data formats and vocabularies is essential (24). The incorporation of PhenX measures into individual studies at the experimental design stage and/or prior to collecting the data will make it possible to easily combine data from multiple, largely unrelated studies. Combining studies generates increased statistical power and the ability to detect both more subtle and more complex—and, perhaps, unexpected—gene associations.

The PhenX Toolkit

Limitations.

The PhenX Toolkit is designed to help investigators effectively expand their studies, but there are limitations. Each working group is asked to balance multiple criteria for selecting measures (defined by the Steering Committee) as they decide what measures to include in the Toolkit. Current limitations are: 1) the Toolkit does not necessarily include the gold standard for each research domain, as these measures are often quite burdensome to administer; 2) promising but relatively new measures are not included in the Toolkit because they are not yet well established; and 3) established protocols are not modified (although some working groups indicated that this could be beneficial).

Collaborations and harmonization.

The PhenX investigators are currently collaborating with administrators of the database of Genotypes and Phenotypes (dbGaP) (8) (http://www.ncbi.nlm.nih.gov/gap/), the Public Population Project in Genomics (P³G) (25) (http://www.p3g.org/), the Data Schema and Harmonization Platform for Epidemiological Research (DataSHaPER) (26) (http://www.datashaper.org/), and the National Library of Medicine (http://www.nlm.nih.gov/). This work is focused on developing a consistent rule set for mapping PhenX measures to dbGaP study variables and DataSHaPER measures and variables. The plan is to highlight PhenX measures in dbGaP and DataSHaPER. The value of this approach is that investigators who visit the dbGaP or DataSHaPER site will be able to readily identify PhenX measures in these resources, thus facilitating data-sharing and data harmonization. In addition, researchers may be able to identify opportunities to extend studies to include data and samples associated with P³G biorepositories. The PhenX investigators are working collaboratively with the National Library of Medicine to ensure that PhenX is aligned with NIH bioinformatics efforts such as Logical Observation Identifiers Names and Codes (27, 28). They are also collaborating with the Electronic Medical Records and Genomics consortium (https://www.mc.vanderbilt.edu/) to facilitate sharing of data captured in electronic medical records.

The use of PhenX measures will facilitate downstream harmonization and meta-analysis. The compelling need to combine studies and to take advantage of legacy data has led to efforts to harmonize similar data elements. Harmonization efforts such as P³G, DataSHaPER, and the Gene Environment Association Studies (GENEVA) consortium are currently under way. DataSHaPER is focused on developing tools for retrospective data harmonization (26). The GENEVA consortium has established a unified framework for genotyping, data quality control, analysis, and interpretation (14). Harmonization methods that make it possible to compare or combine related data types for meta-analysis have proven to be very effective (29) and will always be an option.
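
To illustrate what retrospective harmonization involves, the following minimal sketch maps variables from two studies onto common names and units before pooling. Everything in it is an assumption made for illustration: the study names, variable names, coding schemes, and the RULES table are hypothetical and do not represent the rule formats used by dbGaP, DataSHaPER, or P³G. When studies adopt the same standard measure at the design stage, most of these per-variable rules become unnecessary.

```python
# Hypothetical mapping rules:
# (study, source variable) -> (common variable name, conversion function)
RULES = {
    ("study_a", "ht_in"): ("height_cm", lambda v: v * 2.54),
    ("study_b", "height_cm"): ("height_cm", lambda v: v),
    ("study_a", "smoker"): ("current_smoker", lambda v: v == "Y"),
    ("study_b", "smk_status"): ("current_smoker", lambda v: v == 1),
}

def harmonize(study, record):
    """Translate one participant record into the common variable set."""
    harmonized = {"study": study}
    for name, value in record.items():
        rule = RULES.get((study, name))
        if rule is None:
            continue  # variables without a mapping rule are not pooled
        common_name, convert = rule
        harmonized[common_name] = convert(value)
    return harmonized

# Two hypothetical records that differ in variable names, units, and coding.
pooled = [
    harmonize("study_a", {"ht_in": 70.0, "smoker": "N"}),
    harmonize("study_b", {"height_cm": 165.0, "smk_status": 1, "site": 3}),
]
```

Each rule must be written and validated for every study and variable pair, which is why harmonization after the fact is more laborious than collecting the data with a shared protocol in the first place.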

Bioinformatics.

As a result of supplemental funding provided under the American Recovery and Reinvestment Act of 2009 (Public Law 111–5), PhenX is extending the Toolkit’s browse and search capabilities to better reflect the interrelatedness of measures across the research domains and to reflect statistics collected from Toolkit users (such as “top 10” measures). Measures are currently organized into various groups or collections to allow investigators to browse the Toolkit from a variety of perspectives. For example, in addition to being able to browse measures by “research domain,” users may identify measures of interest by browsing collections of measures such as “risk factors” or “life stages.” This approach could be extended to help Toolkit users assess complex diseases and conditions. For example, investigators could come to the Toolkit and find measures associated with Sjögren’s syndrome or metabolic syndrome even though the measures may have been selected by several different working groups. The Smart Query tool helps Toolkit users find measures using keywords or concepts and traverses the entire Toolkit to provide relevant measures for consideration. A data collection form and a data dictionary are being developed for the Toolkit that will make it easier for investigators to collect and analyze the data associated with PhenX measures. Also in development is a comprehensive bioinformatics mapping document that will link PhenX measures to various resources and standards.
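
Browsing by collection rather than by research domain amounts to indexing the same catalog under a second key. The following is a minimal sketch of that idea, not a description of the Toolkit's internals. The domain names come from Table 1, but the measure names, the collection tags, and the build_index function are hypothetical and have not been checked against the Toolkit's content.

```python
from collections import defaultdict

# Hypothetical catalog entries. Domain names are taken from Table 1;
# measure names and collection tags are illustrative only.
MEASURES = [
    {"name": "waist circumference", "domain": "Anthropometrics",
     "collections": {"metabolic syndrome", "risk factors"}},
    {"name": "blood pressure", "domain": "Cardiovascular",
     "collections": {"metabolic syndrome", "risk factors"}},
    {"name": "fasting glucose", "domain": "Diabetes",
     "collections": {"metabolic syndrome"}},
    {"name": "tobacco use", "domain": "Alcohol, Tobacco and Other Substances",
     "collections": {"risk factors"}},
]

def build_index(measures):
    """Map each collection tag to the names of the measures that carry it."""
    index = defaultdict(list)
    for measure in measures:
        for tag in measure["collections"]:
            index[tag].append(measure["name"])
    return index

by_collection = build_index(MEASURES)
metabolic = by_collection["metabolic syndrome"]

# Domains that contribute at least one measure to the collection.
domains = {m["domain"] for m in MEASURES if "metabolic syndrome" in m["collections"]}
```

In this sketch a single collection draws on measures from three different domains, which mirrors the case in which the relevant measures were selected by several different working groups.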

Future directions.

We are developing a strategy to raise the visibility of the Toolkit and promote its use by epidemiologists and other investigators. Based on Toolkit user feedback, we expect to continue to update the functionality of the Toolkit. We plan to establish a process for updating Toolkit content. As complementary and related research efforts mature, some of these measures may be incorporated into the PhenX Toolkit. For example, the Patient-Reported Outcomes Measurement Information System (http://www.nihpromis.org/) is developing new instruments for effectively capturing patient-provided information, and the NIH Toolbox (http://www.nihtoolbox.org/) is focused on developing new protocols for neurologic and behavioral assessments.

We also envision that the results of our current collaborative efforts will facilitate the mapping and highlighting of PhenX measures in additional data repositories and resources. The PhenX team will continue to welcome additional opportunities to collaborate.

Summary

The PhenX Toolkit provides the research community with a core set of high-quality, well-established, low-burden measures intended for use in genome-wide association studies and other population-based studies. More specifically, the PhenX Toolkit will make it easy for researchers to effectively expand a study to include standard measures outside of their primary research focus. Broad acceptance and use of PhenX measures will promote cross-study comparisons to increase statistical power for identifying and replicating variants associated with complex diseases and with gene-gene and gene-environment interactions. The hope is that the PhenX Toolkit will be widely adopted by the scientific community, fostering a new era of cooperation and collaboration and facilitating cross-study, transdisciplinary, and translational research.

Acknowledgments

Author affiliations: RTI International, Research Triangle Park, North Carolina (Carol M. Hamilton, Lisa C. Strader, Joseph G. Pratt, Deborah Maiese, Tabitha Hendershot, Jane A. Hammond, Wayne Huggins, Dean Jackman, Huaqin Pan, Destiney S. Nettles); National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland (Heather A. Junkins, Erin M. Ramos); Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland (Terri H. Beaty); Departments of Medicine, Neurology, Ophthalmology, and Genetics and Genomics, School of Medicine, Boston University, Boston, Massachusetts (Lindsay A. Farrer); Departments of Epidemiology and Biostatistics, School of Public Health, Boston University, Boston, Massachusetts (Lindsay A. Farrer); Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts (Peter Kraft); Center for Craniofacial and Dental Genetics, Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania (Mary L. Marazita); Human Nutrition Research Center on Aging, Tufts University, Boston, Massachusetts (Jose M. Ordovas); Zilkha Neurogenetic Institute, University of Southern California, Altadena, California (Carlos N. Pato); M. D. Anderson Cancer Center, University of Texas, Houston, Texas (Margaret R. Spitz); RTI International, San Diego, California (Diane Wagener); Department of Epidemiology, School of Public Health and Community Medicine, University of Washington, Seattle, Washington (Michelle Williams); Office of the Director, National Institutes of Health, Chevy Chase, Maryland (William R. Harlan (retired)); Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee (Jonathan Haines); and Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina (Richard K. Kwok, Destiney S. Nettles).

This work was supported by the National Human Genome Research Institute (award U01 HG004597-01).

Guidance is provided to the PhenX project by the PhenX Steering Committee: Jonathan Haines (Chair), William R. Harlan (Vice-Chair), Terri H. Beaty, Lindsay A. Farrer, Peter Kraft, Mary L. Marazita, Jose M. Ordovas, Carlos N. Pato, Erin Ramos, Margaret R. Spitz, Diane Wagener, and Michelle Williams. The PhenX working groups have made key contributions to this project. In particular, the authors acknowledge the expertise and significant contributions of the PhenX working group chairs (to date): Deborah Hasin (Alcohol, Tobacco and Other Substances); Michele Forman (Anthropometrics); Co-Chairs Christine B. Ambrosone and Neil Caporaso (Cancer); Tom Pearson (Cardiovascular); Craig Hanis (Diabetes); Myles Cockburn (Demographics); Lynn Goldman (Environmental Exposures); Jeffery Vance (Neurology); Patrick J. Stover (Nutrition and Dietary Supplements); Co-Chairs James Beck and Bryan Michalowicz (Oral Health); Janey Wiggs (Ocular); Co-Chairs Bill Haskell and Rick Troiano (Physical Activity and Physical Fitness); Co-Chairs Kenneth S. Kendler and Jordan Smoller (Psychiatric); Carol Hogue (Reproductive Health); and Edwin K. Silverman (Respiratory).

The authors thank Dr. Teri Manolio and Dr. Kimberly Tryka for critical review of the manuscript, Michal Zmuda for help with figure design, and August Gering and Laura Small for editorial review.

Conflict of interest: none declared.

Glossary

Abbreviations

DataSHaPER: Data Schema and Harmonization Platform for Epidemiological Research
dbGaP: database of Genotypes and Phenotypes
GENEVA: Gene Environment Association Studies
NIH: National Institutes of Health
PhenX: consensus measures for Phenotypes and eXposures
P³G: Public Population Project in Genomics

References

1. Pennisi E. Breakthrough of the year: human genetic variation. Science. 2007;318(5858):1842–1843. doi: 10.1126/science.318.5858.1842.
2. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437(7063):1299–1320. doi: 10.1038/nature04226.
3. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest. 2008;118(5):1590–1605. doi: 10.1172/JCI34772.
4. Pearson TA, Manolio TA. How to interpret a genome-wide association study. JAMA. 2008;299(11):1335–1344. doi: 10.1001/jama.299.11.1335.
5. Hindorff LA, Sethupathy P, Junkins HA, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A. 2009;106(23):9362–9367. doi: 10.1073/pnas.0903103106.
6. Thorisson GA, Muilu J, Brookes AJ. Genotype-phenotype databases: challenges and solutions for the post-genomic era. Nat Rev Genet. 2009;10(1):9–18. doi: 10.1038/nrg2483.
7. Khoury MJ, Bertram L, Boffetta P, et al. Genome-wide association studies, field synopses, and the development of the knowledge base on genetic variation and human diseases. Am J Epidemiol. 2009;170(3):269–279. doi: 10.1093/aje/kwp119.
8. Mailman MD, Feolo M, Jin Y, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007;39(10):1181–1186. doi: 10.1038/ng1007-1181.
9. Zeggini E, Scott LJ, Saxena R, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet. 2008;40(5):638–645. doi: 10.1038/ng.120.
10. Risch N, Herrell R, Lehner T, et al. Interaction between the serotonin transporter gene (5-HTTLPR), stressful life events, and risk of depression: a meta-analysis. JAMA. 2009;301(23):2462–2471. doi: 10.1001/jama.2009.878.
11. Cooper JD, Smyth DJ, Smiles AM, et al. Meta-analysis of genome-wide association study data identifies additional type 1 diabetes risk loci. Nat Genet. 2008;40(12):1399–1401. doi: 10.1038/ng.249.
12. Barrett JC, Hansoul S, Nicolae DL, et al. Genome-wide association defines more than 30 distinct susceptibility loci for Crohn’s disease. Nat Genet. 2008;40(8):955–962. doi: 10.1038/NG.175.
13. Barrett JC, Clayton DG, Concannon P, et al. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Type 1 Diabetes Genetics Consortium. Nat Genet. 2009;41(6):703–707. doi: 10.1038/ng.381.
14. Manolio TA. Collaborative genome-wide association studies of diverse diseases: programs of the NHGRI’s office of population genomics. Pharmacogenomics. 2009;10(2):235–241. doi: 10.2217/14622416.10.2.235.
15. Hunter DJ, Kraft P. Drinking from the fire hose—statistical issues in genomewide association studies. N Engl J Med. 2007;357(5):436–439. doi: 10.1056/NEJMp078120.
16. Hunter DJ. Gene-environment interactions in human diseases. Nat Rev Genet. 2005;6(4):287–298. doi: 10.1038/nrg1578.
17. Cimino JJ, Hayamizu TF, Bodenreider O, et al. The caBIG terminology review process. J Biomed Inform. 2009;42(3):571–580. doi: 10.1016/j.jbi.2008.12.003.
18. Goldstein DB. Common genetic variation and human traits. N Engl J Med. 2009;360(17):1696–1698. doi: 10.1056/NEJMp0806284.
19. Hirschhorn JN. Genomewide association studies—illuminating biologic pathways. N Engl J Med. 2009;360(17):1699–1701. doi: 10.1056/NEJMp0808934.
20. Kraft P, Hunter DJ. Genetic risk prediction—are we there yet? N Engl J Med. 2009;360(17):1701–1703. doi: 10.1056/NEJMp0810107.
21. Emilsson V, Thorleifsson G, Zhang B, et al. Genetics of gene expression and its effect on disease. Nature. 2008;452(7186):423–428. doi: 10.1038/nature06758.
22. Yu W, Gwinn M, Clyne M, et al. A navigator for human genome epidemiology. Nat Genet. 2008;40(2):124–125. doi: 10.1038/ng0208-124.
23. Stover PJ, Harlan WR, Hammond JA, et al. PhenX: an interdisciplinary toolkit for genetics and epidemiology. Curr Opin Lipidol. 2010;21(2):136–140. doi: 10.1097/MOL.0b013e3283377395.
24. Schad P, Mobley L, Hamilton CM. Building a biomedical cyberinfrastructure for collaborative research. Am J Prev Med. 2011;40(5 suppl. 2):S144–S150. doi: 10.1016/j.amepre.2011.01.018.
25. Knoppers BM, Fortier I, Legault D, et al. The Public Population Project in Genomics (P3G): a proof of concept? Eur J Hum Genet. 2008;16(6):664–665. doi: 10.1038/ejhg.2008.55.
26. Fortier I, Burton PR, Robson PJ, et al. Quality, quantity and harmony: the DataSHaPER approach to integrating data across bioclinical studies. Int J Epidemiol. 2010;39(5):1383–1393. doi: 10.1093/ije/dyq139.
27. Forrey AW, McDonald CJ, DeMoor G, et al. Logical Observation Identifier Names and Codes (LOINC) database: a public use set of codes and names for electronic reporting of clinical laboratory test results. Clin Chem. 1996;42(1):81–90.
28. McDonald CJ, Huff SM, Suico JG, et al. LOINC, a universal standard for identifying laboratory observations: a 5-year update. Clin Chem. 2003;49(4):624–633. doi: 10.1373/49.4.624.
29. Voight BF, Scott LJ, Steinthorsdottir V, et al. Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis. Nat Genet. 2010;42(7):579–589. doi: 10.1038/ng.609.
