Abstract
The PhenX (consensus measures for Phenotypes and eXposures) Toolkit (https://www.phenxtoolkit.org/) offers high-quality, well-established measures of phenotypes and exposures for use by the scientific community. The goal is to promote the use of standard measures, enhance data interoperability, and help investigators identify opportunities for collaborative and translational research. The Toolkit contains 395 measures drawn from 22 research domains (fields of research), along with additional collections of measures for Substance Abuse and Addiction (SAA) research, Mental Health Research (MHR), and Tobacco Regulatory Research (TRR). Additional measures for TRR that are expected to be released in 2015 include Obesity, Eating Disorders, and Sickle Cell Disease. Measures are selected by working groups of domain experts using a consensus process that includes input from the scientific community. The Toolkit provides a description of each PhenX measure, the rationale for including it in the Toolkit, protocol(s) for collecting the measure, and supporting documentation. Users can browse measures in the Toolkit or can search the Toolkit using the Smart Query Tool or a full text search. PhenX Toolkit users select measures of interest to add to their Toolkit. Registered Toolkit users can save their Toolkit and return to it later to revise or complete. They then have options to download a customized Data Collection Worksheet that specifies the data to be collected, and a Data Dictionary that describes each variable included in the Data Collection Worksheet. The Toolkit also has a Register Your Study feature that facilitates cross-study collaboration by allowing users to find other investigators using the same PhenX measures.
Keywords: genome-wide association studies (GWAS); PhenX; phenotypes; environmental exposures; epidemiology; research; standard measures
INTRODUCTION
The PhenX Toolkit (consensus measures for Phenotypes and eXposures) is a carefully designed resource driven by the scientific community with oversight by a Steering Committee, the engagement of Working Groups of experts, and feedback from the broader scientific community. This resource makes high-priority measures available for use in genomic, biomedical, and translational research studies (Stover et al., 2010). A PhenX measure refers broadly to a standard way of capturing data on a certain characteristic of, or relating to, a study subject. Measures may include exposures, clinical assessments, and quantitative and qualitative traits. The PhenX Steering Committee established the criteria used to guide the selection of measures for the Toolkit (Table 1.21.1). PhenX Working Groups (WGs) select measures that meet these criteria and recommend them for use by the scientific community. The WGs also select protocols or standard procedures to collect measures. A protocol can be a series of questions to include in a survey instrument, data to abstract from medical records, or a biological specimen.
Table 1.21.1.
Category | Criterion |
---|---|
Primary criteria | Clearly defined |
Primary criteria | Well established |
Primary criteria | Broadly applicable and generally accepted |
Primary criteria | Low burden to participants and investigators |
Primary criteria | Broadly validated with demonstrated utility |
Primary criteria | Reproducible |
Primary criteria | Specific |
Primary criteria | Reliable |
Primary criteria | Standard measurement protocols |
Additional criteria | Cross-cutting relevance for population groups, diseases, and conditions |
Additional criteria | Prior use in GWAS |
Additional criteria | Prior use in a major reference study, e.g., National Health and Nutrition Examination Survey (NHANES) |
Additional criteria | Open-source software and nonproprietary instruments preferred |
Additional criteria | Brevity |
Additional criteria | Expectation of acceptance by the research community |
The PhenX Toolkit currently includes 395 measures from 22 different domains or research areas that are related to complex diseases and environmental exposures. There are also collections of 44 measures for Substance Abuse and Addiction (SAA) research (Conway et al., 2014), 19 measures for Mental Health Research (MHR), and 19 measures for Tobacco Regulatory Research (TRR), to be expanded later this year. The National Institute on Drug Abuse (NIDA) and the National Institute of Mental Health (NIMH) both issued National Institutes of Health (NIH) Guide Notices announcing that SAA and MHR measures are available in the Toolkit. Each PhenX measure has a protocol providing a standard procedure used to collect and record a PhenX measure. Investigators may visit the Toolkit to review and select PhenX measures when designing a new study or expanding an ongoing study. By including PhenX Toolkit measures in their research, investigators can ensure that their studies will be compatible with other studies that incorporate the same measures. This was evidenced by the PhenX RISING (Real world, Implementation, SharING) network, which demonstrated collaboration between seven studies that successfully used PhenX measures in ongoing genomic studies (McCarty et al., 2014). Sharing standard measures across studies will facilitate cross-study analysis and increase statistical power to identify genetic associations with complex diseases and traits, gene-gene interactions, and/or gene-environment interactions. PhenX measures can also complement study-specific measures by expanding studies with measures that are outside the expertise of the investigator. Broadening the study design increases cross-study compatibility and, consequently, the overall impact of the study over time.
USING THE PhenX TOOLKIT
Key Concepts
Using the same terminology to describe the different components of PhenX is important so that all PhenX Toolkit users have a common understanding of what the terms mean. Table 1.21.2 provides definitions for some of the key concepts used by PhenX.
Table 1.21.2.
Concept | Definition |
---|---|
Collection | A group of measures with a shared characteristic, target population, or topic. The measures included in a collection may cut across research domains (e.g., cancer risk factors). |
Domain | A field of research with a unifying theme and easily enumerated quantitative and qualitative measures (e.g., demographics, anthropometrics, organ systems, complex diseases, and lifestyle factors) |
Essential data | Variables that are not necessarily specified by the protocol but need to be collected and recorded along with the main phenotypes/variables (e.g., date of exam or blood pressure cuff size) |
Essential measure | Other PhenX Toolkit measures that are needed to interpret the results of the measure of interest (e.g., age, when measuring height or weight) |
Measure | A standardized way of capturing data on a certain characteristic of or relating to a study subject |
Protocol | A standard procedure recommended by a WG for investigators to collect and record a PhenX measure |
Related measure | Other PhenX measures that the user may find helpful based on the selection of a given PhenX measure. Related PhenX measures are suggestions only. |
Specialty Collection | A collection of measures with a shared characteristic, target population, or topic that is related to a specific area of research |
Supplemental Information | Describes the scope of each PhenX domain, including other measures considered (but not selected) by the WG and additional comments from the WG |
Well-established protocol | A protocol that is readily accepted and recognized by researchers within a given area of expertise. This may have been used by many investigators and studies over time, come from a highly regarded source or long-running study (e.g., National Health and Nutrition Examination Survey, Framingham Heart Study), and/or be recommended by experts in the field. |
The PhenX Toolkit comprises measures from 22 broad research domains (Table 1.21.3) with additional depth in some areas. Each domain is defined as a field of research with a unifying theme. For each broad research domain, a WG of expert scientists is assembled to select PhenX measures for the Toolkit, based on criteria developed by the PhenX Steering Committee. Through an iterative consensus process and with input from the broader scientific community (Maiese et al., 2013), each WG selected up to 15 measures for inclusion in the PhenX Toolkit. The measures and relevant protocols in the PhenX Toolkit were chosen by content experts and reviewed by the PhenX Steering Committee.
Table 1.21.3.
Alcohol, tobacco, and other substances |
Anthropometrics |
Cancer |
Cardiovascular |
Demographics |
Diabetes |
Environmental exposures |
Gastrointestinal |
Infectious diseases and immunity |
Neurology |
Nutrition and dietary supplements |
Ocular |
Oral health |
Physical activity and physical fitness |
Psychiatric |
Psychosocial |
Rare genetic conditions |
Reproductive health |
Respiratory |
Skin, bone, muscle, and joint |
Social environment |
Speech and hearing |
The PhenX Toolkit also includes specialty collections of measures that support specific research communities and expand the depth and breadth of the Toolkit. These measures are intended to promote use of common data elements in the areas of SAA, MHR, and TRR (Table 1.21.4). To add depth in these specific research areas, PhenX uses a slightly modified version of the standard PhenX consensus process. In addition, SAA and MHR have Core Collections of measures deemed relevant and essential to these areas of research.
Table 1.21.4.
Collection | Measures |
---|---|
SAA | • Assessment of substance use and substance use disorders • Substance-specific intermediate phenotypes • Substance use-related neurobehavioral and cognitive risk factors • Substance use-related psychosocial risk factors • Substance use-related community factors • Substance use-related co-morbidities and health-related outcomes |
MHR | • Suicide • Post-traumatic stress psychopathology (including PTSD) |
TRR | • Host: social/cognitive • Host: biobehavioral • Agent • Vector • Environment |
The PhenX Toolkit is publicly available for use at no cost to the scientific community in a web-based resource (https://www.phenxtoolkit.org/). It provides a brief description of each measure, the purpose and rationale for the measure’s inclusion, standard protocols, references, and requirements (e.g., training, personnel, equipment, applicable licensing fees) for data or specimen collection (Fig. 1.21.1). Because the measures are selected by experts, investigators can expand a study design beyond their primary research focus (and expertise) and be confident that they are adding recommended measures to their study. Toolkit users can search or browse for measures and protocols, and save selected measures in My Toolkit. Registered users can save multiple Toolkits and share them with other registered users via a Toolkit Network. This enables investigators who are planning a new study or expanding an existing study to work together to include common PhenX measures and to proactively plan for future analyses. Users can also download a custom Data Collection Worksheet (DCW) that details their selections and allows them to easily “cut and paste” protocols to incorporate them into their data collection instruments. The Data Dictionary lists each variable included in the protocol along with its attributes (e.g., variable type, minimum/maximum values).
Strategic Approach
The following step-by-step example illustrates how the PhenX Toolkit is used to identify additional phenotypic and environmental exposure measures to include in a study. In this scenario, an investigator has designed an observational epidemiological study of glaucoma. She plans to collect data for a variety of phenotypic measures, including physical examinations and self-administered questions. The investigator is interested in how genetic variation influences phenotypes associated with glaucoma, and plans to collect biospecimens that could be used for genetic analyses in the future. She also recognizes that environmental factors may influence the risk of glaucoma directly or indirectly through gene-environment interactions.
The investigator has designed the study to include a physical examination and a self-administered questionnaire that collects information on medication use and personal history of diabetes and hypertension. The investigator recognizes that adding PhenX measures to the study will facilitate combining her data with other studies that have also used PhenX measures, and will allow her to more readily replicate findings from other studies. The investigator visits the PhenX Toolkit to identify recommended measures of phenotypes and exposures that could be included in her glaucoma study.
(1). Browse measures identified by the Ocular WG (Fig. 1.21.2).
After reviewing the 15 PhenX measures from the Ocular domain, the investigator recognizes that she has already included several in her study, including “Corrective Lens Use”. She selects “Personal and Family History of Eye Disease and Treatments,” “Personal and Family History of Strabismus,” and “Refractive Error Measurement” for inclusion, because they would add important details to her research. After each selection, the contents of her Toolkit are displayed (Fig. 1.21.3).
(2). View “Users Who Chose This Measure Also Chose” and “Essential Measures” (Fig. 1.21.3, red arrows).
After measures are added to My Toolkit, the Toolkit presents additional measures based on Toolkit use statistics. In the glaucoma study example, “Users Who Chose This Measure Also Chose” displays five other measures that may be of interest to the investigator. The Toolkit also presents “Essential Measures” identified by the domain experts. In this case, current age is essential to interpreting the results of two measures: “Personal and Family History of Eye Disease and Treatments” and “Refractive Error Measurement”. As a result, the investigator is prompted to consider adding “Current Age” to My Toolkit.
(3). View Requirements for selected measures and protocols (Fig. 1.21.3, blue arrow, and Fig. 1.21.4).
The prompt indicates that major equipment and specialized training are required to perform the “Refractive Error Measurement”. This does not create a problem since an in-person visit is already part of the study design.
(4). View Ocular WG roster (Fig. 1.21.5).
The Toolkit provides a roster for each WG that has been convened, allowing investigators to understand the expertise of the group that selected the final measures for the Toolkit.
(5). Register as a Toolkit user (Fig. 1.21.6).
The investigator realizes she needs to save her work and return to this activity at a later time. To save the measures she has selected, she goes to the “Register” button at the top of the page and follows the directions to become a registered Toolkit user. She names her Toolkit “Glaucoma pre-GWAS” (Fig. 1.21.7), saves the measures she has selected, and exits the Toolkit. The next day, she returns to the measures saved in the Glaucoma pre-GWAS Toolkit.
(6). Review Registered Studies (Fig. 1.21.8).
From the PhenX Toolkit homepage, the investigator selects “All Registered Studies.” Investigators who have registered their study have provided information about their study and the PhenX measures they are using. This allows users to find other studies using the same protocols and identify opportunities for cross-study analysis.
(7). Use the Smart Query Tool to search for “fitness” with filters (Fig. 1.21.9).
The investigator initiates a search of the Toolkit using the keyword “fitness” because reduced physical activity might be an important risk factor for glaucoma. The Smart Query Tool searches through names, aliases, and keywords. It returns fewer results than the Text Search, but they are more specific. To further refine the results, she filters by selecting “Adult” as the Lifestage and “Self-administered questionnaire” as the data collection mode. This returns six results (Fig. 1.21.10). The investigator adds the measure “Total Physical Activity Screener” to My Toolkit. This measure will assess the recent physical activity of the study participants.
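The search-and-filter logic described in this step can be sketched in a few lines of Python. This is an illustration only and not the Toolkit's actual implementation: the `Measure` record, its field names (`aliases`, `keywords`, `lifestage`, `mode`), the `smart_query` function, and the three catalog entries with their attributes are all hypothetical, so the sketch returns far fewer results than the real tool.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    """Hypothetical, simplified record for one Toolkit measure."""
    name: str
    aliases: list = field(default_factory=list)
    keywords: list = field(default_factory=list)
    lifestage: str = "Adult"
    mode: str = "Self-administered questionnaire"


# Illustrative catalog; the entries and their attributes are assumptions.
CATALOG = [
    Measure("Total Physical Activity Screener",
            keywords=["fitness", "physical activity"]),
    Measure("Refractive Error Measurement",
            keywords=["vision"], mode="Clinical examination"),
    Measure("Passive Smoke Exposure",
            keywords=["tobacco exposure"]),
]


def smart_query(term, catalog, lifestage=None, mode=None):
    """Match term against names, aliases, and keywords, then apply filters."""
    term = term.lower()
    hits = []
    for m in catalog:
        searchable = [m.name, *m.aliases, *m.keywords]
        if not any(term in text.lower() for text in searchable):
            continue
        if lifestage is not None and m.lifestage != lifestage:
            continue
        if mode is not None and m.mode != mode:
            continue
        hits.append(m.name)
    return hits


results = smart_query("fitness", CATALOG, lifestage="Adult",
                      mode="Self-administered questionnaire")
print(results)
```

Restricting the match to curated fields (names, aliases, keywords) rather than full protocol text is what makes this kind of search return fewer but more specific results than a full text search.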
(8). View Supplemental Information (Fig. 1.21.11).
The investigator notices that there is Supplemental Information in the Ocular domain page (Fig. 1.21.2) and views this material. Supplemental Information includes measures that the WG considered but did not include, as well as information that the WG felt was necessary to understand the field of research. Supplemental Information appears at the end of the results from a full Text Search.
(9). Use the Smart Search feature of the Smart Query Tool to search for “tobacco exposure” (Fig. 1.21.12).
The investigator is interested in how tobacco use or exposure to tobacco may influence development of glaucoma. From the Smart Search results, she adds the measure “Passive Smoke Exposure” to her Toolkit. A full Text Search of “tobacco exposure” can also be used to review more broadly relevant results and any Supplemental Information.
(10). Use the Tree View to see whether there are additional measures (Fig. 1.21.13).
The Tree View provides another way to look for measures of interest. The investigator can see all domains listed alphabetically, look for measures alphabetically, or identify measures by viewing collections.
(11). Use the Share this Toolkit link to share selected measures with colleagues (Fig. 1.21.14).
The investigator can share the contents of her Toolkit with colleagues who are registered Toolkit users.
(12). Use the Download Report feature to create a full report in HTML format (Fig. 1.21.15).
This downloadable report contains the measures and protocols that the investigator selected for her Toolkit. The list of protocols provides step-by-step instructions on how to collect the data.
(13). Create a Data Collection Worksheet (Fig. 1.21.16).
The Data Collection Worksheet presents the measures and protocols in a format that can be incorporated into an existing data collection instrument. For example, using MS Word, the user can simply copy the questions from the Data Collection Worksheet and paste them into an existing data collection instrument. This greatly simplifies the instrument development process and eliminates the potential for typographical errors.
(14). Create a Data Dictionary in MS Word or .csv format (Figs. 1.21.17 and 1.21.18).
A Data Dictionary provides definitions of variables and other relevant parameters for a study’s data management system. It also provides detailed information about each variable, as well as skip patterns that are implemented throughout. MS Word is a common format for data dictionaries in epidemiological studies (Fig. 1.21.17). Genome-wide association studies (GWAS) sponsored by the National Institutes of Health are required to submit data to the National Center for Biotechnology Information (NCBI) Database of Genotypes and Phenotypes (dbGaP; http://www.ncbi.nlm.nih.gov/gap; for information on policy, see Mailman et al., 2007; http://grants.nih.gov/grants/gwas/). The PhenX Toolkit provides the .csv format of the Data Dictionary (Fig. 1.21.18) to facilitate this submission in compliance with the dbGaP direct data submission packet.
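To illustrate how a .csv Data Dictionary might be used in a study's data management system, the following Python sketch loads a small dictionary and checks collected values against the documented minimum and maximum. The column names (`variable_name`, `description`, `type`, `min`, `max`), the two variables, and both function names are assumptions made for this example; the actual layout of the Toolkit's .csv file and of the dbGaP submission packet may differ.

```python
import csv
import io

# Hypothetical excerpt of a Data Dictionary in .csv form. The column names
# and the variables are assumptions made for this sketch.
DATA_DICTIONARY_CSV = """variable_name,description,type,min,max
current_age,Age in years at time of exam,integer,0,120
sphere_right,Refractive error of right eye (diopters),decimal,-30,30
"""


def load_dictionary(text):
    """Return {variable_name: row} for each row of the dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["variable_name"]: row for row in reader}


def check_record(record, dictionary):
    """List problems found when a data record is compared to the dictionary."""
    problems = []
    for name, value in record.items():
        if name not in dictionary:
            problems.append(f"{name}: not defined in the Data Dictionary")
            continue
        spec = dictionary[name]
        low, high = float(spec["min"]), float(spec["max"])
        if not low <= float(value) <= high:
            problems.append(f"{name}: {value} outside [{spec['min']}, {spec['max']}]")
    return problems


dictionary = load_dictionary(DATA_DICTIONARY_CSV)
print(check_record({"current_age": 54, "sphere_right": -2.25}, dictionary))
print(check_record({"current_age": 154, "iop_left": 15}, dictionary))
```

The first record passes both range checks, whereas the second is flagged for an out-of-range age and for a variable that the dictionary does not define.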
COMMENTARY
Background Information
The PhenX Toolkit is a web-based resource that makes it easy for researchers to add standard measures to their studies. The value of standard measures is widely recognized (Bennett et al., 2011), because they facilitate cross-study analysis and are relevant for GWAS and other biomedical studies. Without standard measures, investigators must infer correspondence between data that are conceptually related but were collected using different methodologies, and attempt to harmonize disparate data across multiple studies. This can be a time-consuming and difficult process that often results in only a relatively small number of common variables that can be used for data analysis.
This unit highlights the features of the PhenX Toolkit that allow researchers to easily identify and select measures to add to their studies, as well as identify potential collaborators also using PhenX measures. PhenX measures and their corresponding protocols are selected by WGs of domain experts. The WGs select protocols that are well-established, broadly validated, generally low-burden, and accessible to non-experts. The Toolkit provides users with detailed protocols, alerts them to essential measures (Fig. 1.21.3) and special requirements (Fig. 1.21.4), and provides literature references recommended by the WG. Users can generate and download a Report, Data Collection Worksheet, and Data Dictionary to help them integrate their selected PhenX measures. A comprehensive effort to map PhenX variables to dbGaP variables is underway; it will allow users to identify variables in dbGaP that are identical, comparable, or related to PhenX variables. In addition, PhenX protocols are available as REDCap Instrument zip files to support investigators who want to collect data via the web. Currently, REDCap Instrument zip files are available for 12 domains consisting of 197 PhenX protocols. Taken together, the features of the PhenX Toolkit are intended to fully support a researcher who is designing or expanding a study to include standard measures. This overview is intended to encourage use of the PhenX Toolkit and promote widespread adoption of standard measures and common data elements.
Critical Parameters and Troubleshooting
The PhenX Toolkit is intended to provide support for a variety of study designs. Since not all measures are suitable for all study designs, decisions of which measures to select are the responsibility of the investigator. The Toolkit Guidance (Fig. 1.21.19) provides links to vital information such as a Certificate of Confidentiality, a Code of Ethics for Public Health, the National Human Genome Research Institute (NHGRI) informed consent resource, and references pertaining to study design.
For a given measure, the Toolkit provides additional Essential Measures (Fig. 1.21.3) that are required to correctly interpret selected measures. For example, Current Age, Ethnicity, Gender, and Race are all considered essential to interpreting the measure Arm Span. When a user selects Arm Span, he/she is encouraged to also select these four Essential Measures.
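The Essential Measures prompt can be thought of as a simple dependency lookup. The Python sketch below uses the Arm Span example from the text; the mapping structure and the `missing_essentials` function are assumptions for illustration and do not describe how the Toolkit is implemented.

```python
# Essential Measures for "Arm Span" are taken from the example in the text;
# the mapping structure and function name are assumptions for illustration.
ESSENTIAL_MEASURES = {
    "Arm Span": ["Current Age", "Ethnicity", "Gender", "Race"],
}


def missing_essentials(selected, essential_map):
    """Return Essential Measures implied by the selection but not yet chosen."""
    chosen = set(selected)
    missing = []
    for measure in selected:
        for needed in essential_map.get(measure, []):
            if needed not in chosen and needed not in missing:
                missing.append(needed)
    return missing


my_toolkit = ["Arm Span", "Current Age"]
print(missing_essentials(my_toolkit, ESSENTIAL_MEASURES))
# prints ['Ethnicity', 'Gender', 'Race']
```

Because Current Age is already in the selection, only the three remaining Essential Measures are suggested.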
Another helpful Toolkit feature is the Requirements Table (Fig. 1.21.4), which is intended to prompt careful consideration before a measure is added to a study. When a protocol for a selected measure has special requirements such as equipment or training, users are asked to review them in the Requirements Table. In general, PhenX WGs are allowed to select up to two measures with special requirements for inclusion in the Toolkit. This limit is intended to ensure that most Toolkit measures can be readily implemented by most investigators.
The Toolkit also offers a Data Collection Worksheet (Fig. 1.21.16) and Data Dictionary (Figs. 1.21.17 and 1.21.18). The Data Collection Worksheet helps investigators integrate PhenX measures into their existing research and also helps ensure that investigators collect the data needed for their measures. The Data Dictionary describes the variables associated with the selected measures.
The Register Your Study feature of the Toolkit is intended to help users find other studies employing the same PhenX protocols and to identify opportunities for cross-study analysis. Registered Toolkit users are invited to fill out a simple form providing a basic study description and the PhenX protocols used.
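Conceptually, finding studies that employ the same PhenX protocols is a set-intersection problem. The following Python sketch shows one way to express it; the registry contents, the study names, and the `find_overlapping_studies` function are invented for illustration and do not describe real registered studies or the Toolkit's internals.

```python
# Hypothetical registry: study names and their protocol lists are invented
# for illustration and do not describe real registered studies.
REGISTERED_STUDIES = {
    "Study A": {"Current Age", "Refractive Error Measurement", "Passive Smoke Exposure"},
    "Study B": {"Current Age", "Arm Span"},
    "Study C": {"Total Physical Activity Screener"},
}


def find_overlapping_studies(my_protocols, registry):
    """Map each study to the protocols it shares with my_protocols."""
    mine = set(my_protocols)
    overlaps = {}
    for study, protocols in registry.items():
        shared = mine & protocols
        if shared:
            overlaps[study] = sorted(shared)
    return overlaps


glaucoma_protocols = ["Current Age", "Refractive Error Measurement"]
print(find_overlapping_studies(glaucoma_protocols, REGISTERED_STUDIES))
```

Studies that share more protocols with the investigator's selection are the more promising candidates for cross-study analysis, since more variables can be combined without harmonization.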
Anticipated Results
By including PhenX Toolkit measures in their research, investigators will be able to more easily combine or compare their data with data collected by others who incorporate the same PhenX measures. This will facilitate cross-study analysis and increase the statistical power to identify genetic associations with complex diseases and traits, gene-gene interactions, and gene-environment interactions.
To date, 95 Funding Opportunity Announcements (FOAs) have been issued from the NIH and the Department of Defense (DoD) that encourage the use of PhenX measures. As PhenX measures become an integral part of study design, researchers will be able to combine and compare studies with increasing ease. Studies need to be compared in order to replicate results. In addition, combining studies increases the sample size, thus providing the statistical power needed to detect moderate and more complex genetic associations. Widespread adoption of PhenX measures will have a dramatic impact on biomedical research and ultimately on the health and well-being of the population.
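The relationship between sample size and statistical power can be made concrete with a rough calculation. The Python sketch below uses a normal approximation for a two-sided, two-sample comparison of means, in which power is approximately Φ(d·√(n/2) − z), where d is the standardized effect size, n is the number of participants per group, and z is the critical value. All inputs are assumptions chosen for illustration: the effect size of 0.1, the 2000 participants per group, and the threshold of 5 × 10⁻⁸ (a conventional genome-wide significance level) do not come from this unit, and pooling is assumed to be valid because the studies collected the same measure with the same protocol.

```python
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=5e-8):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d). The
    contribution of the opposite rejection tail is ignored, which is a
    common simplification.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(noncentrality - z_crit)


# Hypothetical scenario: one study versus three pooled studies of equal size.
single_study = approximate_power(effect_size=0.1, n_per_group=2000)
combined = approximate_power(effect_size=0.1, n_per_group=3 * 2000)
print(f"one study: {single_study:.3f}, three pooled studies: {combined:.3f}")
```

Under these assumptions, a single study has very little power to detect the effect, whereas three pooled studies of the same size have roughly even odds of detecting it.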
Time Considerations
The time required to collect data on a particular measure is addressed in the Requirements Table (Fig. 1.21.4) for each specific protocol. Time as it pertains to use of the Toolkit is difficult to estimate. It could take as little as an hour for an investigator to select a few measures and request a Report, Data Collection Worksheet, and Data Dictionary. However, it is more likely that an investigator would spend an hour or two becoming familiar with the Toolkit, select a few measures, then return to the Toolkit again later to select additional measures. The PhenX Toolkit should be used as an integral part of the development of a new study.
Acknowledgments
This work was supported by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) Cooperative Agreements U01HG004597 and U41HG007050 with co-funding from the Office of Behavioral and Social Sciences Research (OBSSR) and the National Institute on Drug Abuse (NIDA). Additional funding was provided by the Food and Drug Administration Center for Tobacco Products and the NIH Tobacco Regulatory Science Program (FDA CTP and TRSP, award no. U41HG007050-02S2), the National Institute of Mental Health (NIMH, award no. 3U41HG007050-01S2), and the National Heart, Lung, and Blood Institute (NHLBI, award no. 3U01HG004597-02S1).
Literature Cited
- Bennett SN, Caporaso N, Fitzpatrick AL, Agrawal A, Barnes K, Boyd HA, Cornelis MC, Hansel NN, Heis G, Heit JA, Kang JH, Kittner SJ, Kraft P, Lowe W, Marazita ML, Monroe KR, Pasquale LR, Ramos EM, van Dam RM, Udren J, and Williams K; for the GENEVA Consortium. 2011. Phenotype harmonization and cross-study collaboration in GWAS consortia: The GENEVA experience. Genet. Epidemiol. 35:159–173.
- Conway KP, Vullo GC, Kennedy AP, Finger MS, Arpana A, Bjork JM, Farrer LA, Hancock DB, Hussong A, Wakim P, Huggins W, Hendershot T, Nettles DS, Pratt J, Maiese D, Junkins HA, Ramos EM, Strader LC, Hamilton CM, and Sher KJ. 2014. Data compatibility in the addiction sciences: An examination of measure commonality. Drug Alcohol Depend. 141:153–158.
- Maiese DR, Hendershot TP, Strader LC, Wagener DK, Hammond JA, Huggins W, Kwok RK, Hancock DB, Whitehead NS, Nettles DS, Pratt JG, Scott MS, Conway KP, Junkins HA, Ramos EM, and Hamilton CM. 2013. PhenX—Establishing a Consensus Process to Select Common Measures for Collaborative Research. RTI Press publication no. MR-0027–1310. RTI Press, Research Triangle Park, N.C.
- Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, Hao L, Kiang A, Paschall J, Phan L, Popova N, Pretel S, Ziyabari L, Lee M, Shao Y, Wang ZY, Sirotkin K, Ward M, Kholodov M, Zbicz K, Beck J, Kimelman M, Shevelev S, Preuss D, Yaschenko E, Graeff A, Ostell J, and Sherry ST. 2007. The NCBI dbGaP database of genotypes and phenotypes. Nat. Genet. 39:1181–1186.
- McCarty CA, Huggins W, Aiello AE, Bilder RM, Hariri A, Jernigan TL, Newman E, Sanghera DK, Strauman TJ, Zeng Y, Ramos EM, and Junkins HA. 2014. PhenX RISING: Real world implementation and sharing of PhenX measures. BMC Med. Genom. 7:16. doi: 10.1186/1755-8794-7-16.
- Stover PJ, Harlan WR, Hammond JA, Hendershot T, and Hamilton CM. 2010. PhenX: A toolkit for interdisciplinary genetics research. Curr. Opin. Lipidol. 21:136–140.