2016 Oct 25;3:160096. doi: 10.1038/sdata.2016.96

Figure 1. Methods overview for creating the unified NHANES dataset.

(a) Each SAS-formatted (.xpt) data file provided by the CDC/NHANES is binned by ‘module’ (represented by folders): Demographics (4 files), Laboratory (163 files), Examination (19 files), and Questionnaire (69 files). The participant identifiers used to merge data files across modules are depicted as gray columns. (b) Breakdown of the number of files by survey year and module. (c) We processed the data to create new variables and added pharmaceutical drug information and mortality information. (d) We merged all 255 files by the participant identifier to create a large unified table (‘MainTable’) consisting of 41 K participants and 1191 unique variables. (e) We created a data dictionary that contains human-readable variable descriptions and other metadata, such as the variable category and, for categorical variables, the levels of the variable. (f) The data are accessible via DataDryad and browsable through the PIC-SURE website (https://nhanes.hms.harvard.edu). The data and a Usage Guide are available on GitHub. An RStudio analytics environment with the dataset, the xwas R library, and user guides is packaged as a Docker Hub container (chiragjp/nhanes_scidata).
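
The merge in panel (d) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the participant identifier column is named SEQN, and the helper name merge_by_participant, the variable names, and all values are made up for illustration. Each toy table stands in for one data file, and the result is an outer join in which participants missing from a file receive None for that file's variables.

```python
from collections import defaultdict


def merge_by_participant(tables, id_column="SEQN"):
    """Outer-join tables (each a list of row dicts) on id_column.

    Hypothetical helper for illustration; "SEQN" is an assumed column name.
    """
    merged = defaultdict(dict)
    for rows in tables:
        for row in rows:
            # Rows that share an identifier are folded into one record.
            merged[row[id_column]].update(row)

    # Collect every variable seen, preserving first-seen order,
    # so that values absent for a participant are filled with None.
    columns = dict.fromkeys(
        name for record in merged.values() for name in record
    )
    return [
        {name: record.get(name) for name in columns}
        for _, record in sorted(merged.items())
    ]


# Toy stand-ins for three module files (values are invented).
demographics = [{"SEQN": 1, "age": 34}, {"SEQN": 2, "age": 61}]
laboratory = [{"SEQN": 1, "glucose": 5.4}]
questionnaire = [{"SEQN": 2, "smoker": "no"}]

main_table = merge_by_participant([demographics, laboratory, questionnaire])
for row in main_table:
    print(row)
```

An outer join is used here so that no participant is dropped when a file lacks their record; whether the published table was built this way is an assumption of the sketch.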