Table 1.
Clinical data type | Brief description | Regulatory access restrictions | Applications |
---|---|---|---|
Fully identified clinical data sets | Observational patient data derived from paper‐based or electronic medical records | IRB approval is required; an executed data use agreement is possibly requiredᵃ | Clinical interpretation and scientific inference and discovery |
HIPAA‐limited clinical data sets | Observational patient data containing only a limited set of HIPAA‐defined PHI | IRB approval is required; an executed data use agreement is possibly requiredᵃ | Clinical interpretation and scientific inference and discovery, but with the understanding that certain data elements have been removed from the data and/or transformed |
Deidentified clinical data sets | Observational patient data, but with all HIPAA‐defined PHI elements removed | IRB approval is not requiredᵇ; IRB “Request for Determination of Human Subjects Research” is typically recommended; an executed data use agreement is possibly required | Clinical interpretation and scientific inference and discovery, but with the understanding that inferences regarding time and potentially other factors cannot be made |
HuSH+ clinical data sets | Observational patient data, fully compliant with HIPAA Safe Harbor, but unlike deidentified clinical data sets, HuSH+ clinical data sets have been altered such that (i) real patient identifiers (including geocodes) have been replaced with random patient identifiers and (ii) dates (including birth dates) have been shifted by a random number of days (maximum of ±50 days), with all dates for a given patient shifted by the same number of days. Data are derived from UNC Health Care System | An executed data use agreement is requiredᶜ | Clinical interpretation and scientific inference and discovery, but with the understanding that any inferences based on date/time and location (geocode) cannot be made with precision, and all other inferences must consider date/time and location as potentially hidden covariates |
Clinical profiles | Statistical profiles of disease and associated phenotypic presentation derived from observational patient data. Data are derived from Johns Hopkins Medicine | IRB approval is required to generate clinical profiles; no other restrictions apply | Clinical interpretation and scientific inference, but with the understanding that the data represent statistical profiles |
Synthetic clinical data sets | Realistic, but not real, observational patient data generated statistically using population distributions of observational patient data | None | Feasibility assessments and algorithm validation; generation of clinical profiles |
COHD | Counts of observational clinical co‐occurrences (e.g., co‐occurrences of specific diagnoses and prescribed medications), as well as their relative frequency and observed–expected frequency ratio. Data are derived from Columbia University Irving Medical Center | None | Clinical interpretation and scientific inference, but with the understanding that the data are restricted to co‐occurrences |
ICEES | Patient‐level or visit‐level counts of observational patient data integrated at the patient and visit level with a variety of environmental exposures derived from multiple public data sources. Data are derived from UNC Health Care System and a variety of public data sources on environmental exposures | IRB approval is required to generate ICEES integrated feature tables; no other restrictions apply | Clinical interpretation and scientific inference, but with the understanding that the raw data have been transformed (e.g., binned or categorized) |
COHD, Columbia Open Health Data; HIPAA, Health Insurance Portability and Accountability Act; HuSH+, HIPAA Safe Harbor Plus; ICEES, Integrated Clinical and Environmental Exposures Service; IRB, institutional review board; PHI, protected health information; UNC, University of North Carolina.
ᵃ Individual institutions may require a secure workspace for data access and use.
ᵇ While HIPAA and IRB regulations do not apply, institutional approvals may be required.
ᶜ HuSH+ clinical data sets were conceptualized and created by UNC as part of the National Center for Advancing Translational Sciences–funded Biomedical Data Translator program. The institution requires a fully executed data use agreement for access to the data.
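
The HuSH+ row describes two transformations: replacing real patient identifiers with random ones, and shifting every date for a given patient by the same random offset of at most 50 days in either direction. A minimal Python sketch of that idea follows. It is an illustration only, not the UNC implementation: the function name `pseudonymize`, the record layout (dicts with a `patient_id` key), and the choice of a uniform integer offset (which can be zero) are all assumptions, and geocode handling is omitted.

```python
import random
import uuid
from datetime import date, timedelta

MAX_SHIFT_DAYS = 50  # maximum shift in either direction, as stated in Table 1


def pseudonymize(records, rng=None):
    """Return copies of `records` with random patient IDs and shifted dates.

    `records` is a list of dicts, each with a 'patient_id' key (hypothetical
    layout). Every date-valued field of a patient is shifted by the same
    per-patient offset, so intervals within a patient are preserved.
    """
    rng = rng or random.Random()
    new_ids = {}  # real patient ID -> random replacement ID
    shifts = {}   # real patient ID -> per-patient date offset
    out = []
    for rec in records:
        pid = rec["patient_id"]
        if pid not in new_ids:
            new_ids[pid] = uuid.UUID(int=rng.getrandbits(128)).hex
            # Assumption: uniform integer offset in [-50, 50] days.
            shifts[pid] = timedelta(
                days=rng.randint(-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS)
            )
        shifted = {}
        for key, value in rec.items():
            if key == "patient_id":
                shifted[key] = new_ids[pid]
            elif isinstance(value, date):
                shifted[key] = value + shifts[pid]
            else:
                shifted[key] = value
        out.append(shifted)
    return out


records = [
    {"patient_id": "A", "birth_date": date(1980, 1, 1),
     "visit_date": date(2020, 5, 1), "dx": "J45"},
    {"patient_id": "A", "birth_date": date(1980, 1, 1),
     "visit_date": date(2020, 6, 1), "dx": "J45"},
    {"patient_id": "B", "birth_date": date(1990, 1, 1),
     "visit_date": date(2021, 1, 1), "dx": "E11"},
]
shifted_records = pseudonymize(records, random.Random(0))
```

Because the offset is constant within a patient, the interval between that patient's two visits is unchanged, while absolute dates and cross-patient date comparisons lose precision. This is consistent with the table's caution that date/time inferences cannot be made with precision.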