Table 1.
Approaches for Overcoming Challenges of Working With Electronic Data.
NIH BD2K challenge | Challenges | Recommendations |
---|---|---|
Locating data and software tools. | Difficult to identify and establish contact with owner or administrator of each data source. Some data needed for analyses are proprietary and not released for research purposes. | |
Getting access to the data and software tools. | Lack of clarity regarding the order in which approvals should be obtained (e.g., IRB approval was required prior to institutional data use approval, and vice versa). Reliance on programmers with other obligations for data extractions. Programmers may lack the time or experience to review data for accuracy, requiring multiple iterations of data extraction. | |
Standardizing data and metadata. | Evolving institutional data use policies and procedures. Shifting roles, responsibilities, overlap, and turnover among data administrators. Some variables may not be available due to missing fields, inaccurate recording, or changes in recording practices over time. Sources of the same data may not match. Data delivered in incompatible formats. | |
Extending policies and practices for data and software sharing. | No dedicated support for programmers providing data from existing sources. Inadequate funding for data storage space or multiple software packages. Policies and procedures governing secure data transfer evolve rapidly, making it difficult to remain in compliance. | |
Organizing, managing, and processing data. | Codebooks describing the origins of each element in the raw data are often not available. Difficult to reconcile old and new coding schemes when changes are made over time. Uncertainty about which data source should be considered the gold standard when assessing validity. Changes in data collection and storage procedures over time not always documented. | |
Developing new methods for analyzing and integrating data. | Clinical investigators must agree on variable definitions that will be suitable for use in multiple study aims. Missing data are common, and effect of bias on planned analyses must be taken into account. | |
Training researchers who can use data effectively. | Skills needed to carry out the project not fully understood a priori. Clinical investigators not familiar with the technical aspects of the project. Few data managers have programming, analytical, and clinical expertise. | |
Source. National Institutes of Health, 2015.
Note. NIH = National Institutes of Health; BD2K = Big Data to Knowledge; IRB = institutional review board.