Published in final edited form as: Policy Polit Nurs Pract. 2015 Sep 8;16(0):117–124. doi: 10.1177/1527154415603358

Table 1. Approaches for Overcoming Challenges of Working With Electronic Data.

NIH BD2K challenge: Locating data and software tools.
Challenges:
  • Difficult to identify and establish contact with the owner or administrator of each data source.
  • Some data needed for analyses are proprietary and not released for research purposes.
Recommendations:
  • Discuss software needs with programmers from each data source before choosing which package(s) to purchase.
  • Make software decisions with the frontline programmer(s) who will be helping to deliver the data, not an administrator who may lack hands-on experience with preparing the data.
  • Discuss specific data sources before developing the research protocol. Identify the specific data source and any limitations a priori. Obtain a sample of the data if possible to ensure it reflects the intended construct and to troubleshoot any quality issues.
  • Retain the extraction code so that subsequent data extractions use the same methodology and are taken from the same source. Work with the same programmer or team if possible to ensure consistency.

NIH BD2K challenge: Getting access to the data and software tools.
Challenges:
  • Lack of clarity regarding the order in which approvals should be obtained (e.g., whether IRB approval is required prior to institutional data use approval, or vice versa).
  • Reliance on programmers with other obligations for data extractions.
  • Programmers may lack the time or experience to review data for accuracy, requiring multiple iterations of data extraction.
Recommendations:
  • Consider in advance the physical limitations of data sharing and work with the relevant IT departments to establish the most efficient system.
  • Learn who is responsible for granting permissions to use and access data, and discuss with them in advance whether direct access to the data can be provided or whether the data must be delivered by another programmer.
  • If programming staff from the source data system must be used, account for how that person's time will be allocated and funded.

NIH BD2K challenge: Standardizing data and metadata.
Challenges:
  • Evolving institutional data use policies and procedures.
  • Shifting roles, responsibilities, overlap, and turnover among data administrators.
  • Some variables may not be available due to missing fields, inaccurate recording, or changes in recording practices over time.
  • Sources of the same data may not match.
  • Data delivered in incompatible formats.
Recommendations:
  • Contribute to process improvement by providing feedback about the experience of using electronic data for research purposes.
  • Keep abreast of changes in institutional policies and staff.

NIH BD2K challenge: Extending policies and practices for data and software sharing.
Challenges:
  • No dedicated support for programmers providing data from existing sources.
  • Inadequate funding for data storage space or multiple software packages.
  • Policies and procedures governing secure data transfer evolve rapidly, making it difficult to remain in compliance.
Recommendations:
  • Understand current policies, keep abreast of changes, and establish collaborative relationships with data administrators.
  • Consider how long it will take to gain the necessary approvals and account for this in the study timeline.

NIH BD2K challenge: Organizing, managing, and processing data.
Challenges:
  • Codebooks describing the origins of each element in the raw data are often not available.
  • Difficult to reconcile old and new coding schemes when changes are made over time.
  • Uncertainty about which data source should be considered the gold standard when assessing validity.
  • Changes in data collection and storage procedures over time are not always documented.
Recommendations:
  • Maintain detailed records of how every data element was extracted, regardless of whether the study's data manager or programming staff from the source data system performed the queries.
  • When available, retain old codebooks from the source data, because these may be overwritten as changes occur over time.
  • Keep detailed records of the decision rules and methodology used to create each variable (a minimal provenance-record sketch follows this row).
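
A minimal sketch of one way to keep machine-readable records of how each data element was extracted and the decision rules used to derive each variable, as recommended above. The field names, example values, and JSON layout are illustrative assumptions, not drawn from the source article.

```python
"""Record how each data element was extracted and derived.

All field names, example values, and the file layout are illustrative assumptions.
"""
import json
from dataclasses import dataclass, asdict, field
from datetime import date


@dataclass
class ExtractionRecord:
    """Provenance for one extracted or derived study variable."""
    variable: str                # name used in the analytic data set
    source_system: str           # hypothetical source extract
    query: str                   # exact query or program used for extraction
    extracted_by: str            # study data manager or source-system programmer
    extraction_date: str
    decision_rules: list = field(default_factory=list)     # how the variable was derived
    known_limitations: list = field(default_factory=list)


# Example entry: a hypothetical derived fall indicator.
record = ExtractionRecord(
    variable="fall_during_stay",
    source_system="incident_reporting_extract",
    query="SELECT encounter_id, event_type FROM incidents WHERE event_type = 'FALL'",
    extracted_by="source-system programmer",
    extraction_date=str(date(2015, 3, 1)),
    decision_rules=[
        "Coded 1 if any incident report with event_type = 'FALL' during the encounter.",
        "Coded 0 otherwise; missing if the encounter predates incident reporting.",
    ],
    known_limitations=["Incident reports may undercount falls relative to chart review."],
)

# Append the record to a running provenance log kept with the study files.
with open("provenance_log.json", "a") as fh:
    fh.write(json.dumps(asdict(record)) + "\n")
```

Keeping such records in a structured file, rather than only in analysts' memories or e-mail threads, makes it possible to rerun extractions with the same methodology after staff turnover.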

NIH BD2K challenge: Developing new methods for analyzing and integrating data.
Challenges:
  • Clinical investigators must agree on variable definitions that will be suitable for use in multiple study aims.
  • Missing data are common, and the effect of bias on planned analyses must be taken into account.
Recommendations:
  • Develop phenotyping algorithms to identify conditions that are not directly ascertainable from existing electronic data fields.
  • Conduct validation studies to determine the sensitivity and specificity of various data sources relative to each other and to clinician chart review (see the sketch after this row).
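
A minimal sketch of the two recommendations above, under simplified assumptions: a rule-based phenotyping algorithm that flags probable diabetes from either coded diagnoses or an elevated glucose value, and a validation step that computes sensitivity and specificity against clinician chart review. The codes, threshold, and field names are hypothetical, not taken from the source article.

```python
"""Rule-based phenotyping and validation against chart review.

All codes, thresholds, and field names are illustrative assumptions.
"""

# Hypothetical encounter records combining two electronic sources plus chart review.
encounters = [
    {"id": 1, "dx_codes": {"E11.9"}, "max_glucose": 240, "chart_review_dm": True},
    {"id": 2, "dx_codes": set(), "max_glucose": 310, "chart_review_dm": True},
    {"id": 3, "dx_codes": set(), "max_glucose": 110, "chart_review_dm": False},
    {"id": 4, "dx_codes": {"I10"}, "max_glucose": 150, "chart_review_dm": False},
]


def diabetes_phenotype(enc, glucose_threshold=200):
    """Flag probable diabetes from coded diagnoses OR an elevated glucose value."""
    has_dx = any(code.startswith("E11") for code in enc["dx_codes"])
    has_lab = enc["max_glucose"] >= glucose_threshold
    return has_dx or has_lab


def sensitivity_specificity(records, phenotype, gold_key):
    """Compare a phenotyping algorithm against a gold standard (here, chart review)."""
    tp = sum(1 for r in records if phenotype(r) and r[gold_key])
    fn = sum(1 for r in records if not phenotype(r) and r[gold_key])
    tn = sum(1 for r in records if not phenotype(r) and not r[gold_key])
    fp = sum(1 for r in records if phenotype(r) and not r[gold_key])
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


sens, spec = sensitivity_specificity(encounters, diabetes_phenotype, "chart_review_dm")
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The same comparison can be repeated with each electronic source treated in turn as the candidate algorithm, which addresses the uncertainty noted earlier about which source should serve as the gold standard.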

NIH BD2K challenge: Training researchers who can use data effectively.
Challenges:
  • Skills needed to carry out the project are not fully understood a priori.
  • Clinical investigators are not familiar with the technical aspects of the project.
  • Few data managers have programming, analytical, and clinical expertise.
Recommendations:
  • Create codebooks that include detailed variable definitions, with detailed descriptions of the data sources and known limitations (see the codebook sketch after the table).
  • For variables created for a specific purpose or project, retain the decision rules and rationale so that future investigators can properly determine whether the variable is relevant and appropriate for their study aims.
  • Partner with bioinformatics, information technology, and other staff to provide appropriate expertise.

Source. National Institutes of Health, 2015.

Note. NIH = National Institutes of Health; BD2K = Big Data to Knowledge; IRB = institutional review board.
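
As a closing illustration of the codebook recommendation in the last row of the table, the following is a minimal sketch of a machine-readable codebook containing variable definitions, data sources, value ranges, and known limitations. The column names and example entries are hypothetical assumptions, written as a small CSV so the codebook can be shared alongside the data.

```python
"""Write a small codebook with variable definitions, sources, and limitations.

The columns and example rows are illustrative assumptions.
"""
import csv

CODEBOOK_FIELDS = ["variable", "definition", "source", "values", "known_limitations"]

codebook_rows = [
    {
        "variable": "unit_rn_hppd",
        "definition": "Registered-nurse hours per patient day, by unit and shift.",
        "source": "Staffing/scheduling system extract (hypothetical).",
        "values": "Continuous, >= 0.",
        "known_limitations": "Float and agency hours recorded inconsistently in early years.",
    },
    {
        "variable": "fall_during_stay",
        "definition": "Any reported patient fall during the encounter.",
        "source": "Incident reporting extract (hypothetical).",
        "values": "0 = no, 1 = yes, blank = encounter predates incident reporting.",
        "known_limitations": "Undercounts falls relative to clinician chart review.",
    },
]

# Write the codebook so future investigators can judge whether each variable
# is relevant and appropriate for their study aims.
with open("codebook.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=CODEBOOK_FIELDS)
    writer.writeheader()
    writer.writerows(codebook_rows)
```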