AMIA Annual Symposium Proceedings. 2014 Nov 14;2014:616–625.

What Is Asked in Clinical Data Request Forms? A Multi-site Thematic Analysis of Forms Towards Better Data Access Support

David A Hanauer 1,2,*, Gregory W Hruby 3,*, Daniel G Fort 3, Luke V Rasmussen 4, Eneida A Mendonça 5,6, Chunhua Weng 3
PMCID: PMC4419980  PMID: 25954367

Abstract

Many academic medical centers have aggregated data from multiple clinical systems into centralized repositories. These repositories can then be queried by skilled data analysts who act as intermediaries between the data stores and the research teams. To obtain data, researchers are often expected to complete a data request form. Such forms are meant to support record-keeping and, most importantly, provide a means for conveying complex data needs in a clear and understandable manner. Yet little is known about how data request forms are constructed and how effective they are likely to be. We conducted a content analysis of ten data request forms from CTSA-supported institutions. We found that most of the forms over-emphasized the collection of metadata that were not considered germane to the actual data needs. Based on our findings, we provide recommendations to improve the quality of data request forms in support of clinical and translational research.

Introduction

Clinical and translational research is a growing priority of the United States (US) National Institutes of Health (NIH). To encourage greater advancements in this area, the NIH has supported over 60 research institutions through the Clinical and Translational Science Awards (CTSA).1 At the same time there has been a substantial increase in the adoption of electronic health records (EHRs), with nearly half of US hospitals now having one or more EHRs in place.2 This concomitant investment in both the research enterprise and in health information systems that capture data electronically has presented unprecedented opportunities for advancing clinical and translational science, and is a necessary precursor for building the foundations of a broad-scale ‘learning health system’.3,4

Research tasks that were previously impractical, if not impossible, to perform with paper-based health records have now become achievable due to the large volumes of data stored in a ‘readily accessible’ electronic format. However, in addition to privacy and security constraints, numerous difficulties remain with respect to access and use of the data.5 Compared to paper records, EHR data should be much easier to aggregate across large numbers of patients, but the complexity of the underlying systems, including the heterogeneity in metadata, data structures, and even the data itself, often hinders their computational reuse by a broad range of stakeholders.6–8 Further, prior work has shown that it is not uncommon for a hospital to have hundreds of different IT systems.9 Data from multiple health information systems are thus often aggregated into databases commonly referred to as data or information warehouses, data repositories, data marts, or data networks.10–16 A major theme of the NIH roadmap has been the idea of “Re-Engineering the Clinical Research Enterprise”,17 and one of the recognized challenges has been providing the means to “facilitate access to research…resources by scientists, clinicians” and others.18 However, two major barriers exist with respect to data access.

First, to help meet the needs of clinical and translational research, ‘self-service’ tools have been developed to provide a means for data access as well as analysis and visualization.19–26 Many of these tools have been widely implemented and have achieved a good level of adoption. While self-service tools have been demonstrated to work well for various scenarios,27 by nature of their intended simplicity for a broad user base, these systems often cannot handle all of the complex data needs that are required by biomedical research teams.20,28 Researchers usually have neither the database knowledge nor an understanding of what is involved in data retrieval. In contrast, data managers or query analysts usually do not know how to ask questions that elicit data needs using non-technical language understandable by researchers.29 Data need negotiation usually involves several “trial-and-error” iterations. As a result, many institutions have recognized the need to invest in informatics or IT experts (often called data analysts or report writers) to serve as intermediaries between the complex data sources and the biomedical researchers, the latter of whom have significant domain expertise but often lack training in data access approaches such as the use of structured query languages (SQL).29–31

Second, for liability considerations and HIPAA or regulatory compliance, data owners need to carefully check the credentials and qualifications of data requesters, which usually involves lengthy review processes involving multiple institutional review offices. An important artifact, the data request form, is the nexus linking all the stakeholders in the process of providing data access for researchers. Such forms are generally meant to serve documentation and communication needs for multiple stakeholders, including researchers, query analysts, data owners, and regulatory officers.32 They can provide a means for researchers to list their credentials and specify their needs through a formal request process. They also help data stewards verify whether the appropriate regulatory approvals are in place and assist with other administrative bookkeeping. Importantly, data request forms are also meant to provide a means for research teams to communicate complex data needs in a manner that can be understood by a data analyst and converted into executable database queries for data retrieval.33 It follows, then, that how clearly and unambiguously these forms define the data needs and sources can have major downstream consequences for the subsequent research on which the request is based. Yet there are no published standards for designing EHR data request forms, nor even best practices to which an institution can turn in constructing a form. It is therefore up to each institution to develop its own form with the hope that the right questions are being asked of data requesters in order to ensure that data needs are being met accurately and efficiently.

Therefore, the data request form plays an indispensable role in facilitating data access for researchers at many institutions. Motivated to provide better data access to the broad clinical and translational research community, we aim to understand (1) how efficiently current forms collect information needed by data owners and how effectively they communicate the data needs of researchers, and (2) whether they collect necessary and relevant information that cannot be extracted for reuse from existing institutional information systems. In this study we conducted a formal content analysis of data request forms from multiple academic institutions affiliated with a CTSA award. Our goal was to develop a deeper understanding of what questions are typically asked on the forms, and to provide insights regarding whether current data request forms offer adequate coverage of salient details to capture data needs effectively. To achieve these goals, we first obtained ten data request forms from CTSA-supported academic medical centers in the US. We then developed a form annotation schema based on the consensus of two annotators and used this codebook to annotate the forms. On this basis, we conducted a detailed content analysis of the forms and identified information deficiencies, as well as unnecessary workload imposed on researchers, that exist across many forms in use today. Finally, we provide insights and recommendations from our analysis that could be used to improve the content of data request forms and, ultimately, improve the process for obtaining complex data from institutional repositories in support of clinical and translational research.

Methods

A. Collection of data request forms

Ten data request forms were obtained for this study. All forms were in use at CTSA-supported academic medical centers around the US as of February 2014. Four of the forms were obtained through personal contacts by the authors, whereas the remaining six were identified through an online search with the Google search engine using the strings “EHR data request” and “medical research data request.” The ten CTSA-supported institutions at which these forms were actively in use were: Boston University, Columbia University, Northwestern University, University of California - San Diego, University of California - San Francisco, University of Colorado Denver, University of Kansas, University of Michigan, University of Wisconsin, and Vanderbilt University. Note that these institutions are listed here in alphabetical order, which does not match the order in which they are presented in the results section, wherein only a letter is used to identify each form.

B. Development of a codebook

Five of the ten data request forms were randomly selected for developing the coding schema for the content analysis. Two reviewers (GH and DH) independently evaluated the five forms and developed a list of themes derived from the forms. These themes were based only on the actual questions asked on each of the forms, although several research questions helped guide the analysis. These included (1) what high-level organizational categories can data request form elements be assigned to? (2) what percentage of metadata could potentially be obtained from source systems without asking research teams to copy it to a form? (3) how are request form items distributed between administrative data (i.e., ‘bookkeeping’) and actual data requests? (4) how much detail does each element on a form seek to obtain from a user completing the form?

The theme lists from both reviewers were then compared, discussed, and consolidated into a single list. A third reviewer (CW) evaluated the merged list and refined it further. Finally, two reviewers (GH and DF) compared two randomly selected forms to finalize the codebook and address additional gaps in code coverage. Similar themes were then grouped into logical categories (e.g., “Compliance”, “Data Use”) and numbered. This final list served as our codebook, which is shown in Table 1.

Table 1.

Form elements comprising the codebook for the content analysis of the data request forms, including examples of each type of element. When an element could be coded as Simple [S] or Extensive [E], an example of each is provided. Basic elements were only coded as Simple [S] if present; thus no Extensive example is provided.

Code Name Description Example(s)
1.0 Requester Metadata Any form elements that describe the user requesting data not a coding element
1.1 Name This element may include the name, and/or contact data of the requester [S] Requester Name [E] Requester Name, Department, Email
1.2 PI/Supervisor/ Department Head This element may include the name, and/or contact data of the requester’s PI, supervisor and/or department head [S] Supervisor Name [E] Supervisor Name, Department, Email
1.3 Billing/ Administrative This element may include the name, and/or contact data of the requester’s administrator or other billing information [S] Administrative Name [E] Administrative Name, Department, Email
1.4 Other Any other attributes associated with the requester, and not associated with the content of the request [S] Are you a part of the CTSA?
2.0 Request Metadata Any form elements that describe the actual request not a coding element
2.1 Study Title/Request This is a brief summation of the request. [S] Project Title; Research Question
2.2 Existing/ New Request This element specifies if the request is new or a modification to an existing request [S] Is this a new request or a modification to an existing report
2.3 Funding Source This element is asking who is financially supporting the use of this data. [S] What are your funding sources?
[E] Will funds be used to pay subcontractors; do funding sources have restrictions on the use of the data collected for this project?
2.4 Request Purpose Concerns the use of the data being requested; for example, will it facilitate an internal administrative report, research or preparatory-to-research work, or cohort/clinical trial recruitment? [S] Will the requested data be applied to any of the following areas? Non-research, Patient Care, Operations, Research, etc.
2.5 Request Type This element specifies the degree of data access the user requires. [S] Multiple Choice: Self-service, Super user
[E] Study Design Consultation, Research Navigator
2.6 Data Sources Any element that asks the user to specify the source of data; for example, this may be a particular database, or a particular clinical site where the user thinks the data may originate. [S] Sources of data? (Text Box)
[E] Sources of data? (Multiple Choice)
2.7 Data Element Specification This element refers to any description of the medical data elements the requester seeks. [S] Describe the data you need.
[E] What is your selection criteria, From what time period… What data fields do you need?
2.8 Recurring Requests This element is specific to the frequency of data delivery. A clinical trial that submits a request to aid recruitment may wish to receive a weekly dump of potential matches. [S] Is this a one-time request or recurring?
3.0 Compliance Form elements related to a compliance attribute, such as IRB, PHI, internal regulations, or documentation requirements not a coding element
3.1 Institutional review board (IRB) If the request is for research, this element requests the IRB number or whether the protocol is IRB exempt. [S] IRB number
3.2 IRB Proof Elements that require IRB proof [S] Please upload your approved IRB protocol
3.3 Protected Health Information (PHI) Regardless of request purpose, this element specifies HIPAA compliance and asks to what level of identified data (if any at all) are needed. [S] Will the data be identified or de-identified
[E] Please select the type of data you will need: identified, de-identified, limited decedent, aggregate counts…
3.4 Compliance Other This element concerns any type of compliance attribute, whether it be IRB, PHI, internal regulations, or documentation requirements that could not be classified elsewhere [S] Provide your consent (or waiver of consent)
4.0 Data Use Refers to how the requester is going to use or share the data not a coding element
4.1 Internal Data Sharing This element represents how the user is sharing the data within their team, where the data is going to be stored, how the data is to be delivered, or the format of the data. [S] Please describe data storage and use plan
[E] Who will have access to the data, where the data is to be stored, data delivery & format
4.2 External collaborators data use agreement (DUA) If the requester is sharing the information with an external collaborator, is there a formal data use agreement [S] Is there a DUA?
[E] Name non-affiliated project team members
that will have access to the data; upload DUA.
4.3 Public Sharing of Original Dataset This element refers to the intent of the requester to publish the original dataset [S] Will data be made publicly available?
[S] Do you plan on making this data publicly available, and if so, how?
4.4 Terms and conditions of use This element refers to any mention of terms and conditions the requester must agree to for the release of the data to them. [S] Please read/agree to these terms and conditions for the use of this data.
4.5 Data Use Other This element includes items that were not specifically covered in the other data use categories [S] Who is your intended audience for data reporting?
5.0 Miscellaneous Form element that cannot be categorized elsewhere not a coding element
5.1 Elements not classified elsewhere Items that did not fit into other categories. [S] Is this an emergency request due to a grant deadline
[S] Will you be contacting patients?

We also utilized a ‘comprehensiveness’ measure to indicate the breadth of each element: Simple (S) or Extensive (E). Simple elements were related to a very focused, narrow question on a form (e.g., “Your Name”), whereas Extensive elements had a much broader scope. For example, an Extensive element asked the requester to “indicate all identifiers (PHI) that may be included in the study research record”, followed by a list of all 18 HIPAA identifiers with a checkbox next to each. Examples of Simple and Extensive elements with respect to the codebook are also shown in Table 1. Note that some elements (e.g., codes 1.4, 2.1, 2.2 in Table 1) were judged by the team to only be coded using a Simple ‘comprehensiveness’ measure; others could be either Simple or Extensive.

C. Form annotation by two annotators

Each data request form was divided into individual, granular form elements based on the questions asked on each form. For example, one of the forms had a single numbered question comprised of two sub-questions, (1) “describe the data security procedures” and (2) “who will have access to the data”. These were split into two distinct elements for coding. Each data request form element was then entered into the Coding Analysis Toolkit (CAT; Texifter, Amherst, MA). The CAT provided the capability for each element to be shown to an annotator on a computer screen along with the codebook so that all elements could be reviewed and coded efficiently. Using the CAT, two annotators (GH and DF) independently reviewed and coded all of the data elements from each of the ten data request forms, including whether each was Simple or Extensive in terms of comprehensiveness. Inter-rater agreement for each form was assessed with the kappa statistic. Coding disagreements were then discussed between the two coders and code assignment consensus was reached.
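The inter-rater agreement computation can be sketched as follows. This is the standard Cohen's kappa (observed agreement corrected for chance agreement from each annotator's marginal code frequencies); the two code lists are hypothetical examples, not the study's actual annotations.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two annotators' code assignments over the same elements."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement, estimated from each annotator's marginal code frequencies
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two annotators to the same four form elements
a = ["2.1", "2.7", "2.7", "1.1"]
b = ["2.1", "2.7", "2.3", "1.1"]
print(round(cohens_kappa(a, b), 2))  # 0.67
```

Kappa of 1.0 indicates perfect agreement; values near 0 indicate agreement no better than chance, which is why the low initial scores on some forms (see Results) signal coding difficulty.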

D. Content analysis of the ten forms

From the coded elements on each form we estimated the completeness of information about data needs captured by each form. This was done by assigning a numerical score to each element in the codebook based on the comprehensiveness measure (Simple=1, Extensive=3), representing the maximum score each item could be assigned. Forms that had ≥3 Simple elements assigned to the same code were considered to have an Extensive comprehensiveness measure for that code, by nature of having multiple elements covering the same concept. We then computed the percent coverage of all possible elements by summing the scores per form and dividing by the total number of possible points a theoretical, all-inclusive form would have had. Finally, we assessed the form elements coded with either code 2.1 or 2.7 for their ability to capture the salient details that would likely be necessary to convey the context and content of data requests in a reliable manner, since these elements may serve as a communication channel between biomedical research teams and data analysts.
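The scoring rules just described can be sketched as follows. The MAX_SCORE mapping is a hypothetical three-code subset of the per-code maxima in Table 2 (Simple-only codes max out at 1, others at 3), and the form contents are illustrative, not the actual study data.

```python
# Hypothetical subset of per-code maximum scores (cf. Table 2's "Max Score" column)
MAX_SCORE = {"1.1": 3, "2.1": 1, "2.7": 3}

def code_score(labels, max_score):
    """Score one code on one form from its element labels ('S' or 'E').
    Three or more Simple elements for the same code are promoted to Extensive,
    but the score is still capped at that code's maximum (see Table 2 footnote)."""
    if not labels:
        return 0
    if "E" in labels or labels.count("S") >= 3:
        return min(3, max_score)
    return 1

def percent_coverage(form_codes):
    """form_codes maps a code to the list of 'S'/'E' labels found on one form."""
    total = sum(code_score(form_codes.get(c, []), m) for c, m in MAX_SCORE.items())
    return total / sum(MAX_SCORE.values())

# Hypothetical form: one Extensive 1.1 element, one Simple 2.1, three Simple 2.7s
print(percent_coverage({"1.1": ["E"], "2.1": ["S"], "2.7": ["S", "S", "S"]}))  # 1.0
```

The cap in `code_score` mirrors the footnote to Table 2: a Simple-only category such as 2.4 still contributes only 1 point even when promoted to Extensive.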

Results

The primary results from our analysis are shown in Table 2. There was substantial variation in how much detail each form covered and in which elements were covered. Based on our metric of coverage, the top three forms (A, C, and J) had coverage of 52%, 48%, and 48%, respectively. Form B was much sparser, with only 11% total coverage. In general, forms that had more overall elements (or individual questions) also had better coverage, but the relationship was not completely linear. For example, Form A, with the highest percentage of coverage (52%), had only 15 total elements, whereas Form F had 19 total elements but only 35% overall coverage. This discrepancy was most often due either to the mix of Simple versus Extensive elements used on a form (e.g., fewer elements, but more extensive coverage by each element) or to many elements disproportionately being related to only a handful of related questions (e.g., one form had four elements dedicated to the funding source).

Table 2.

Summary of the coding analysis performed on the ten data request forms. If a cell is shaded it means that the specific code (row) was found to exist in the specific form (columns A–J). Additionally, the comprehensiveness measure of each element is shown with either an S (Simple, light shading) or E (Extensive, dark shading); those with ≥3 Simple elements on a form related to a single code were assigned an ‘E’ label even if it was not originally coded as being Extensive. The “Max Score” column represents the total number of points a form element could be assigned as a representation of its comprehensiveness. The total coverage of all elements for each form is shown at the bottom of the table as both a sum and percentage. Note that cells with an Extensive comprehensiveness label were given a score of 3 and those with a Simple comprehensiveness label were given a score of 1. The “# Forms with element” column is a sum of the number of distinct forms that had at least one element on the form that had the respective code in it. For example, nine forms contained code 2.1 (“Study Title/Request”).

Code Description Max Score Form # Forms with element
A B C D E F G H I J
1.0 Requester Metadata
1.1 Name 3 E E E E E 5
1.2 PI, supervisor, department head 3 E S E E E E E 7
1.3 Billing/Administrative content 3 E S S E 4
1.4 Other 1 S S S 3
2.0 Request Metadata
2.1 Study Title/Request 1 S S S S S S S S S 9
2.2 Existing/New request 1 S S S 3
2.3 Funding source 3 E S S S 4
2.4 Request purpose 1 S S S S E* S S S 8
2.5 Request type 3 S E S 3
2.6 Data sources 3 E S S 3
2.7 Data element specification 3 E E S S S S 6
2.8 Recurring requests 1 S S 2
3.0 Compliance
3.1 IRB 1 S S S S 4
3.2 IRB proof 1 S S 2
3.3 PHI 3 E E 2
3.4 Compliance other 3 S E S E 4
4.0 Data Use
4.1 Internal data sharing 3 E S E S 4
4.2 External collaborators DUA 3 E S 2
4.3 Public sharing of original dataset 1 E 1
4.4 Terms and conditions of use 1 S S 2
4.5 Data use other 1 S 1
5.0 Miscellaneous
5.1 Elements not classified elsewhere 3 S E S S E S E E 8
 
Total Score 46 24 5 22 8 13 16 13 10 14 22
Percent coverage of all possible elements 100% 52% 11% 48% 17% 28% 35% 28% 22% 30% 48%
 
Total number of distinct form elements identified for coding 15 5 25 10 9 19 11 11 21 36
* This was labeled Extensive because there were 4 distinct Simple elements related to category 2.4; however, this category was considered to be a Simple category. Thus, in this row it still only counts as 1 towards the total score.

Nine out of the ten forms asked about the title of the study/request, making this the most common question asked across the forms. Other questions were less commonly asked. Only two forms (A and J) explicitly requested proof of study approval from an institutional review board, and only one form (G) asked if there was a plan to share the original data set publicly. At a category level, four forms did not have a single element related to “Compliance” and three did not have a single element related to “Data Use”. All forms incorporated at least one element related to the categories of “Requester Metadata” and “Request Metadata”, the latter of which is most important for understanding the actual data needs for a request. Within “Request Metadata”, codes 2.1 (“Study Title/Request”) and 2.7 (“Data element specification”) were determined to be the most relevant for a data analyst to understand the specific needs of the research team. Therefore, we list the specific elements for codes 2.1 and 2.7 derived from all ten forms in Table 3. Some forms asked detailed questions (e.g., five distinct elements coded 2.7 on Form F) whereas others asked very basic questions (one element coded 2.1 on Form C).

Table 3.

Data elements related to codes 2.1 (“Study Title/Request”) and 2.7 (“Data element specification”). These two codes were judged to be the most relevant for a data analyst to understand the information needs of the research team. Note that form D did not contain any elements for which these codes could be applied.

Form Code Comprehensiveness Element Header Excerpt Element Question Element Options
A 2.1 S General Reason for Request Brief description of intent for use of data and/or associated project Text Box
A 2.7 E Research Request Reason Please included as applicable: Request Information (Please include Request Description and if known) - Data Elements, Date Range/Parameters, Sort Sequence, Included Population (e.g. nursing units, DRG codes), Excluded Population (exceptions to the included population), Associated Form (Eclipsys Use Only)… Document Upload
B 2.1 S Please provide the following information I need the new report because… Text Box
C 2.1 S Data Type Full Study Title Text Box
E 2.1 S If the purpose of your request is for Patient Care, Education, Administrative, Billing/Payment…complete the following Give a brief description of your project in the space below: Text Box
F 2.1 S DATA REQUEST FORM Study Title/Study Idea Text Box
F 2.7 E Data and/or Records Needed for Research Protocol: Include the following… Selection Criteria (e.g., all patients with a visit with an ICD-9 780.3x and/or 345.x, English speakers whose age > 50 and age <= 75, etc.) Text Box
F 2.7 E Data and/or Records Needed for Research Protocol: Include the following… Counts (if applicable): (e.g., number of patients seen by Firm A, B, C grouped by under 65 and 65 or older) Text Box
F 2.7 E Data and/or Records Needed for Research Protocol: Include the following… Dates of Records: (e.g., January 1, 2004 March 31, 2005) Text Box
F 2.7 E Data and/or Records Needed for Research Protocol: Include the following… Number of Records: (e.g., 2000 patients with specified diagnosis, 10% sample of patients with diagnosis, all patients admitted thru ED) Text Box
F 2.7 E Data and/or Records Needed for Research Protocol: Include the following… List of Data Fields: (e.g., age, race, diagnosis, service area, PCP, etc.) TextBox
G 2.1 S complete the following questions Describe the project for which the data is requested: Text Box
G 2.1 S complete the following questions What is the purpose of the project or study? Text Box
G 2.7 S complete the following questions Describe the data elements needed, such as cancer type (site and histology), geographic location and dates… Text Box
H 2.1 S What are the objectives of this project? What question(s) are you trying to answer? Text Box
H 2.1 S What are the objectives of this project? What problem(s) are you trying to solve? Text Box
H 2.7 S What are the data requirements? How much historical data are needed to meet the targeted reporting scope? Text Box
H 2.7 S What are the data requirements? How current do the data need to be to support the targeted reporting? Text Box
I 2.1 S Project Details Project Title Text Box
I 2.7 S Project Details Please explain below and describe, in detail, the nature of your request to BMI/ICTR. Please do not include any protected health information (PHI) Text Box
J 2.1 S General Question Protocol Title Text Box
J 2.7 S General Question Anticipated Enrollment Text Box
J 2.7 S General Question Is your anticipated enrollment period greater than a year Y/N/NA
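To illustrate why well-specified 2.7 elements matter to a data analyst, the selection criterion quoted on Form F (visits coded ICD-9 780.3x or 345.x, patients aged over 50 and at most 75, within a date range) could be translated into an executable query roughly as follows. The two-table schema and all rows are hypothetical toy data, not a real EHR model.

```python
import sqlite3

# Hypothetical minimal schema; real clinical repositories are far more complex.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, patient_id INTEGER,
                    icd9_code TEXT, visit_date TEXT);
INSERT INTO patient VALUES (1, 60), (2, 40), (3, 72);
INSERT INTO visit VALUES (10, 1, '780.31', '2004-02-01'),
                         (11, 2, '345.1',  '2004-06-15'),
                         (12, 3, '250.00', '2005-01-20');
""")

# Form F's example criterion: visits coded 780.3x or 345.x, age > 50 and <= 75,
# within a "Dates of Records" range (another element on the same form).
rows = conn.execute("""
    SELECT DISTINCT p.patient_id
    FROM patient p JOIN visit v ON v.patient_id = p.patient_id
    WHERE (v.icd9_code LIKE '780.3%' OR v.icd9_code LIKE '345.%')
      AND p.age > 50 AND p.age <= 75
      AND v.visit_date BETWEEN '2004-01-01' AND '2005-03-31'
""").fetchall()
print(rows)  # patient 1 qualifies; patient 2 fails the age filter; patient 3 the code filter
```

Each clause of the WHERE condition maps to one question on the form, which is exactly the translation a data analyst must perform from a completed request.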

During the coding process we also came across form elements that stood out from the rest, based on the unusual or interesting nature of the questions. These are detailed in Table 4. This table also contains descriptions based on our consensus opinion on why those specific elements were noteworthy. Overall, coding the forms was challenging due to the highly variable manner in which questions were worded. For the ten forms in our analysis, the initial Kappa scores measuring the inter-rater agreement were quite variable, ranging from 0.14 to 0.86 (full list for the forms in the order presented in Table 2: 0.83, 0.86, 0.57, 0.14, 0.64, 0.65, 0.52, 0.55, 0.43, 0.76). Thus, some forms required considerable effort to reach consensus on the final coding of each element.

Table 4.

Noteworthy, atypical form elements drawn from different forms.

Element Why noteworthy
“I want to write my own SQL queries” Allows for the possibility of self-service of the complex databases for advanced users. It is unclear what type of guidance or oversight is provided for such requests.
“Please specify what type of Biomedical Informatics Services you are requesting: REDCap, Velos…” This form combined questions related to data requests and those related to data storage.
“Will you be contacting patients? ____No ____Yes. If yes, please justify the need.” This form seemed to conflate the role of data request fulfillment with that of an institutional review board (IRB). A judgment about the appropriateness of contacting patients is generally handled within the framework of an IRB.
“Principal Investigator: Degree(s):” It is unclear why the academic degrees of the principal investigator are needed. It is possible that some institutions limit data access to investigators with a terminal degree.
“What question(s) are you trying to answer” “What problem(s) are you trying to solve” These questions appear to be aimed at developing a broader perspective about the specific needs and goals of the research team. This information could be useful to help the analyst better understand the context for the data request.

Discussion

Our analysis of research data request forms revealed several interesting findings. Foremost was the substantial variability in the content and comprehensiveness of the forms. This variability suggests that there is no universal or community-based consensus, even among CTSA institutions, about the optimal way in which a data request form should be designed, what the ‘right’ questions to ask are, and how they should be asked (i.e., expecting simple or extensive answers). This could have downstream consequences, including an inability to meet regulatory requirements (e.g., no record of IRB approval verification) or an inability to track research data use in trustworthy ways, as well as problems developing the right queries to meet the fine-grained needs of research teams.

Our analysis raised the important question of how well the forms were designed overall. Answering this question adequately depends, in part, on how well the forms could capture complex data needs accurately and in a reproducible manner. Some forms were very vague or brief in asking researchers what was needed, whereas others asked about specific elements (Table 3). Yet we did identify one form that contained questions that seemed to be aimed at helping the analyst develop a deeper understanding of what data were being sought (Table 4, row 5), and this may be a useful approach to improve communication.

Because data request forms might serve as the first point of contact between a data management team and a research team, improvement of these forms could provide great benefit. Work focused on redesigning pathology test request forms has been shown to be beneficial,34–36 so it may be reasonable to extrapolate that similar benefits could be achieved with redesigned data request forms. The process of developing appropriate data queries from complex user needs can take multiple rounds of refinement,29 but current forms do not appear to be designed to support this process well. It has been noted in the literature that adequately meeting the data needs of investigators for a single request can take a long time,37 so any efficiencies that can be gained would be welcomed.

Data request forms have been mentioned in the literature32,38 (often as a side note), but little attention has been paid to their role in helping investigators obtain data accurately and efficiently. Relative to other form elements, our analysis indicates that elements used to elicit the context and content of the requester’s data need are lacking. The utilization of frameworks such as PICO (problem/population, intervention, comparison, and outcome) might prove to be advantageous in this setting.39,40 With PICO, requesters are encouraged to structure the information need along each of the four dimensions, which could help convey a more realistic description of the request.
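As a sketch of what a PICO-structured request element might look like on a form, the following uses entirely hypothetical field content; only the four dimension names come from the PICO framework itself.

```python
from dataclasses import dataclass, asdict

@dataclass
class PICORequest:
    """A data request structured along the four PICO dimensions (hypothetical content)."""
    population: str
    intervention: str
    comparison: str
    outcome: str

req = PICORequest(
    population="Adults aged 50-75 with a seizure-related visit (ICD-9 345.x)",
    intervention="Initiation of drug A",
    comparison="Initiation of drug B",
    outcome="30-day emergency department return visits",
)
print(asdict(req))
```

A form built this way would prompt the requester for each dimension separately, rather than accepting a single free-text description that a data analyst must then decompose.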

Additionally, the effectiveness of forms could likely be improved by educating investigators about the nature of the data in the source systems while also guiding them through the request form in a more logical manner, to ensure that all important aspects are covered. It has been observed that research teams’ familiarity with the database fields is essential even when working with data analysts,41 yet the forms we analyzed did not provide such details. It is possible that some of the forms we reviewed were meant to be accompanied by additional descriptive documents, but we did not come across any in our search. We also did not identify any forms that addressed the distinction between data in coded format and free-text narratives, or the types of data generally found in each of those sources.

The forms that comprised our analysis appeared to be constructed to meet the needs of multiple stakeholders (researcher, compliance, IT, etc.). What was surprising, however, is that many forms were unbalanced, placing greater emphasis on capturing administrative (i.e., bookkeeping) data than on the details necessary to execute an effective data query. At the large academic centers generally funded by CTSAs, it is likely that many of these data elements already exist in electronic format in administrative databases and might not even need to be transcribed onto a form. Additionally, asking about the degrees of the principal investigator (Table 4, row 4), for example, may reflect data governance concerns; that is, trainees or temporary employees without terminal degrees may not be granted access to the data at some institutions.

Future work should seek to identify the core set of elements that elicits actionable information for common data requests. To this end, a careful analysis of actual data requests is needed to map the types of data needs to appropriate elements on existing forms, or to create new form elements when needed. Understanding these needs is a first step towards developing solutions to meet them.42 Cimino et al. recently described their work on understanding complex queries to better develop data retrieval capabilities in BTRIS (Biomedical Translational Research Information System), the self-service tool in use at the NIH.20 Their goal was to empower users to obtain the needed data themselves rather than having to rely on data analysts to retrieve it for them. Several of their observations could likely also improve the design of data request forms, specifically the recognition that the requirements from users “included types of data, constraints on data, and data sets formed from inclusion from multiple data sources.”20

In addition, future work should seek to quantify the time it takes to complete the elements on a data request form, and whether there is a reasonable tradeoff between form length and the subsequent quality and efficiency of the data extraction. Observing investigators as they fill out the forms could also provide insight into which form elements are confusing or ambiguous.

From our analysis we are able to make several recommendations about future data request form development: (1) more effort should be made to standardize the types of questions being asked across institutions; (2) whenever possible, forms should de-emphasize the collection of administrative metadata and expand the scope of elements related to the request itself; (3) even with reduced administrative metadata, forms should capture enough information to ensure that regulatory requirements about data use, privacy, and human subjects protection are met; (4) form design should match the data requirements of investigators; since these requirements are not well described, further research will be needed to elucidate them; (5) because data requirements may vary based on the intended use (e.g., research versus administrative), a ‘one-size-fits-all’ form may not always be ideal, and forms customized to various use cases may be more effective; and (6) forms should provide at least a minimal level of detail to ensure that users understand the selections and options, including details about data sources and data types.
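Recommendations (2), (3), and (6) can be summarized in a hypothetical form schema (the section and field names below are our own illustration, not taken from any reviewed form): a deliberately small administrative section alongside an expanded section describing the data need itself.

```python
# Illustrative form schema (assumed field names, for discussion only):
# a minimal administrative section that still satisfies regulatory
# record-keeping, and a richer section describing the query itself.
FORM_SCHEMA = {
    "administrative": [
        "principal_investigator",
        "irb_protocol_number",   # human subjects compliance
        "intended_use",          # research vs. administrative
    ],
    "data_request": [
        "population_definition",  # who the cohort is
        "date_range",             # encounter or event window
        "data_elements",          # labs, medications, diagnoses, notes
        "data_sources",           # coded fields vs. free-text narratives
        "inclusion_criteria",
        "exclusion_criteria",
        "output_format",          # row-level extract, counts, etc.
    ],
}
```

In this sketch the query-oriented section is intentionally larger than the administrative one, inverting the imbalance we observed in most of the forms analyzed.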

Conclusions

To serve people we must first understand them. A data request form is meant to be a tool that facilitates understanding between data owners and data requesters, not a burden on researchers serving bureaucratic purposes. This analysis of research data request forms revealed considerable heterogeneity in form content, in both the breadth and depth of the topics covered. Additionally, most forms over-emphasize the collection of administrative metadata and under-emphasize the collection of the details necessary to communicate a complex data request to a reporting team. Future work should focus on better understanding the content and nature of data requests from the perspective of multiple stakeholders, to help inform the design of new data request forms that can better capture the complex data needs of clinical and translational research teams.

Acknowledgments

This work was supported by National Library of Medicine grants R01LM009886 and R01LM010815, and by National Center for Advancing Translational Sciences grant UL1TR000040. The content is solely the responsibility of the authors and does not necessarily represent the official views of the supporting agencies.

References

1. Zerhouni EA, Alving B. Clinical and translational science awards: a framework for a national research agenda. Transl Res. 2006 Jul;148(1):4–5. doi: 10.1016/j.lab.2006.05.001.
2. DesRoches CM, Charles D, Furukawa MF, et al. Adoption of electronic health records grows rapidly, but fewer than half of US hospitals had at least a basic system in 2012. Health Aff (Millwood). 2013 Aug;32(8):1478–85. doi: 10.1377/hlthaff.2013.0308.
3. Friedman C, Rigby M. Conceptualising and creating a global learning health system. Int J Med Inform. 2013 Apr;82(4):e63–71. doi: 10.1016/j.ijmedinf.2012.05.010.
4. Friedman CP, Wong AK, Blumenthal D. Achieving a nationwide learning health system. Sci Transl Med. 2010 Nov 10;2(57):57cm29. doi: 10.1126/scitranslmed.3001456.
5. Christensen T, Grimsmo A. Instant availability of patient records, but diminished availability of patient information: a multi-method study of GPs’ use of electronic patient records. BMC Med Inform Decis Mak. 2008;8:12. doi: 10.1186/1472-6947-8-12.
6. Chute CG, Ullman-Cullere M, Wood GM, Lin SM, He M, Pathak J. Some experiences and opportunities for big data in translational research. Genet Med. 2013 Oct;15(10):802–9. doi: 10.1038/gim.2013.121.
7. Sujansky W. Heterogeneous database integration in biomedicine. J Biomed Inform. 2001 Aug;34(4):285–98. doi: 10.1006/jbin.2001.1024.
8. Yu C, Hanauer DA, Athey BD, Jagadish HV, States DJ. Simplifying access to a Clinical Data Repository using schema summarization. AMIA Annu Symp Proc. 2007:1163.
9. Smith SW, Koppel R. Healthcare information technology’s relativity problems: a typology of how patients’ physical reality, clinicians’ mental models, and healthcare information technology differ. J Am Med Inform Assoc. 2014 Jan–Feb;21(1):117–31. doi: 10.1136/amiajnl-2012-001419.
10. PCORnet: The National Patient-Centered Clinical Research Network. [Accessed March 12, 2014]. Available at http://www.pcori.org/funding-opportunities/pcornet-national-patient-centered-clinical-research-network/
11. Chute CG, Beck SA, Fisk TB, Mohr DN. The Enterprise Data Trust at Mayo Clinic: a semantically integrated warehouse of biomedical data. J Am Med Inform Assoc. 2010 Mar–Apr;17(2):131–5. doi: 10.1136/jamia.2009.002691.
12. Greim J, Housman D, Turchin A, et al. The quality data warehouse: delivering answers on demand. AMIA Annu Symp Proc. 2006:934.
13. Hruby GW, McKiernan J, Bakken S, Weng C. A centralized research data repository enhances retrospective outcomes research capacity: a case report. J Am Med Inform Assoc. 2013 May 1;20(3):563–7. doi: 10.1136/amiajnl-2012-001302.
14. Kamal J, Liu J, Ostrander M, et al. Information warehouse - a comprehensive informatics platform for business, clinical, and research applications. AMIA Annu Symp Proc. 2010;2010:452–6.
15. Lyman JA, Scully K, Harrison JH Jr. The development of health care data warehouses to support data mining. Clin Lab Med. 2008 Mar;28(1):55–71, vi. doi: 10.1016/j.cll.2007.10.003.
16. Wiesenauer M, Johner C, Rohrig R. Secondary use of clinical data in healthcare providers - an overview on research, regulatory and ethical requirements. Stud Health Technol Inform. 2012;180:614–8.
17. Zerhouni EA. Translational and clinical science: time for a new vision. N Engl J Med. 2005 Oct 13;353(15):1621–3. doi: 10.1056/NEJMsb053723.
18. Shurin SB. Clinical translational science awards: opportunities and challenges. Clin Transl Sci. 2008 May;1(1):4. doi: 10.1111/j.1752-8062.2008.00009.x.
19. Cimino JJ, Ayres EJ. The clinical research data repository of the US National Institutes of Health. Stud Health Technol Inform. 2010;160(Pt 2):1299–303.
20. Cimino JJ, Ayres EJ, Beri A, Freedman R, Oberholtzer E, Rath S. Developing a self-service query interface for re-using de-identified electronic health record data. Stud Health Technol Inform. 2013;192:632–6.
21. Del Rio S, Setzer DR. High yield purification of active transcription factor IIIA expressed in E. coli. Nucleic Acids Res. 1991 Nov 25;19(22):6197–203. doi: 10.1093/nar/19.22.6197.
22. Lowe HJ, Ferris TA, Hernandez PM, Weber SC. STRIDE: an integrated standards-based translational research informatics platform. AMIA Annu Symp Proc. 2009;2009:391–5.
23. Murphy SN, Weber G, Mendis M, et al. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J Am Med Inform Assoc. 2010 Mar–Apr;17(2):124–30. doi: 10.1136/jamia.2009.000893.
24. Pennington JW, Ruth B, Italia MJ, et al. Harvest: an open platform for developing web-based biomedical data discovery and reporting applications. J Am Med Inform Assoc. 2014 Mar 1;21(2):379–83. doi: 10.1136/amiajnl-2013-001825.
25. Weber GM, Murphy SN, McMurry AJ, et al. The Shared Health Research Information Network (SHRINE): a prototype federated query tool for clinical data repositories. J Am Med Inform Assoc. 2009 Sep–Oct;16(5):624–30. doi: 10.1197/jamia.M3191.
26. Zhang GQ, Siegler T, Saxman P, et al. VISAGE: a query interface for clinical research. AMIA Summits Transl Sci Proc. 2010;2010:76–80.
27. Danford CP, Horvath MM, Hammond WE, Ferranti JM. Does access modality matter? Evaluation of validity in reusing clinical care data. AMIA Annu Symp Proc. 2013;2013:278–83.
28. Deshmukh VG, Meystre SM, Mitchell JA. Evaluating the informatics for integrating biology and the bedside system for clinical research. BMC Med Res Methodol. 2009;9:70. doi: 10.1186/1471-2288-9-70.
29. Hruby GW, Boland MR, Cimino JJ, et al. Characterization of the biomedical query mediation process. AMIA Summits Transl Sci Proc. 2013;2013:89–93.
30. Brown PJ, Warmington V. Data quality probes: exploiting and improving the quality of electronic patient record data and patient care. Int J Med Inform. 2002 Dec 18;68(1–3):91–98. doi: 10.1016/s1386-5056(02)00068-0.
31. Wakefield DS, Clements K, Wakefield BJ, Burns J, Hahn-Cover K. A framework for analyzing data from the electronic health record: verbal orders as a case in point. Jt Comm J Qual Patient Saf. 2012 Oct;38(10):444–51. doi: 10.1016/s1553-7250(12)38059-8.
32. Gallagher SA, Smith AB, Matthews JE, et al. Roadmap for the development of the University of North Carolina at Chapel Hill Genitourinary OncoLogy Database: UNC GOLD. Urol Oncol. 2014 Jan;32(1):32.e1–9. doi: 10.1016/j.urolonc.2012.11.019.
33. Post AR, Sovarel AN, Harrison JH Jr. Abstraction-based temporal data retrieval for a Clinical Data Repository. AMIA Annu Symp Proc. 2007:603–7.
34. Durand-Zaleski I, Rymer JC, Roudot-Thoraval F, Revuz J, Rosa J. Reducing unnecessary laboratory use with new test request form: example of tumour markers. Lancet. 1993 Jul 17;342(8864):150–3. doi: 10.1016/0140-6736(93)91349-q.
35. Durieux P, Ravaud P, Porcher R, Fulla Y, Manet CS, Chaussade S. Long-term impact of a restrictive laboratory test ordering form on tumor marker prescriptions. Int J Technol Assess Health Care. 2003 Winter;19(1):106–13. doi: 10.1017/s0266462303000102.
36. Henderson AR. The test request form: a neglected route for communication between the physician and the clinical chemist? J Clin Pathol. 1982 Sep;35(9):986–98. doi: 10.1136/jcp.35.9.986.
37. Dattani N, Hardelid P, Davey J, Gilbert R. Accessing electronic administrative health data for research takes time. Arch Dis Child. 2013 May;98(5):391–2. doi: 10.1136/archdischild-2013-303730.
38. Jackson JH, Gutierrez B, Lunacsek OE, Ramachandran S. Better Asthma Management with Advanced Technology: creation of an Asthma Utilization Rx Analyzer (AURA) tool. P T. 2009 Feb;34(2):80–5.
39. Huang X, Lin J, Demner-Fushman D. Evaluation of PICO as a knowledge representation for clinical questions. AMIA Annu Symp Proc. 2006:359–63.
40. Schardt C, Adams MB, Owens T, Keitz S, Fontelo P. Utilization of the PICO framework to improve searching PubMed for clinical questions. BMC Med Inform Decis Mak. 2007;7:16. doi: 10.1186/1472-6947-7-16.
41. Loke YK. Use of databases for clinical research. Arch Dis Child. 2014 Jan 31. doi: 10.1136/archdischild-2013-304466.
42. Natarajan K, Sobhani N, Boyer A, Wilcox AB. Analyzing requests for clinical data for self-service penetration. AMIA Annu Symp Proc. 2013:1049.
