Author manuscript; available in PMC: 2017 Apr 1.
Published in final edited form as: J Biomed Inform. 2016 Mar 10;60:376–384. doi: 10.1016/j.jbi.2016.03.004

Facilitating Biomedical Researchers’ Interrogation of Electronic Health Record Data: Ideas from Outside of Biomedical Informatics

Gregory W Hruby 1, Konstantina Matsoukas 2, James J Cimino 3, Chunhua Weng 1,*
PMCID: PMC4837021  NIHMSID: NIHMS767945  PMID: 26972838

Abstract

Electronic health records (EHRs) are a vital data resource for research uses, including cohort identification, phenotyping, pharmacovigilance, and public health surveillance. To realize the promise of EHR data for accelerating clinical research, it is imperative to enable efficient and autonomous EHR data interrogation by end users such as biomedical researchers. This paper surveys state-of-the-art approaches and key methodological considerations toward this purpose. We adapted a previously published conceptual framework for interactive information retrieval, which defines three entities: user, channel, and source, by elaborating on channels for query formulation in the context of helping end users interrogate EHR data. We show that current progress in biomedical informatics lies mainly in support for query execution and information modeling, primarily owing to an emphasis on infrastructure development for data integration and data access via self-service query tools, while the user support needed during the iterative query formulation process, which can be costly and error-prone, has been neglected. In contrast, the information science literature offers elaborate theories and methods for user modeling and query formulation support. The two bodies of literature are complementary, implying opportunities for cross-disciplinary idea exchange. On this basis, we outline directions for future informatics research to improve our understanding of user needs and requirements for facilitating autonomous interrogation of EHR data by biomedical researchers. We suggest that cross-disciplinary translational research between biomedical informatics and information science can benefit research on facilitating efficient data access in the life sciences.

Keywords: Information Storage and Retrieval, Electronic Health Records, Human Computer Interaction


1 INTRODUCTION

Biomedical research has long benefited from a valuable and cost-effective data source: patient health records [1]. For example, the Apgar Scale [2] and the Goldman multifactorial index of cardiac risk [3] were both derived from analyses of patient health records. With the increasingly pervasive adoption of electronic health records (EHRs) worldwide [4], many have recognized the rich clinical data made available by EHRs as a promising resource for accelerating medical knowledge discovery [5] and for enabling comparative effectiveness research [6–10]. Consequently, demand among biomedical researchers for reusing EHR data for research has been rising rapidly [11–15]. Assisting biomedical researchers in interrogating EHR data has been a vital mission for the biomedical informatics research community. However, this task faces significant human and technological barriers [10, 16, 17]. Data currently captured by EHRs are not optimized for secondary uses beyond clinical care or administration-centered documentation practices, so many institutions employ intermediating data analysts to retrieve EHR data for biomedical researchers, with varying degrees of assistance from self-service query tools. The use of intermediaries may not scale to large data networks such as the clinical data research networks (CDRNs) of PCORnet [18], established by the Patient-Centered Outcomes Research Institute [19]. For example, the heterogeneity of data representations across institutions and the complex, idiosyncratic local data collection processes that often remain “black boxes” to intermediaries are serious barriers facing users of the data contained in PCORnet. To contain the cost of these expensive operations, many institutions have to charge clinician scientists for reusing data collected during patient care for research. Meanwhile, self-service query support is still at an early stage of development and may not support sophisticated data queries [20, 21].

By identifying and reviewing existing theories and best practices for EHR data interrogation, we aim to inform the design of next-generation EHR data interrogation aids that directly enable biomedical researchers to autonomously retrieve and reuse these data for clinical and translational research. Toward this goal, this paper contributes a literature review on this topic. We summarized existing approaches, identified research gaps, and recommended research priorities. Although this review focuses on EHR data, the knowledge gained may generalize to interactive end-user data interrogation for other reusable health data resources.

2 METHODS

2.1 Development of a Conceptual Framework for Interactive Data Retrieval

An information retrieval process addresses information needs using a sequence of tasks [22]. The complexity of the task sequence depends on the information retriever’s a priori knowledge of the information need, the information retrieval process stipulated by data owners, and the complexities of each of the tasks used to complete the process [23–27]. Many models have been developed for characterizing the information retrieval process or for investigating how information systems enable users during this process [28–37]. For example, the berry-picking model [28] and the sense-making model [30] focus on how users iteratively refine their information needs based on their conceptualizations of the information space. Among existing models, only the one developed by Bystrom and Jarvelin explicitly defines three entities that influence the complexity of an information retrieval process: user, channel, and source [24]. The user entity focuses on the user’s profile, communication style, and knowledge of the data. The source concerns data representations aimed at optimal data retrieval efficiency. The channel masks the complexities of the source and translates user information needs into data representations. In short, the source is the container of information and the channel guides efficient navigation of the source. We adopted this conceptual framework to organize the literature on interactive EHR data retrieval.

In this paper, we survey related methods and theories in the context of EHR data retrieval for secondary use by end users who are unfamiliar with the data, such as biomedical researchers and clinician scientists. Since this paper aims to support end users with improved query formulation, we focus primarily on efforts supporting the user and the channel, while briefly describing existing efforts on the source. We co-opted the constructs of user, channel, and source, combining them with the concepts of query formulation and query execution, as shown in Figure 1. For example, a researcher may want to identify an institution’s mortality rate among its patients undergoing coronary artery bypass. Query formulation transforms vague data requests (e.g., “adult patients younger than 75 years old with coronary artery bypass surgery last year”) into contextualized data requests consisting of specific EHR data elements (e.g., “patient DOB, current fiscal year, billing code for coronary artery bypass billed in the current fiscal year”). Query execution then translates this query from contextualized data elements into executable database queries, spanning disparate data types represented by local terminologies, expressed in the Structured Query Language (SQL).
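The two steps can be made concrete with a deliberately simplified sketch: formulation derives specific data elements from the vague request, and execution renders those elements against a local schema. All table and column names and the billing code "33533" below are hypothetical illustrations, not drawn from any particular EHR.

```python
# A minimal sketch, assuming a relational EHR store. The schema (patients,
# billing) and the code "33533" are invented for illustration only.

def formulate_query(max_age: int, procedure_code: str, fiscal_year: int) -> dict:
    """Query formulation: turn a vague request into contextualized data elements."""
    return {
        "earliest_birth_year": fiscal_year - max_age,  # from "younger than 75"
        "billing_code": procedure_code,                # from "coronary artery bypass"
        "fiscal_year": fiscal_year,                    # from "last year"
    }

def execute_query(elements: dict) -> str:
    """Query execution: map the contextualized elements onto a local schema.

    A production system would use parameterized queries and local terminology
    mappings; string assembly here only makes the translation visible.
    """
    return (
        "SELECT p.patient_id FROM patients p "
        "JOIN billing b ON b.patient_id = p.patient_id "
        f"WHERE p.birth_year > {elements['earliest_birth_year']} "
        f"AND b.code = '{elements['billing_code']}' "
        f"AND b.fiscal_year = {elements['fiscal_year']}"
    )

elements = formulate_query(max_age=75, procedure_code="33533", fiscal_year=2015)
sql = execute_query(elements)
```

The point of the separation is that the first function speaks the user's language (age, procedure, year) while the second speaks the database's (columns, codes, joins); the channel's job is exactly this translation.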

Figure 1.


A conceptual framework for interactive data retrieval.

2.2 Literature Search

We adopted a post-positivist model of research [38], allowing us to treat all experience as data. We iteratively searched for related work published between 2009 and 2013. Following this model, we searched beyond the obviously relevant fields of biomedical informatics and clinical research informatics and included the literature of computer and information science. We also categorized all included citations by their focus on user, source, and channel, so that large amounts of qualitative information could be summarized quantitatively to help us draw the big picture and identify evidence gaps.

We seeded our search with 29 articles proposed by the senior author (CW) [19, 23, 24, 28, 29, 39–62]. Citation searches within these articles provided an additional 45 references [16, 22, 25–27, 30–34, 37, 63–102]. These 74 articles served as the basis for developing the search query. Starting from the initial query, we iteratively searched and reviewed the identified articles, incorporated new search keywords as they emerged, and revised our search string and article inclusion/exclusion criteria according to relevance as determined by manual review. We surveyed both the information science literature (the ACM Digital Library, http://dl.acm.org) and the biomedical informatics literature (MEDLINE). We limited our search to these main journal citation databases for the respective fields of information science and biomedical sciences because we believe they provide a representative sample for our topic.

Figure 2 is a flow chart showing the final search strings for the PubMed and ACM databases and the inclusion and exclusion criteria for selecting articles for this review. The first author generated the final search string and reviewed the titles and abstracts of the returned articles; articles meeting any of the exclusion criteria were removed from the pool. Next, the first author iteratively reviewed and annotated the 125 included articles using the conceptual framework developed in Section 2.1, writing a summary and justification paragraph for each annotation. After annotation, the first author reviewed these paragraphs for each set of articles against the components of the conceptual framework and derived themes within them. For the source, the major themes identified were EHR data modeling (how data are structured and the standards used to store data elements) and warehousing (dedicated silos of data for secondary use). For the user, the major themes were information need (defining the complexity of the need) and user modeling (understanding user attributes and the information seeking strategies used). For the channel, the major themes were query formulation (the process of defining an information need) and query execution (the process of translating an information need into an executable database query). Table 1 organizes the articles according to our conceptual framework. As Table 1 shows, more work fell under user modeling, human intermediaries, and the reference interview in information science than in biomedical informatics. In the following sections, we synthesize the major themes from each discipline and compare and contrast their ideas from the two sources.

Figure 2.


The search strings and article selection flowchart (*Articles could be classified into multiple categories as some content spans multiple categories).

Table 1.

The distribution of relevant topics in two bodies of literature

Topic | Biomedical Informatics | Information Science

SOURCE
 EHR Data Modeling | [8, 12, 41, 46, 48, 52, 70, 71, 90, 103–109] | —
 EHR Data Warehousing | [40, 97, 110] | —

USER
Information Need
 Information Need Complexity | [50, 53, 72, 111–113] | [24, 60, 92]
User Modeling
 Information Seeking Processes | [54, 59, 77, 84] | [22–24, 26, 36, 57, 67]
 User Cognitive Styles | [17, 55] | [23, 25, 28–37, 63, 85, 86, 91]

CHANNEL
Task 1: Query Formulation
 Concept Representation | [17, 27, 46, 51, 62, 70, 71, 98, 105, 114–119] | [100]
 Characterization of Data Integrity | [17, 114, 120, 121] | [26]
 Query Templates | [47, 58, 72, 75, 80, 81] | [122]
 Self-Service Query Tools | [20, 41, 51, 52, 73, 82, 88, 97, 101, 113, 123–128] | [42, 93, 129–133]
 Human Intermediaries | [1, 45, 134, 135] | [39, 44, 61, 83, 89, 94, 95]
 Reference Interview | — | [39, 43, 56, 64–66, 68, 69, 74, 79, 87, 94, 96, 102]
Task 2: Query Execution
 Phenotyping | [49, 62, 76, 78, 99, 117, 136–138] | —

3 SOURCE

The barriers to task-based data access in the life sciences fall into two categories: (1) human factors (e.g., a user lacking a correct conceptualization of task complexity) and (2) system factors (e.g., technology limitations in existing systems, such as data heterogeneity and fragmentation) [17]. Table 2 presents examples of known barriers and the corresponding recommended solutions. Human factors relate to the user, while system factors relate to the metadata of the source, or in this instance, the lack of metadata concerning known and latent EHR data quality issues.

Table 2.

Known themes regarding EHR data access barriers and solutions

Level | Barriers | Solutions
Human factors | Known and latent EHR data quality issues [17, 120, 121] | Transparent reporting of data limitations for intended uses [26, 114]
System factors | System interoperability [108]; real-time analytics [8, 12]; data fragmentation [108]; data heterogeneity [121] | Streamlining data access workflows [52, 71]; data warehousing [49, 78]; data modeling and integration strategies (e.g., the Observational Medical Outcomes Partnership Common Data Model) [46, 48, 109]

In the context of this study, we discuss data warehousing at the institutional level rather than at the state or national level, although the same principles may apply to both. Data warehousing, which can be centralized or federated, has been a focus of the clinical research informatics community for overcoming technical barriers to data access by providing efficient access to integrated EHR data. The centralized infrastructure [40, 41, 52] avoids data heterogeneity but can complicate the incorporation of new or latent data elements over time, because each update affects the entire database, and hence does not scale easily. In contrast, the federated architecture is flexible [90, 103–105], allowing autonomous data control and growth over time. It accommodates heterogeneity in data representations [70, 106, 107], enables leveraging distributed local expertise for data modeling and data quality control, and supports geographic distribution across multiple stakeholders. The National Center for Education Statistics has summarized the tradeoffs between centralized and federated data repositories [139]. Briefly, centralized systems ease data governance, increase data retrieval performance, and provide uniform data for efficient data mining, but they entail a high cost burden for ensuring data currency and completeness and are harder to scale for evolving data needs and different data access workflows [140]. Many institutions have centralized data repositories, such as STRIDE [110]. The most widely used platform, i2b2, together with its SHRINE architecture, supports distributed, federated data repositories [97]. The federated architecture represents the established model for most large data networks such as PCORnet [18, 141].

4 USER

4.1 Information Need Complexity

One component of user modeling is understanding the complexity of the information need, which depends on the diversity of use contexts [72], variations in information seeking behaviors [24], and heterogeneity in the languages used to express the need [92]. Others have proposed a categorical scale of data need complexity that measures the amount of work required to accomplish the task of satisfying the information need [60, 86]. Structures have been defined to characterize a complex information need, including a problem statement, an event of interest, a comparison event (if necessary), and potential effects of the event of interest [50]. Unfortunately, little is known about the data needs of biomedical researchers [111–113]. The few available studies have largely focused on identifying the sets of major data elements needed to facilitate research in particular medical domains of interest. Cimino et al. leveraged these key data elements to inform the development of a user-centric query tool [113]. What is lacking is a thorough understanding of user preferences and search behaviors, as well as of the communication patterns between biomedical researchers and query analysts for iteratively clarifying data needs.
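The four-part structure from [50] lends itself to a simple record type; the class and field names below, and the facet-counting heuristic, are our own illustration rather than part of the cited work.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InformationNeed:
    """The four-part structure for a complex information need described in [50].

    Field names and the complexity heuristic are illustrative assumptions.
    """
    problem_statement: str
    event_of_interest: str
    comparison_event: Optional[str] = None  # "(if necessary)" per [50]
    potential_effects: List[str] = field(default_factory=list)

    def facet_count(self) -> int:
        """A naive complexity proxy: the number of facets actually specified."""
        n = 2  # problem statement and event of interest are always present
        n += 1 if self.comparison_event else 0
        n += len(self.potential_effects)
        return n

need = InformationNeed(
    problem_statement="Post-operative mortality after coronary artery bypass",
    event_of_interest="coronary artery bypass surgery",
    comparison_event=None,
    potential_effects=["in-hospital death", "30-day readmission"],
)
```

A metric of this kind, however crude, is the sort of thing a triage step could use to decide whether a request needs an intermediary or can go to a self-service tool.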

4.2 User Cognitive Styles

Five user characteristics influence a user’s information seeking tactics:

  1. Phases of the mental model of information – confusion, doubt, threat, hypothesis testing, assessing, and reconstructing [63]

  2. Levels of need – visceral, conscious, formal, and compromised [39]

  3. Levels of specificity – new problem, new situation, experiential needs, and well-understood situation [23]

  4. Expression – questions, connections, and commands gap [23, 39]

  5. Mood – invitational and indicative [63]

Kuhlthau amalgamates these user characteristics into her theoretical foundation of the information seeking process [67], which is well supported by Vakkari’s review [57]. The information seeking process has also been modeled in the biomedical literature: Mendonca et al. [54] and Hung et al. [59] provide models of the biomedical literature information seeking process, which propose to aid users’ search strategies through well-structured clinical queries and by leveraging the knowledge of human search experts, respectively.

User cognitive styles shape information seeking processes [36]. Many describe these styles along two orthogonal axes: analytic and descriptive. The analytic cognitive style captures an active approach to information seeking, in which conceptual-level questioning is used to resolve the information need; the descriptive cognitive style represents a passive approach, in which concentration on the most detailed level of the subject matter is used to resolve the need. Users thus range from passive (high descriptive, low analytic, with attention to detailed questions) to active (high analytic, low descriptive questioning), and active styles represent more effective and efficient search strategies than passive ones [37].

A user’s domain knowledge and technical knowledge are both associated with their cognitive style and the effectiveness of their search strategies [17, 50, 91]. Users’ cognitive styles can be differentiated [85, 86] by search tactics, and users of different cognitive styles exhibit different sense-making strategies and processes. Studies of cognitive styles offer a generalizable mechanism to stratify users and to predict individual information seeking styles. Cognitive science allows characterization of the demand characteristics of a particular task and helps focus attention on the appropriate problem dimensions [55]. Although user cognition during EHR data interrogation is rarely studied in the biomedical informatics literature, such studies are much needed: cognitive studies of users can enable user-centered EHR data interrogation designs that improve the user experience.

5 CHANNEL

The complexity of a data source is multidimensional, spanning heterogeneous semantic representations [46, 70, 71], opaque data integrity, complex time expressions [98, 100], and fragmented knowledge of logical data constructs [17, 27, 114]. A channel enables users to navigate the data despite these complexities by providing an abstract mechanism for interacting with data sources during query formulation or query execution.

5.1 Query Formulation

The query formulation component facilitates the iterative interaction between the user and the source to formulate a query in the user’s language. Hripcsak et al. investigated two EHR data retrieval channels, AccessMed and Query by Review, and found that neither achieved adequate performance, indicating the difficulty of query formulation for EHR data [51]. Below we review common aids for query formulation and their related execution challenges: human intermediaries, query templates, and self-service query tools.

5.1.1 Human intermediaries

Human intermediaries are often employed to formulate user queries to ensure feasibility and precision [44, 45, 83, 89, 134]. Intermediaries usually have received formal training and possess a deep understanding of the work culture and the technical skills for data querying [1, 61, 95, 135]. The biomedical literature provides scant knowledge of how human intermediaries operate in this information-rich domain. Information science, in contrast, has extensively studied human intermediaries, most notably librarians and their development of the reference interview technique [39, 94]. Reference interviews elicit tacit user needs, specify vague queries, narrow overly broad questions, and suggest further dimensions of the information need that the user may not have expressed but that are logically related to the user’s stated objective. The technique enables a skillful interrogation process, widely adopted by librarians, for converting the vague, general data requests provided by users into specific data queries expressed in the user’s language [39, 56, 65, 66, 68, 69, 87, 94, 96].

Elicitation strategies used in reference interviews have been explored to improve the negotiation of information needs [43, 64, 74, 79]. Interrogation strategies are primarily developed to obtain the user’s objective surrounding the information need; when users are aware of the reference interview’s purpose, they are willing to provide additional information on objective and intent. In a related study, Lin et al. analyzed need negotiations and extracted a taxonomy of clarification questions comprising six classes [102]. Table 3 illustrates this taxonomy in the context of EHR data interrogation, with example clarification questions. These findings imply that in interactive EHR data retrieval, the reference interview may provide human intermediaries with a more efficient workflow for extracting an unambiguous description of the data need from the user.

Table 3.

A Taxonomy of Clarification Questions Utilized During Need Negotiation

Question Type | Goal | EHR Data Clarification Example

Relevance threshold | Establish the user’s relevance threshold, mapping a continuous scale into a binary decision | What is the A1c threshold for the diabetes patients that you are looking for?
Ambiguity in conjoined facets | Establish the type of relationship between conjoined facets | Do you only want patients that have both a diagnosis of diabetes and hypertension, or patients with a diabetes diagnosis with or without a hypertension diagnosis?
Example concept | Ensure an identified facet is relevant to the user’s information need | Would patients without a diabetes diagnosis, but taking a medication for diabetes, be of interest to you?
Closely related or subset concept | Expand or narrow the scope of the user’s information need on the specificity of a particular facet | Do you want patients with both types of diabetes, type 1 and type 2?
Related topical aspects | Establish the user’s interest in facets that are conceptually related but not directly requested | Does it matter what treatment protocol the diabetic patients are on?
Acceptability of summaries | Define the user’s expectations of the information retrieved | Do you just want to know how many patients fit your criteria?

5.1.2 Query Templates

Templates are another effective technique for expressing standards-based, structured data needs free of ambiguity and vagueness [77, 84]. A query template provides an organizational structure through which the user can describe their information need unambiguously [80]. Templates have been developed to access clinical data [47] and the medical literature [58, 72, 75, 80]. The Patient, Intervention, Control/Comparison, and Outcome (PICO) framework is extensively used to explore the medical literature for relevant resources [81, 122]. Currently, however, there is no well-accepted standard template based on community consensus; instead, many medical institutions require users to complete data requests using free text.
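To make the contrast with free text concrete, a PICO-style request could be captured as a small record type that forces every facet to be stated explicitly; the class and field names here are our own illustration, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class PICORequest:
    """A PICO-structured data request. Names are illustrative assumptions."""
    patient: str       # P: the population of interest
    intervention: str  # I: the exposure or treatment
    comparison: str    # C: the control or comparison condition
    outcome: str       # O: the measured outcome

    def missing_facets(self):
        """Facets left blank; a template exposes such gaps, unlike free text."""
        return [name for name, value in vars(self).items() if not value.strip()]

req = PICORequest(
    patient="adults under 75 with coronary artery bypass in the last fiscal year",
    intervention="coronary artery bypass surgery",
    comparison="",  # left blank: the template immediately exposes the gap
    outcome="in-hospital mortality",
)
```

A free-text request can silently omit a comparison group; a template makes the omission visible before the request ever reaches an intermediary.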

5.1.3 Self-service query tools

Since human intermediaries are expensive and time-consuming, self-service query aids have been pursued by many institutions in recent years [41, 51, 73, 97, 101, 113, 123, 125, 127, 128]. Some are form-based, while others support queries in natural language [42, 51]. Visual query formulation is a recent trend that is expected to reduce user cognitive load by presenting information intuitively [129]. The Informatics for Integrating Biology and the Bedside (i2b2) project represents the most widely adopted self-service EHR data retrieval system; its terminology explorer and query builder allow the user to search the terminology for applicable terms and build cohorts using a frame system with Boolean constraints [52, 82, 88, 97, 113, 123–128]. Deshmukh et al. studied the types of data requests these applications were able to resolve and found that i2b2 facilitated relatively simple cohort identification queries, while acknowledging that the majority of the requests they studied were “simple” [20]. These reports indicate that most self-service tools for EHR data support a limited scope of data specification. Additionally, these tools place the burden of identifying the correct terms associated with a particular medical concept on the user. Many complex data requests require not just simple constraints (e.g., “all patients diagnosed with diabetes between May and July of 2012”) but complex relations between data elements (e.g., “all patients with their first recorded diagnosis of diabetes between May and July of 2012, and all glucose lab tests after their diagnosis and before the start of treatment”). For such complex temporal queries, temporal query tools can be used to visualize raw data or concepts over absolute and relative timelines [93, 101, 130–133]. Though experimental, these tools offer a solution to the complex problem of temporal specification and visualization of EHR data. Meanwhile, significant EHR data processing and transformation are needed for these systems to work appropriately.
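The example of a relative temporal constraint can be sketched in a few lines over toy in-memory records; the record layout and values are invented for illustration, and the point is only that the constraint relates events per patient (test date relative to that patient's diagnosis and treatment dates), which a plain Boolean query builder cannot express.

```python
from datetime import date

# Toy per-patient event lists; field names are illustrative, not an EHR schema.
patients = {
    "p1": {"diagnoses": [("diabetes", date(2012, 6, 1))],
           "treatment_start": date(2012, 8, 15),
           "glucose_tests": [date(2012, 5, 20), date(2012, 7, 1), date(2012, 9, 1)]},
    "p2": {"diagnoses": [("diabetes", date(2011, 3, 4))],
           "treatment_start": date(2011, 5, 1),
           "glucose_tests": [date(2011, 4, 1)]},
}

def first_dx(record, concept):
    """Earliest recorded diagnosis date for a concept, or None."""
    dates = sorted(d for c, d in record["diagnoses"] if c == concept)
    return dates[0] if dates else None

def cohort(patients):
    """First diabetes diagnosis in May-July 2012, plus glucose tests strictly
    between that diagnosis and the start of treatment."""
    result = {}
    for pid, rec in patients.items():
        dx = first_dx(rec, "diabetes")
        if dx and date(2012, 5, 1) <= dx <= date(2012, 7, 31):
            # relative constraint: each test is compared against this patient's
            # own diagnosis and treatment dates, not an absolute window
            result[pid] = [t for t in rec["glucose_tests"]
                           if dx < t < rec["treatment_start"]]
    return result
```

Here p1 qualifies (diagnosed 2012-06-01) and only the 2012-07-01 test falls between diagnosis and treatment start; p2 is excluded because the first diagnosis predates the window.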

5.2 Query Execution

The query execution component focuses on converting a user query into an executable database query by mapping the medical concepts specified by the user to the EHR data elements that define them. The Electronic Medical Records and Genomics (eMERGE) consortium [117] has studied the problem of phenotyping disease concepts through enumerable data elements within the EHR. The eMERGE consortium has shown that each disease phenotype contains significant heterogeneity, with underlying elements representing nested Boolean logic, complex temporality, and ubiquitous ICD-9 codes [117, 136]. As of 2013, the group had validated 13 phenotypes [62]. Although the temporal nature of EHR data was considered in only some of the eMERGE phenotypes, temporal abstraction is an important technique for EHR phenotyping. Post et al. established the PROTEMPA method, which supports the abstraction of temporal data events [99]. Additionally, Shahar’s framework for temporal abstraction has described promising methods for formally representing temporal patterns [76, 137, 138].
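The flavor of such phenotype definitions can be sketched as executable nested Boolean logic over ICD-9 codes, medications, and labs. The specific codes, drug name, and A1c threshold below are illustrative assumptions, not a validated eMERGE phenotype.

```python
# Hypothetical sketch of an executable phenotype in the style eMERGE describes:
# nested Boolean logic over ICD-9 codes, medications, and lab values. The codes
# (250.xx), the drug ("metformin"), and the 6.5% A1c cutoff are illustrative.

def has_t2dm_phenotype(record: dict) -> bool:
    """True if the record satisfies: diagnosis AND (medication OR abnormal lab)."""
    icd9 = record.get("icd9", [])
    has_dx = any(code.startswith("250.") for code in icd9)       # diabetes ICD-9 family
    on_med = "metformin" in record.get("medications", [])        # antidiabetic drug
    high_a1c = any(v >= 6.5 for v in record.get("a1c", []))      # abnormal lab value
    # nested Boolean logic: a diagnosis code alone is not treated as sufficient
    return has_dx and (on_med or high_a1c)
```

Even this toy version shows why query execution is hard: a single clinical concept ("type 2 diabetic") fans out into several EHR data elements combined with non-trivial logic, and a real definition would add the temporal constraints discussed above.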

6 DISCUSSION

Interactive EHR data retrieval involves complex interactions among users, sources, and channels. The healthcare industry has heavily invested in infrastructures for data integration. To maximize the return on investment and to use these resources to advance medicine, we need to make such data accessible to biomedical researchers for various computing needs.

Self-service query tools have not fully addressed user needs, making human intermediaries indispensable at many institutions. These intermediaries engage in a needs-negotiation process with the user. Barriers to this process include the intermediary’s lack of medical knowledge and the user’s lack of technical knowledge; bridging these knowledge gaps may engender efficient communication. Furthermore, users often present a vague understanding and description of their information needs. Intermediaries may benefit from a standardized structure through which requests can be organized, which may reduce the ambiguity of the request and allow the intermediary to focus on other tasks, such as query execution.

Relatively little is known about biomedical researchers’ cognitive styles and information seeking strategies. Table 4 lists key knowledge gaps and potential recommendations for bridging them. Additional exploratory studies are needed to understand how biomedical researchers interrogate EHR data and what barriers they face. We can invest more effort in interactive information retrieval that augments not only the source but also the user.

Table 4.

The knowledge gaps and recommendations for advancing EHR interrogation

Aspect: User
  Knowledge gaps:
   • Lack of a measure of information need complexity
   • Lack of knowledge of how the cognitive styles of medical researchers affect the information seeking process
  Recommendations:
   • Develop metrics for measuring EHR information need complexity
   • Conduct qualitative studies of the information seeking processes of multidisciplinary medical researchers and their barriers to clinical data access

Aspect: Query Formulation Channel
  Knowledge gaps:
   • Lack of a formalized structure for medical researchers to express their information needs
   • Poor understanding of the data need-negotiation process performed by data intermediaries
  Recommendations:
   • Investigate formal structures used for document retrieval, e.g., the PICO framework
   • Support the reference interview for EHR data interrogation

Information-seeking models explain the sub-optimal outcomes resulting from current methods used for EHR data interrogation. The granularity of data that biomedical researchers are seeking adds more complexity to existing information seeking models. Additional understanding of process-oriented EHR data access by biomedical researchers is needed. To this end, we extracted from the literature three promising concepts to aid in the construction of an ideal process-oriented EHR data interrogation model.

First, both information science and biomedical informatics have established the important role semantics plays in optimal information retrieval. For example, the PICO framework is an excellent user aid that helps clinicians organize and express their information needs. In the context of EHR data interrogation, the PICO framework can potentially be a good starting point for supporting the expression of biomedical researchers’ EHR data needs.

Second, the complexity of the information need shapes information search tactics. A metric for assessing the complexity of an information need can optimize resource allocation while the need is resolved: complex data requests would be directed to an informatician, whereas simple requests would be handled through existing self-service query tools. A standardized method for assessing EHR data need complexity could further enable global resource optimization.

Third, the established reference interview has effectively helped librarians clarify user needs. Informaticians play a similar role to librarians but lack guidelines on how to conduct reference interviews for EHR data; currently, experience is the only way for informaticians to gain the insight and expertise needed for this task. To increase the efficiency and effectiveness of EHR data needs negotiation, an EHR-based reference interview, conducted by an informatician, may aid query formulation by translating vague EHR data requests into specific data queries. More studies are needed in this area to enable the reference interview for EHR data.

Our study has two major limitations. First, we developed our search criteria from preselected topics. This method may be biased toward those self-selected topics and may have omitted topics relevant to this review that were not retrievable by the query derived from them. Nevertheless, we used an established method to identify a focused topic set and believe this review is representative of what is available in the literature. Second, our focus on the most recent four years of the literature may have excluded seminal articles from the past; however, we believe our exhaustive citation search largely mitigated this problem.

7 CONCLUSIONS

This review surveys the methodological considerations for interactive EHR data interrogation and identifies knowledge gaps and research opportunities for advancing it. Our results show that support for a reference interview for EHR data is a promising direction for improving the autonomy of biomedical researchers during EHR data interrogation. A deeper understanding of users is needed to enable such support cost-effectively. We suggest that cross-disciplinary translational research between biomedical informatics and information science is needed to apply theories and techniques from information science to facilitate efficient end-user data interrogation in the life sciences.

  • There is no standard complexity measure to assess EHR data requests

  • How a researcher’s cognitive style affects EHR data seeking is unknown

  • No formalized structure exists to guide researchers in expressing their EHR data needs

  • The need-negotiation process performed by informaticians is poorly understood

Acknowledgments

This research was funded by NLM grants R01LM009886 (PI: Chunhua Weng) and 5T15LM007079 (PI: George Hripcsak), and by CTSA award UL1 TR000040 (PI: Henry Ginsberg). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.


  • 138.Combi C, Pozzi G, Rossato R. Querying temporal clinical databases on granular trends. Journal of biomedical informatics. 2012;45(2):273–291. doi: 10.1016/j.jbi.2011.11.005. [DOI] [PubMed] [Google Scholar]
  • 139.Program, S.L.D.S.G; I.o.E. Sciences, editor. Centralized vs. Federated: State Approaches to P-20W Data Systems. 2012. p. 6. [Google Scholar]
  • 140.Wilcox A, Randhawa G, Embi P, Cao H, Kuperman GJ. Sustainability considerations for health research and analytic data infrastructures. EGEMS (Wash DC) 2014;2(2):1113. doi: 10.13063/2327-9214.1113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 141.Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc. 2014;21(4):578–82. doi: 10.1136/amiajnl-2014-002747. [DOI] [PMC free article] [PubMed] [Google Scholar]
