Journal of Digital Imaging. 2015 Apr 2;28(3):249–255. doi: 10.1007/s10278-015-9794-4

Strategies for Medical Data Extraction and Presentation Part 2: Creating a Customizable Context and User-Specific Patient Reference Database

Bruce Reiner
PMCID: PMC4441684  PMID: 25833767

Abstract

One of the greatest challenges facing healthcare professionals is directly and efficiently accessing relevant data from the patient’s healthcare record at the point of care, in a manner specific to both the context of the task being performed and the needs and preferences of the individual end-user. In radiology practice, inefficient organization of imaging data and manual workflow requirements impede review of historical imaging data. Clinical data retrieval is even more problematic, owing to the limited quality and quantity of data recorded at the time of order entry and the relative lack of information system integration. One approach to these data deficiencies is to create a multi-disciplinary patient referenceable database consisting of high-priority, actionable data from the cumulative patient healthcare record, in which predefined criteria are used to categorize and classify imaging and clinical data according to anatomy, technology, pathology, and time. The referenceable database can be populated through a combination of manual and automated methods, with an additional data verification step for quality control. Once created, the database can be filtered at the point of care to provide context- and user-specific data tailored to the task being performed and the requirements of the individual end-user.

Keywords: Data mining, Decision support, Imaging informatics

Introduction

Retrieval and extraction of medical data is a continuous challenge for healthcare professionals, largely because the lack of data and technology integration forces a manual and time-intensive workflow [1]. As productivity and workflow demands continue to increase, this lack of data accessibility and integration often results in valuable data going unseen (i.e., the proverbial tree falling in the woods). The data exists, but its inaccessibility renders it inconsequential. This can lead to redundancy, with healthcare providers duplicating medical tests and studies that might not have been necessary had the full complement of medical data been readily available at the point of care [2]. Data inaccessibility can also adversely affect healthcare outcomes when providers make diagnostic and treatment decisions in the absence of complete and definitive data. Beyond the potential for medical errors, inaccessible and incomplete medical data can lead to a number of insidious adverse outcomes, including time delays, diminished diagnostic confidence, excessive and/or unnecessary consultations, and interventional procedures that might have been avoided had the full complement of data been readily available [3].

The current state of inefficient data delivery at the point of care is magnified by the fact that the volume and complexity of medical data are increasing exponentially. While computerized data mining and decision support applications offer a number of theoretical benefits for data extraction and comprehension, they cannot operate fully without access to the full complement of data. If key data elements are unavailable and excluded from the computerized analysis, the analysis will be faulty or incomplete. Simply stated, computerized data mining is only as good as the quality and completeness of its input data. The same axiom holds for “human” medical data mining: a healthcare professional’s clinical experience, education, and knowledge cannot overcome deficient and/or inaccurate data (i.e., garbage in, garbage out). Although a number of efforts are underway to create comprehensive, fully integrated electronic medical records and data repositories, the result in everyday practice remains incomplete, owing to factors including (but not limited to) the proprietary nature of information system technology, limited data transfer between service providers, the inflexibility of technology with respect to individual end-users’ needs, and the lack of data integration across disparate medical information systems. The proposed strategy seeks to ameliorate these deficiencies by creating a patient-specific referenceable database that can record and track medical data throughout the healthcare continuum and provider network, be customized to the needs and preferences of individual healthcare providers, and be adapted to the specific task being performed.

Database Principles

Before discussing individual data components, a few points should be made regarding the principles and priorities of the proposed database and its derivatives. Table 1 lists five priorities for the database and its data components. At the most fundamental level, the data must be of the highest quality. This means the data is not only accurate and reproducible but also unambiguous and easily understood (ironically, ambiguity is one of the principal complaints about conventional free-text radiology reports). Whenever possible, the data should exist in a standardized format, which not only promotes understanding and clarity but also enables meta-analysis, which lies at the core of evidence-based medicine. Second, the database and its component data must be accountable in terms of usage, security, integrity, and sources. It should be possible to electronically audit the data and its usage to ensure that data sources are reliable, end-users are properly authenticated and authorized, and the data itself has not been improperly modified.

Table 1.

Five basic database principles

1. Quality
2. Accountability
3. Economy
4. Adaptability
5. Accessibility

The principle of “data economy” is one of the most important factors in determining utility and adoption among the diverse community of end-users. Without data economy, workflow and productivity suffer, ultimately leading to dissatisfaction and non-adoption. Even if the data provides greater understanding and better clinical outcomes, most end-users will not deem it worthwhile unless it is at minimum workflow neutral and preferably workflow enhancing [4]. To promote data economy, the database must be able to parse and deliver condensed data at the point of care in a context- and user-specific fashion. This means filtering out extraneous data that is not important to the task being performed or to the specific needs and preferences of the individual user. Economizing data in this way could rely on a number of tools, including (but not limited to) eye tracking (to record how presented data is actually viewed), electronic auditing tools (to record how the user manually interacts with the data while performing the task), artificial intelligence techniques that apply predictive analytics to past performance to estimate the probability that a given data element will be required, and predefined rules defining context-specific data requirements. The goal is to present the end-user with the minimum amount of data needed for complete, thorough, and accurate task performance.
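As a rough illustration of how two of these tools (predefined context rules and a usage-based estimate drawn from past auditing data) might be combined, the following Python sketch filters a patient record for a given task and user. The names `DataElement`, `CONTEXT_RULES`, and `economize`, the context labels, and the 20% viewing threshold are all hypothetical; the frequency ratio is a crude stand-in for the predictive analytics mentioned above, not a published method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    name: str
    category: str  # e.g. "imaging", "pharmacology", "social"

# Hypothetical predefined rules: data categories each task context requires.
CONTEXT_RULES = {
    "chest_ct_interpretation": {"imaging", "pathology", "laboratory"},
    "medication_review": {"pharmacology", "laboratory"},
}

def economize(elements, context, usage_counts, total_sessions, min_rate=0.2):
    """Keep elements required by the context rule, plus elements this user
    viewed in at least `min_rate` of past sessions (a crude usage predictor)."""
    required = CONTEXT_RULES.get(context, set())
    kept = []
    for element in elements:
        seen = usage_counts.get(element.name, 0)
        rate = seen / total_sessions if total_sessions else 0.0
        if element.category in required or rate >= min_rate:
            kept.append(element)
    return kept

record = [
    DataElement("prior CT lung nodule", "imaging"),
    DataElement("statin dose", "pharmacology"),
    DataElement("smoking history", "social"),
]
# This user viewed "smoking history" in 8 of 20 prior chest CT sessions.
kept = economize(record, "chest_ct_interpretation", {"smoking history": 8}, total_sessions=20)
print([e.name for e in kept])  # ['prior CT lung nodule', 'smoking history']
```

Keeping the rule-based and usage-based criteria as a simple union means a required element is never filtered out merely because a particular user rarely opens it.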

Adaptability of the proposed data strategy is another component critical to widespread acceptance. The current “one size fits all” model of technology development is counter-intuitive, for it forces end-users of diverse skill sets, knowledge, and computer proclivity to work within a relatively static and inflexible technology [5]. If, instead, the functionality of the database and its derived analyses could be adapted to specific task requirements and end-user preferences, it would be both context- and user-specific. If eye tracking and electronic auditing tools were integrated directly into the application, one could in theory monitor individual usage patterns and compare them with those of users with similar profiles to generate recommendations for improved workflow and data analysis (i.e., iterative technology refinement).
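The peer comparison described above can be approximated, under strong simplifying assumptions, by a set comparison: recommend data elements that most users with a similar profile consult but this user does not. The sketch assumes usage logs have already been reduced to one set of element names per user; the function name `recommend` and the 50% peer-share threshold are illustrative choices.

```python
from collections import Counter

def recommend(user_elements, peer_element_sets, min_share=0.5):
    """Suggest data elements consulted by at least `min_share` of peers
    with a similar profile but not by this user."""
    if not peer_element_sets:
        return []
    # Count each element at most once per peer.
    counts = Counter(e for peer in peer_element_sets for e in set(peer))
    n_peers = len(peer_element_sets)
    return sorted(e for e, c in counts.items()
                  if c / n_peers >= min_share and e not in user_elements)

me = {"report impression", "prior CT"}
peers = [
    {"report impression", "prior CT", "creatinine"},
    {"report impression", "creatinine"},
    {"prior CT", "allergy list"},
]
print(recommend(me, peers))  # ['creatinine']
```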

Lastly, the data must be accessible and not artificially constrained by physical location, technology, or economics. An authorized end-user should be able to access the database remotely (e.g., via cloud computing) and through portable devices, rather than being tied to a single geographic location. In addition to portability, the database should support active collaboration among multiple end-users to promote idea sharing, education, research, and the creation of best practice guidelines. For example, context-specific data extraction templates could be exported by one end-user and imported by another, with the goal of continuously updating and refining database deliverables.

Database Components

The collective patient database contains a number of individual components, reflecting the diverse nature of a patient’s healthcare record. While every medical discipline and its associated data would be represented in the generalized strategy, the application to medical imaging divides this comprehensive medical data into two broad categories: clinical data and imaging data. Tables 2 and 3 provide representative lists of the data elements within these two categories.

Table 2.

Imaging data categories

1. Imaging and report data
2. Technical data
3. Safety data
4. Administrative data
5. Quality data
6. Procedural data
7. Technologist data

Table 3.

Clinical data categories

1. Medical history and disease/problem list
2. Pharmacology
3. Surgical and procedural data
4. Laboratory and pathology data
5. Proteomic and genomic data
6. Physical exam
7. Family history
8. Social, occupational, and environmental data
9. Current presentation and complaints
10. Clinical test data

From a radiologist’s perspective, the most important imaging data exists in the form of text-based reports and pixel-based images. Conventional imaging databases commonly list historical imaging exams chronologically and by anatomic region and imaging modality. Computerized hanging protocols will frequently identify the most recent comparable imaging exams (as defined by anatomic region and modality) and automatically retrieve and display the corresponding imaging and report data in full format. Any historical imaging data not retrieved by the automated hanging protocols must be retrieved manually by the radiologist. The combination of full-format data and manual workflow commonly results in high inter- and intra-radiologist variability in the quantity, content, and method of historical imaging data review (e.g., complete review of the report vs. targeted review of the report impression only). By convention, imaging and report data are distinct and separate from one another (i.e., de-coupled), which commonly results in incomplete data review (e.g., review of report data without the correlating images).

An alternative (and preferred) strategy for imaging data archival and retrieval would replace the existing model of de-coupled imaging and report data with one in which imaging and report data are integrated on a finding-specific basis. In this model, each individual finding in a radiology report would be assigned one or two images from the collective imaging dataset, providing combined text and imaging data through a single command (i.e., finding-specific annotated key images). This would offer a number of theoretical benefits, including integrated data, the ability to index and search imaging data by individual finding (independent of imaging modality or exam type), and elimination of the inefficient and time-consuming process of “all inclusive” report and imaging exam review.
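A minimal sketch of the finding-centric model follows, assuming each finding is stored with its report sentence and one or two key-image references (plain placeholder strings here, standing in for real image identifiers) and is indexed by a normalized finding label rather than by exam. `Finding` and `FindingIndex` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    exam_id: str
    modality: str
    label: str          # normalized finding name, e.g. "liver lesion"
    text: str           # the report sentence describing the finding
    key_images: tuple   # one or two image identifiers (placeholders here)

    def __post_init__(self):
        if not 1 <= len(self.key_images) <= 2:
            raise ValueError("each finding should reference one or two key images")

class FindingIndex:
    """Index findings by label so history can be searched across modalities."""
    def __init__(self):
        self._by_label = defaultdict(list)

    def add(self, finding):
        self._by_label[finding.label].append(finding)

    def history(self, label):
        return list(self._by_label.get(label, []))

index = FindingIndex()
index.add(Finding("EX1", "CT", "liver lesion", "2 cm hypodense lesion, segment 7", ("img-101",)))
index.add(Finding("EX2", "MR", "liver lesion", "Lesion consistent with hemangioma",
                  ("img-202", "img-203")))
print([(f.modality, f.key_images) for f in index.history("liver lesion")])
```

Because the index is keyed by finding rather than by exam, the CT and MR observations of the same lesion are returned together regardless of modality.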

An additional benefit of this finding-specific approach is the ability to record and review standardized support data for each individual finding. Examples of standardized finding-specific support data are presented in Table 4; in some respects they parallel BI-RADS, but on a more granular, finding-specific basis. This methodology could provide greater consistency and clarity in reporting, along with improved data mining capabilities. The current model of the historical imaging database, defined by chronologic text lists of imaging exams, could in theory be transformed into one in which historical imaging data is displayed and queried according to individual findings and disease states, which in turn can be indexed by anatomy, modality, date, and support data (e.g., follow-up recommendations).

Table 4.

Imaging finding-specific support data

1. Anatomic location
2. Temporal change
3. Clinical significance
4. Follow-up recommendation
5. Differential diagnosis

This transformation of imaging data from collective reports to individual findings or disease states can also be applied to clinical data. In current practice, clinical reports also follow a model of combining multiple data elements into a single all-inclusive report (e.g., history and physical exam, consultation report, discharge summary). If data were extracted on a context-specific basis (e.g., disease state, organ system), one could in theory create a context-specific clinical database using the same standardized support data format. Clinical test data (e.g., laboratory, test, and pathology data) could be recorded according to anatomy, clinical significance, temporal change, follow-up recommendation, and differential diagnosis. The resulting database would allow data to be searched, extracted, and analyzed by data category, technology, anatomy/organ system, pathology, clinical context, and support data. This provides a pathway for economizing and customizing the data according to the specific task being performed and individual end-user preferences.

Data Sources and Database Creation

Computerized sources of imaging data center primarily on imaging modalities and information system technologies (e.g., picture archiving and communication system (PACS), computerized physician order entry (CPOE), and radiology information system (RIS)). Manual data input has also historically served as an important data source in imaging practice. The principal source of manual data entry is the technologist, who is arguably the “eyes and ears” of the radiologist. Historically, the technologist has served as the radiologist’s principal source of clinical history [6], relying on a number of sources including the patient, referring clinician, nursing staff, and historical radiology reports. In the “old days” of analog radiology practice, the resulting clinical data was routinely summarized in a technologist datasheet submitted along with the order information and hard-copy images. With the advent of PACS and digital imaging, the technologist’s role in clinical data collection and recording has declined. Reinstating the technologist as a primary clinical data source can provide important (and currently underutilized) benefits, which can be facilitated by electronic clinical datasheets that are integrated into the PACS and RIS and continuously updated over the course of the patient’s radiology experience.

Another important human source of clinical data is the ordering clinician, who in theory is responsible for providing imaging staff with relevant, accurate, and comprehensive clinical data at the time of order entry. The quality and quantity of this clinical data is important to a number of steps in the imaging chain, including exam selection, protocol optimization, interpretation, and reporting. Unfortunately, clinical data input at order entry is routinely deficient [7], even with CPOE systems, which can effectively be “gamed” by entering erroneous clinical data simply to satisfy order entry requirements [8, 9]. One proposed solution for improving the quality and consistency of clinical data at order entry is the creation of data accountability metrics [10], which provide a standardized scoring system for analyzing clinical order entry data specific to the individual patient, clinical context, and exam being performed. These data accountability standards and analytics are designed to enhance data quality on the part of both the referring clinician and the interpreting radiologist. The importance of incorporating these technologist and referring clinician data input strategies at the “front end” of the imaging chain is underscored by reports that radiologist-clinician consultations have decreased with the advent of PACS, increased workload demands, and the expansion of teleradiology services [11–13]. In the absence of these consultations, radiologists must rely on alternative data sources to ensure accurate and unequivocal interpretations.
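The scoring method of the cited data accountability metrics [10] is not detailed here, so the following is only a hypothetical illustration of what an exam-specific order-entry score could look like: a weighted checklist of expected clinical data elements, normalized to 0–100. The exam label, the expected elements, their weights, and the function name `accountability_score` are invented placeholders.

```python
# Hypothetical exam-specific expectations: element -> weight.
EXPECTED_ORDER_DATA = {
    "CT abdomen/pelvis": {
        "presenting symptom": 3,
        "symptom location": 2,
        "relevant history": 2,
        "renal function": 1,
        "specific question": 2,
    },
}

def accountability_score(exam, order_data):
    """Score 0-100: weighted share of expected elements that are present and non-blank."""
    expected = EXPECTED_ORDER_DATA[exam]
    total = sum(expected.values())
    earned = sum(weight for element, weight in expected.items()
                 if str(order_data.get(element, "")).strip())
    return round(100 * earned / total)

order = {
    "presenting symptom": "RLQ pain",
    "symptom location": "right lower quadrant",
    "relevant history": "  ",   # blank entries earn nothing
}
print(accountability_score("CT abdomen/pelvis", order))  # 50
```

A presence check of this kind cannot by itself detect the “gamed” entries described above, since erroneous but non-blank data would still score; some form of cross-checking or human review would still be needed.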

While technologists and referring clinicians can serve as important sources of clinical data, a number of computer-based data sources must also be used to create a comprehensive database. Table 5 lists a number of these electronic clinical data sources. Medical data can be broadly classified into three categories according to its degree of change over time: static, episodic, and dynamic. Static data remains relatively constant and, once entered into the database, requires minimal updating (e.g., genetic data, family history). Episodic data tends to be fairly consistent over time, with occasional episodic change (e.g., medical problem list, surgical history, pharmacology). Because these episodic changes are often associated with major medical events such as hospitalization or a change in medical providers, they are often predictable and can serve as internal prompts for database modification. If, for example, a patient is hospitalized (which will generate new episodic data), a number of high-yield data sources (e.g., discharge summary, billing records) can be used for either computerized or manual updating of the clinical database. Dynamic data consists of medical data in constant flux, with new and continuously changing data. This type of data frequently results from new and/or changing medical problems, which can generate additional diagnostic test data (e.g., laboratory, imaging), referrals and consultations, or therapeutic interventions (e.g., pharmacologic, surgical). Sources of dynamic data in the electronic patient record may include physician order entry data, consultation reports, progress notes, and laboratory/imaging reports. Because this data is in constant flux, it is the most difficult to keep up to date and accurate.

As computerized data mining algorithms and artificial intelligence techniques continue to advance, retrieval and recording of dynamic data would be expected to improve. In the meantime, manual data entry may be required from healthcare providers who routinely review and analyze the data in the course of their everyday work (e.g., nurses, technologists, primary care physicians).
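The three-tier model suggests a simple refresh policy: static data is not re-reviewed on a schedule, episodic data is re-reviewed when a major event such as hospitalization occurs, and dynamic data is re-reviewed at short intervals. The sketch below encodes that policy; the category-to-tier mapping follows the examples in the text, while the event names, the seven-day interval, and the function name `categories_to_refresh` are assumptions.

```python
from datetime import date, timedelta
from enum import Enum

class Volatility(Enum):
    STATIC = "static"
    EPISODIC = "episodic"
    DYNAMIC = "dynamic"

# Mapping follows the examples given in the text.
TIER = {
    "genetic data": Volatility.STATIC,
    "family history": Volatility.STATIC,
    "problem list": Volatility.EPISODIC,
    "surgical history": Volatility.EPISODIC,
    "pharmacology": Volatility.EPISODIC,
    "laboratory": Volatility.DYNAMIC,
    "imaging reports": Volatility.DYNAMIC,
}

EPISODIC_TRIGGERS = {"hospitalization", "change of provider"}  # assumed event names
DYNAMIC_INTERVAL = timedelta(days=7)                           # assumed review interval

def categories_to_refresh(last_reviewed, today, events=()):
    """Return categories due for re-review; static data is never scheduled."""
    due = []
    for category, tier in TIER.items():
        if tier is Volatility.DYNAMIC and today - last_reviewed[category] >= DYNAMIC_INTERVAL:
            due.append(category)
        elif tier is Volatility.EPISODIC and EPISODIC_TRIGGERS & set(events):
            due.append(category)
    return due

last = {category: date(2015, 1, 1) for category in TIER}
print(categories_to_refresh(last, date(2015, 1, 5), events=["hospitalization"]))
# ['problem list', 'surgical history', 'pharmacology']
```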

Table 5.

Clinical data sources

1. History and physical (H & P)
2. Hospital discharge summaries
3. Consultation reports
4. Clinical test results
5. Laboratory data
6. Pathology reports
7. Procedural/surgical notes
8. Physician and pharmacy orders
9. Disease problem lists
10. Progress and physician notes
11. Billing systems

Because medical data is continuously being acquired, prioritizing it in a consistent and predictable fashion while ensuring database quality and economy can seem an impossible task. One strategy is to use a standardized, context-specific data grading system, as previously described for individual imaging report findings (Table 4). A data element identified as having high clinical significance, exhibiting worsening temporal change, carrying a specific follow-up recommendation, or being associated with a high-priority disease or medical condition could automatically trigger inclusion in the database. While this would require compliance on the part of the reporting healthcare provider when entering new medical data, if widely adopted in conventional practice, it would provide a fairly consistent method for identifying high-priority data.
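The inclusion rule described above amounts to a logical OR over the graded fields. A minimal sketch, assuming grades are captured as simple strings at data entry and that the list of high-priority conditions is configured locally (the entries shown are placeholders, as is the function name `should_include`):

```python
# Placeholder list; each institution would configure its own.
HIGH_PRIORITY_CONDITIONS = {"malignancy", "pulmonary embolism", "aortic dissection"}

def should_include(element):
    """Include the element if any of the four graded criteria is met."""
    return (
        element.get("clinical_significance") == "high"
        or element.get("temporal_change") == "worsening"
        or bool(element.get("follow_up"))
        or element.get("condition") in HIGH_PRIORITY_CONDITIONS
    )

entry_a = {"clinical_significance": "low", "temporal_change": "stable",
           "condition": "hepatic cyst"}
entry_b = {"clinical_significance": "moderate", "temporal_change": "stable",
           "follow_up": "MRI in 6 months"}
print(should_include(entry_a), should_include(entry_b))  # False True
```

An ungraded element fails every test and is excluded, which reflects how strongly this strategy depends on compliance by the reporting provider.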

Since end-users are unlikely to consistently identify high-priority or actionable data at the time of data entry, an alternative strategy is to have computerized data mining techniques (e.g., natural language processing (NLP)) review data for specific actionable criteria (Table 6). The resulting computer-derived data can in turn be subjected to human verification before final integration into the patient referenceable database. This verification can take place at the time data is recorded (e.g., via an electronic prompt initiated by the NLP program) or at a later predefined date (e.g., weekly data audits), with the accumulated data stored in preliminary status pending final verification. Verification could be performed by authorized healthcare professionals (e.g., a primary care physician or nurse for clinical data, a radiologist or technologist for imaging data). The advantage of this hybrid data mining approach is that it provides valuable feedback to the computer applications for iterative refinement and machine learning. The ultimate goal is a system in which high-priority or actionable data is consistently recorded and validated without imposing negative workflow demands. If the resulting referenceable database and derived analyses prove beneficial to end-users in everyday practice, they will likely be more accommodating of these data verification requirements.

Table 6.

Actionable (high-priority) data triggers

1. Clinical significance
2. Follow-up recommendations
3. QA events
4. Temporal change (new and/or worsening)
5. Medical/surgical intervention
6. Critical results communication
7. Hospitalization
8. New diagnosis
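The hybrid workflow can be sketched as a queue in which a computerized screen flags candidate sentences, flagged items enter preliminary status, and only a human verifier promotes them. In this Python sketch a few regular expressions stand in for NLP; real NLP would need to handle negation, context, and synonyms, which these patterns do not. The patterns, status names, and the functions `screen` and `verify` are hypothetical, and the trigger keys mirror only part of Table 6.

```python
import re
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PRELIMINARY = "preliminary"
    VERIFIED = "verified"
    REJECTED = "rejected"

# Crude keyword patterns standing in for NLP; keys mirror some Table 6 triggers.
TRIGGER_PATTERNS = {
    "follow-up recommendation": re.compile(r"\b(recommend\w*|follow[- ]?up)\b", re.I),
    "temporal change": re.compile(r"\b(new|increas\w*|worsen\w*)\b", re.I),
    "critical result": re.compile(r"\b(critical|called to|notified)\b", re.I),
    "new diagnosis": re.compile(r"\bnewly diagnosed\b", re.I),
}

@dataclass
class Candidate:
    text: str
    triggers: list
    status: Status = Status.PRELIMINARY
    verifier: str = ""

def screen(sentences):
    """Flag sentences matching any trigger; each starts in preliminary status."""
    found = []
    for sentence in sentences:
        hits = [name for name, pattern in TRIGGER_PATTERNS.items() if pattern.search(sentence)]
        if hits:
            found.append(Candidate(sentence, hits))
    return found

def verify(candidate, verifier, accept):
    """Record the human decision; rejected items are kept as feedback."""
    candidate.status = Status.VERIFIED if accept else Status.REJECTED
    candidate.verifier = verifier
    return candidate

report = ["No acute abnormality.",
          "New 9 mm nodule; recommend CT in 3 months.",
          "Heart size is normal."]
queue = screen(report)
verify(queue[0], "radiologist_01", accept=True)
print([(c.triggers, c.status.value) for c in queue])
# [(['follow-up recommendation', 'temporal change'], 'verified')]
```

Retaining rejected candidates, rather than discarding them, preserves the feedback the text describes for iterative refinement of the screening step.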

Customization Features

While most data customization lies in the processes of data extraction and presentation (which will be discussed in a companion article), there is also some opportunity for database customization according to individual institutional and stakeholder data requirements. From the institutional perspective, each medical organization has its own healthcare priorities, patient population, human resources, professional services, and technology infrastructure. Based on these combined factors, database requirements may differ between institutions and need to be accounted for in how data is recorded and used to create patient-specific referenceable databases. For example, a rehabilitation facility whose services largely consist of long-term care for patients with chronic morbidities will likely have different database requirements than an acute care hospital focused on short-term acute care services. The rehabilitation facility database will preferentially focus on longitudinal data for chronic medical problems (typically over a multi-year time course), with an increased focus on multi-disciplinary data from a variety of ancillary medical sources (e.g., dietary, physical therapy, speech pathology). The acute care facility, on the other hand, will focus far more attention on acute, short-term medical problems, with most data provided by physician sources.

The ability to customize the patient referenceable database also extends to individual healthcare providers. In the rehabilitation facility example, where ancillary medical data becomes a more integral part of the database, ancillary service providers such as a nutritionist or pharmacologist arguably play a greater role in long-term patient management than their counterparts in an acute care setting. As a result, the data requirements of these healthcare professionals may differ in medical data duration, complexity, granularity, and number of interaction effects. While the core components of the referenceable database remain relatively constant, the ability to add or modify specific data requirements according to institutional and individual provider profiles addresses the dynamic nature of medical data and the diverse patient populations served.
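Customization of this kind can be expressed as layered configuration: a fixed core of data requirements, extended by an institutional profile and then by a provider-role profile. The sketch assumes requirements are simple element names and that layers only add to the core; the function name `requirements`, the profile contents, and the look-back periods echo the rehabilitation and acute care examples but are otherwise invented.

```python
CORE = frozenset({"problem list", "medications", "allergies", "recent imaging"})

# Hypothetical institutional profiles; look-back periods are illustrative only.
INSTITUTION_PROFILES = {
    "rehabilitation": {
        "elements": {"physical therapy notes", "speech pathology notes", "dietary notes"},
        "lookback_years": 5,
    },
    "acute care": {"elements": {"vital signs", "admission H&P"}, "lookback_years": 1},
}

# Hypothetical provider-role additions.
ROLE_PROFILES = {
    "nutritionist": {"weight trend", "albumin", "dietary notes"},
    "radiologist": {"prior key images", "renal function"},
}

def requirements(institution, role):
    """Core elements plus institution- and role-specific additions."""
    profile = INSTITUTION_PROFILES[institution]
    elements = set(CORE) | profile["elements"] | ROLE_PROFILES.get(role, set())
    return sorted(elements), profile["lookback_years"]

elements, years = requirements("rehabilitation", "nutritionist")
print(years, len(elements))  # 5 9
```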

Quality Assurance of the Database

The overall value and utility of the database will be determined directly by the quality and integrity of its data. To ensure that the data consistently meets these standards, a rigorous quality assurance (QA) program is required, incorporating both manual and computerized QA analyses. At the core of any QA program is data accountability, which ensures that the data source is reliable, accurate, and authorized. One important QA component is the ability to record and track all data entries, including the identity of the data source, the date and time of entry, and the data action taken. These actions can take a number of forms, including new data entry, data verification, data modification, and data deletion. Because the database is designed to be dynamic, its components would be expected to change constantly and, as a result, require monitoring and analysis to ensure that data integrity and accuracy standards are consistently maintained. In the event of an unauthorized attempt at data access or action, the database could be temporarily placed in “lockdown” status until a formal administrative review is performed.
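A minimal in-memory sketch of the accountability log described above: each attempted action is appended with user identity, UTC timestamp, action type, and whether it was authorized, and an unauthorized attempt places the store in lockdown until an administrator clears it. The class `AuditedStore`, the privilege table, and the action names are assumptions; a production system would need durable, tamper-evident storage rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

ACTIONS = {"enter", "verify", "modify", "delete"}

@dataclass(frozen=True)
class AuditEntry:
    user: str
    action: str
    element_id: str
    timestamp: datetime
    authorized: bool

class AuditedStore:
    def __init__(self, privileges):
        self.privileges = privileges   # user -> set of permitted actions
        self.log = []
        self.locked = False

    def act(self, user, action, element_id):
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if self.locked:
            raise PermissionError("database is in lockdown pending administrative review")
        allowed = action in self.privileges.get(user, set())
        # Every attempt is logged, whether or not it is authorized.
        self.log.append(AuditEntry(user, action, element_id,
                                   datetime.now(timezone.utc), allowed))
        if not allowed:
            self.locked = True
            raise PermissionError(f"{user} is not authorized to {action}")
        return True

    def clear_lockdown(self):
        """Called by an administrator after formal review."""
        self.locked = False

store = AuditedStore({"tech_07": {"enter"}, "dr_lee": {"enter", "verify", "modify"}})
store.act("tech_07", "enter", "allergy-1")
try:
    store.act("tech_07", "delete", "allergy-1")
except PermissionError as err:
    print(err, "| locked:", store.locked)
# tech_07 is not authorized to delete | locked: True
```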

An additional QA function is the periodic review and analysis of all parties using the database. This provides insight into how the database is being used and how frequently specific data elements are reviewed. An important feature of the database is the ability to incorporate individual data elements from the referenceable database directly into medical documentation. For example, a radiologist interpreting a new imaging exam may wish to reference prior imaging report or clinical findings that relate directly to the current exam findings. Using “cut and paste” or hypertext functionality, specific imaging or clinical data elements from the referenceable database can be incorporated into the current report, providing “added clinical context.”

The ability to add, delete, modify, or validate data is determined by the privileges assigned to each individual end-user. Because each data transaction would be recorded automatically, an electronic audit trail would be created that allows authorized administrative staff to ensure end-users are compliant and using the database appropriately. This auditing function also makes it possible to create a user-specific profile, which provides insight into how each end-user interacts with the database and which data elements are routinely accessed and incorporated into workflow. The corollary is to identify which data is not being used, providing an effective method for filtering out unnecessary data and redesigning data presentation. In the event of an adverse outcome (e.g., a diagnostic or treatment error), the database could be reviewed to determine what relevant data was available, accessed, and used at the time the task was performed. This function is designed to be educational rather than punitive. Context-specific analysis will in theory help define best practice guidelines, facilitate education and training, and provide individual end-users with specific feedback aimed at improving task performance and efficiency.
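The user-profile and unused-data analyses reduce to counting over the audit trail. Assuming the trail has been exported as (user, action, element) tuples, with a hypothetical “view” action recorded whenever an element is opened, a sketch might look like this (`usage_profile` and `unused_elements` are illustrative names):

```python
from collections import Counter

def usage_profile(trail, user):
    """Count how often a user viewed each data element."""
    return Counter(element for u, action, element in trail
                   if u == user and action == "view")

def unused_elements(trail, all_elements):
    """Elements that no user has viewed: candidates for filtering or redesign."""
    viewed = {element for _, action, element in trail if action == "view"}
    return sorted(set(all_elements) - viewed)

trail = [
    ("dr_lee", "view", "prior CT impression"),
    ("dr_lee", "view", "prior CT impression"),
    ("dr_lee", "view", "creatinine"),
    ("rn_kim", "view", "medication list"),
    ("rn_kim", "modify", "billing codes"),
]
catalog = ["prior CT impression", "creatinine", "medication list",
           "billing codes", "family history"]
print(usage_profile(trail, "dr_lee").most_common(1))  # [('prior CT impression', 2)]
print(unused_elements(trail, catalog))                 # ['billing codes', 'family history']
```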

Conclusion

The patient referenceable database is designed to address the existing challenges of medical data overload, limited data integration, and inefficient workflow. This combination of factors often results in clinically valuable data being ignored or overlooked, which can in turn lead to medical error, time delays, and/or unnecessary expense. The goal of the referenceable database is to achieve data economy, giving an authorized end-user efficient access to context-specific summary data from a single multi-disciplinary data source built on predefined actionable (i.e., high-priority) data. By incorporating electronic tools that monitor how the referenceable data is actually used, the database can be iteratively refined and end-users can receive feedback related to best practices. The overall aim is to empower healthcare professionals with context- and user-specific data at the point of care, where improved data delivery and analysis can have the greatest impact on clinical outcomes.

References

1. Fayyad U, Piatetsky-Shapiro G, Smyth P. The KDD process of extracting useful knowledge from volumes of data. Commun ACM. 1996;11:27–34. doi: 10.1145/240455.240464.
2. Cios KJ, Moore GW. Uniqueness of medical data mining. Artif Intell Med. 2002;26:1–24. doi: 10.1016/S0933-3657(02)00049-0.
3. Teich JM, Merchia PR, Schmiz JL, et al. Effects of computerized physician order entry on prescribing practices. Arch Intern Med. 2000;160:2741–2747. doi: 10.1001/archinte.160.18.2741.
4. Reiner BI, McKinley M. Innovation economics and medical imaging. J Digit Imaging. 2012;3:325–329. doi: 10.1007/s10278-012-9470-x.
5. Reiner B. One size (doesn’t) fit all. J Am Coll Radiol. 2008;4:567–570. doi: 10.1016/j.jacr.2007.09.006.
6. Reiner B. Optimizing medical data extraction and presentation: current limitations and deficiencies. J Digit Imaging. 2015;2:123–126.
7. Berger RG, Kichak JP. Computerized physician order entry: helpful or harmful? J Am Med Inform Assoc. 2004;11:100–103. doi: 10.1197/jamia.M1411.
8. Rosenthal DI, Weilburg JB, Schultz T, et al. Radiology order entry with decision support: initial clinical experience. J Am Coll Radiol. 2006;3:799–806. doi: 10.1016/j.jacr.2006.05.006.
9. Kilbridge PM, Welebob EM, Classen DC. Development of the Leapfrog methodology for evaluating hospital implemented inpatient computerized physician order entry systems. Qual Saf Health Care. 2006;15:81–84. doi: 10.1136/qshc.2005.014969.
10. Reiner BI. Medical imaging data reconciliation. Part 2: clinical order entry/imaging report data reconciliation. J Am Coll Radiol. 2011;10:720–724. doi: 10.1016/j.jacr.2011.05.004.
11. Reiner B, Siegel E, Protopapas Z, et al. Impact of filmless radiology on the frequency of clinician consultations with radiologists. AJR. 1999;173:1169–1172. doi: 10.2214/ajr.173.5.10541082.
12. Siegel EL, Reiner BI. Filmless radiology at the Baltimore VA Medical Center: a nine-year retrospective. Comput Med Imaging Graph. 2003;27:101–109. doi: 10.1016/S0895-6111(02)00083-6.
13. Levin DC, Rao VM. Outsourcing to teleradiology companies: bad for radiology, bad for radiologists. J Am Coll Radiol. 2011;8:104–108. doi: 10.1016/j.jacr.2010.08.017.
