RadioGraphics. 2009 Jan 23;29(2):331–343. doi: 10.1148/rg.292085098

Problem-centric Organization and Visualization of Patient Imaging and Clinical Data

Vijayaraghavan Bashyam 1, William Hsu 1, Emily Watt 1, Alex A T Bui 1, Hooshang Kangarloo 1, Ricky K Taira 1
PMCID: PMC2678554  PMID: 19168763

Abstract

A patient’s electronic medical record contains a large amount of unstructured textual information. As patient records become increasingly dense owing to an aging population and increased occurrence of chronic diseases, a tool is needed to help organize and navigate patient data in a way that facilitates a clinician’s ability to understand this information and that improves efficiency. A system has been developed for physicians that summarizes clinical information from a patient record. This system provides a gestalt view of the patient’s record by organizing information about each disease along four dimensions (axes): time (eg, disease progression over time), space (eg, tumor in left frontal lobe), existence (eg, certainty of existence of a finding), and causality (eg, response to treatment). A display is generated from information provided by radiology reports and discharge summaries. Natural language processing is used to identify clinical abnormalities (problems, symptoms, findings) from these reports as well as associated properties and relationships. This information is presented in an integrated format that organizes extracted findings into a problem list, depicts the information on a timeline grid, and provides direct access to relevant reports and images. The goal of this system is to improve the structure of clinical information and its presentation to the physician, thereby simplifying the information retrieval and knowledge discovery necessary to bridge the gap between acquiring raw data and making an informed diagnosis.

© RSNA, 2009

Introduction

Patient records contain a large quantity of complex, heterogeneous data such as imaging studies, textual reports, laboratory results, and so on. A comprehensive review of a patient’s medical record usually requires the physician to examine multiple documents while mentally noting issues related to the current clinical context and filtering out unrelated information (1). For example, a neuro-oncologist following up a patient with a recurring tumor typically spends time examining multiple documents (eg, consultations, discharge summaries) to understand the patient’s clinical history, evaluate the patient’s response to the current treatment regimen (eg, reduction in tumor size), and determine whether any adjustments are needed to optimize treatment (eg, increase in dosage). Given the time constraints for each patient encounter, the data complexity, and the data volume associated with chronic conditions, physicians often have difficulty gathering all of the relevant patient information.

The inability to completely review a patient’s record compromises patient care in several ways, including lack of coordination among caregivers (2), poor integration of examination results for diagnosis (3), initiation of treatment before a definitive diagnosis is established, administration of inappropriate therapies, and performance of information-redundant studies.

The increasing adoption of electronic medical records has made information more accessible to clinicians through dedicated systems such as hospital information systems, picture archiving and communication systems, and radiology information systems. However, increased access to information does not necessarily translate into use of that information (1). Merry (4) observed that information retrieval tasks that take more than 30 seconds are unlikely to be generally adopted by practicing clinicians. There is a strong need (5,6) for improvement in the organization and presentation of clinical information to facilitate the clinician’s ability to answer questions regarding a patient’s case and to reduce information overload (5). One such approach, the problem-oriented medical record (POMR) (7), attempts to address the shortcomings of traditional clinical history narratives with a reporting structure that organizes patient data around active medical problems. Various groups have attempted to build computerized versions of the POMR, such as Salmon et al (8) and Bui et al (9). Although the POMR accurately models how physicians cognitively represent medical information, the effort associated with organizing newly acquired patient information to suit this paradigm is time consuming. In addition, Bossen (10) concludes that clinicians examining patients with a complex medical history are often unable to understand the overall clinical picture because computerized POMRs fragment the patient record into multiple disjointed screens on the basis of identified problems.

For computerized POMRs to be better utilized in the clinical environment, these limitations need to be addressed by (a) providing an automated method for organizing patient data around medical problems identified from clinical documents, and (b) displaying this information in an integrated format that allows users to seamlessly view information at different levels of detail and simultaneously across problems.

We have developed a system that facilitates the review of clinical patient data by promoting an orderly process of medical problem understanding and patient care. The system makes use of natural language processing (NLP) to automatically identify and extract clinical problems, their associated findings, and attributes that are classifiable along four dimensions (axes): time, space, existence, and causality. Each dimension elucidates an aspect of the medical problem, with the goal of assisting clinicians in optimizing treatment. The extracted information is presented in a temporal and hierarchic format that allows the user to dynamically filter patient data and to view relevant images from imaging studies with use of an integrated viewer. The viewer enables the user to look at the patient record at various scales characterized by different dimensions, thereby facilitating information retrieval, processing, and knowledge building and helping the clinician make an appropriate diagnosis. By following Shneiderman’s paradigm of “overview, zoom & filter, and details-on-demand” (11), users can obtain an overview of the patient’s status and delve into more detail about a particular problem of interest.

In this article, we review relevant work in NLP, data organization, and visualization of patient records; describe the system framework and prototype implementation in the domain of neuro-oncology; and discuss limitations of the current implementation and future applications for this project.

Background

Over the past 2 decades, significant progress has been made in using NLP to extract specific entities from free text in clinical reports. Several studies have reported the use of the MedLEE NLP system to extract clinically relevant findings from mammography reports (12) and pathology reports (13), pneumonia-related information about infants from radiology reports (14), and temporal information from clinical reports (15). Other investigators have used lexicons for identifying problems (16) as well as extracted noun phrases (17), anatomic descriptors (18), patient name references (19), and negated concepts (20,21) from clinical reports. In particular, the open source tool known as MetaMap, developed by the National Library of Medicine (22), has been used for several information extraction purposes in the domain of biomedical free text. For an overview of recent applications of MetaMap, see Bashyam et al (23).

The extracted information needs to be organized in a way that facilitates efficient presentation to, and retrieval by, the clinician. Data in patient records have traditionally been organized in three ways (24): (a) source-oriented views, in which data are organized on the basis of origin (eg, laboratory results are grouped together, whereas medications are grouped separately); (b) time-oriented views, in which data are organized on the basis of when they are produced; and (c) concept-oriented views, in which data are organized on the basis of relevance to a particular topic (eg, medical problems, current therapies). Although each view is well suited for a specific user type (eg, clinicians, researchers, patients) or task (eg, follow-up, consultation), no one view is sufficient to support the needs of all users in the medical community. Past approaches to addressing this issue include that of Portoni et al (25), who describe a system that generates different views dynamically, providing users with tailored perspectives on the patient data based on their needs; and that of De Clercq (26), who demonstrates how a conceptual problem-oriented architecture is tailored to support the needs of both nursing and other medical professionals.

The extracted and organized clinical information can be fully utilized only by presenting it in a way that helps the user understand the trends and relationships in the data. The goals of clinical data visualization are to (a) visually present medical data in a more intuitive, easy-to-understand format; (b) visually magnify subtle aspects of the patient record that are pertinent to tailoring diagnosis and treatment; and (c) prevent information overload by presenting only the information needed at a given time (27). Research in the area of information visualization (28) has resulted in novel depictions that assist users with such tasks as interacting with large amounts of data, discerning trends, elucidating relationships, and understanding changes over time. In the medical domain, techniques have included graphical displays for planning therapies (29), facilitating decision support (30), and visualizing temporal trends (31,32). One challenge of providing a visual summary of a patient’s record is integrating all of the disparate representations into a cohesive interface. One approach has been to use representations of the human body to organize patient data. For example, the Visible Human Dataset (33) provides a set of high-resolution images that can be overlaid with the locations of a patient’s problems and findings to show their spatial distributions and relationships with one another. The Anatomic and Symbolic Mapper Engine from IBM (Zurich, Switzerland) (34) proposes to extend this approach by creating an interactive three-dimensional model of the human body as a way to navigate and search a patient’s medical record for pertinent information related to the selected anatomic location.

Our effort differs from previous work in several ways. Although our approach to generating a problem list is somewhat similar to that of Meystre and Haug (16), our system is not limited to a specific set of problems associated with a particular disease. Meystre and Haug’s system identifies problems from a predefined set of 80 problems, whereas our system identifies both lexicon-based concepts (eg, UMLS [Unified Medical Language System] or RadLex concepts) and complicated expressions denoting abnormalities (eg, “thick rim enhancement”) based on deeper linguistic analysis (ie, beyond just dictionary lookup). In addition, Meystre and Haug’s system identifies only problems and seeks to present a problem list, whereas our system collects several other pieces of relevant information and organizes them around a problem list. Thus, the task of problem list generation is intended not only for organizing the clinical data but also for summarizing various aspects of the patient record for the end user. Although our system is similar to the work of Bui et al (35), it differs in terms of how the problem list is generated: Instead of creating a problem list from International Classification of Diseases, Ninth Revision (ICD-9) codes, our system generates the problem list by parsing free-text medical reports. The novelty of our visualization approach is the integration of multiple dimensions (eg, temporal, spatial) of information into a single, uniform interface. Although our efforts in this respect are similar to those of IBM’s Anatomic and Symbolic Mapper Engine in that we link concepts extracted from text documents to graphical representations, our focus has been to summarize key pieces of information from the patient record and organize them along four dimensions and not solely on the basis of anatomy. In addition, we intend to make this application freely available to the medical community as an open source project.

System Framework

Figure 1 illustrates the overall framework of our system. The system consists of three high-level modules that perform tasks in all three areas of summarizing clinical data: information extraction, information organization, and information presentation and visualization. The modules are collectively responsible for identifying, structuring, and presenting the underlying patient data. The first module (information extraction) parses a patient’s discharge summaries and radiology reports with use of NLP and generates a list of medical problems and findings along with their associated attributes. This extracted information is then passed on to the second module (information organization), which characterizes the extracted information along four dimensions. These categorizations are used by the third module (information visualization) to organize and render the underlying data in different ways (eg, sorted anatomically or temporally) depending on the user’s preference. We discuss this process in greater detail in the following sections.
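A minimal sketch of this three-stage flow follows; every name, type, and the canned example are illustrative assumptions, not the system’s actual interface.

    from typing import Dict, List, Tuple

    # One extracted finding: (name, anatomy, existence, report date).
    Finding = Tuple[str, str, str, str]

    def extract(reports: List[str]) -> List[Finding]:
        """Module 1 (stub): NLP parse of discharge summaries and radiology
        reports into findings with attributes; a canned result stands in
        for the full pipeline here."""
        return [("enhancing mass", "left frontal lobe",
                 "definitely exists", "2004-03-01")]

    def organize(findings: List[Finding]) -> Dict[str, List[Finding]]:
        """Module 2: group extracted findings into a problem list; each
        entry keeps its temporal, spatial, existential, and causal
        attributes."""
        problems: Dict[str, List[Finding]] = {}
        for f in findings:
            problems.setdefault(f[0], []).append(f)
        return problems

    def render(problems: Dict[str, List[Finding]]) -> None:
        """Module 3 (stub): hand the organized structure to the timeline
        and image display."""
        for name, instances in problems.items():
            print(name, instances)

    render(organize(extract(["report text ..."])))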

Figure 1. Schematic illustrates the components of our system and how patient data are processed and organized.

Information Extraction

The architecture of our NLP system has been described in detail in several previous works (36–38). The system consists of several modules that perform various tasks as described in the following paragraphs.

Structural Analyzer.—

The task of the structural analyzer is to identify all topical boundaries dividing a report into sections and all sentences within a section. Topical boundaries divide a report into sections such as “HEADER,” “HISTORY,” “CHIEF COMPLAINT,” “PROCEDURE,” “FINDINGS,” “CONCLUSIONS,” and so on. Boundaries are identified with (a) an algorithm that makes use of a knowledge base of commonly used heading labels and linguistic cues seen within training examples, and (b) a probabilistic classifier based on an expectation model for the given document structure. The classifier’s expectation model is generated from statistics of the order of sections in a report, the number of words seen within a particular section type, and the types of communication expressed within these sections.

Sentence boundaries are commonly denoted by periods, question marks, and exclamation points; however, clinical reports do not always contain these delimiters. Other punctuation marks such as colons, semicolons, and hyphens, as well as enumerations, are also used to delimit sentences. In addition, periods may not correspond to sentence delimiters in cases such as abbreviations (“M.D.”) and decimal points. The system first identifies potential sentence boundaries by searching the text for sequences of characters separated by white space and containing one of several symbols (“.”, “!”, “?”, “:”, “;”, “)”, “*”). A window of tokens in the vicinity of the potential delimiter provides contextual information. Each potential delimiter is classified with a maximum entropy classifier, which makes use of a set of 44 features determined from the contextual information, into one of the following classes: end of sentence, decimal point, abbreviation, part of date, end of enumeration, part of honorific phrase, or part of initials. The final output of the structural analyzer is a list of sections, with each section containing a list of sentences.
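The following sketch illustrates this delimiter-classification step, with scikit-learn’s logistic regression standing in for the maximum entropy model; the feature set and labeled triples are simplified placeholders, not the 44 features actually used.

    import re
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    CANDIDATE = re.compile(r"[.!?:;)*]")

    def features(text, i):
        """Context features around the candidate delimiter text[i];
        simplified stand-ins for the 44 features of the real system."""
        before, after = text[max(0, i - 10):i], text[i + 1:i + 11]
        prev = before.split()[-1] if before.split() else ""
        return {
            "delim": text[i],
            "prev_token": prev.lower(),
            "prev_char_is_digit": before[-1:].isdigit(),
            "next_char_is_digit": after[:1].isdigit(),
            "next_nonspace_upper": after.lstrip()[:1].isupper(),
        }

    # Hypothetical labeled triples: (report text, delimiter offset, class).
    train = [
        ("The mass measures 3.2 cm. No edema.", 24, "end_of_sentence"),
        ("The mass measures 3.2 cm. No edema.", 19, "decimal_point"),
        ("Dictated by J. Doe, M.D.", 13, "part_of_initials"),
    ]
    maxent = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=200))
    maxent.fit([features(t, i) for t, i, _ in train],
               [label for _, _, label in train])

    # Print a predicted class for every candidate delimiter in a report.
    report = "Impression: 1. Stable 2.1-cm lesion."
    for m in CANDIDATE.finditer(report):
        print(m.start(), maxent.predict([features(report, m.start())])[0])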

Sentence Level Processor.—

The sentence level processor performs NLP tasks at a sentence level—namely, lexical analysis, phrase analysis, and syntactic parsing.

Lexical analysis involves tokenization, part-of-speech tagging, and semantic class tagging. The tokenizer takes in a sentence as input and identifies single- and multiple-word terms in the sentence. The part-of-speech tagger assigns each token its corresponding part of speech (eg, noun, verb, adjective). The semantic class tagger assigns each token a semantic label that corresponds to its classification in our taxonomy, which includes both single- and multiple-word terms and has been handcrafted for the domain of radiology. Entries in our taxonomy were drawn from two distinct types of sources: published sources and actual radiology reports. Published sources include indexes from radiology textbooks, radiology review manuals, radiology word compilations, and published radiology glossaries. The current taxonomy includes over 450 semantic classes, providing improved discrimination for advanced NLP tasks such as syntactic parsing and semantic interpretation.
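As a toy illustration of these three lexical-analysis steps, consider the following sketch; the lexicon entries and semantic classes are invented examples, not the actual 450-class taxonomy.

    import re

    # Invented lexicon fragments; the real taxonomy has over 450 classes.
    MULTIWORD = ["left frontal lobe", "rim enhancement"]
    SEMANTIC = {
        "left frontal lobe": "anatomy.location",
        "rim enhancement": "finding.abnormality",
        "mass": "finding.abnormality",
        "stable": "finding.behavior",
    }
    POS = {
        "left frontal lobe": "noun",
        "rim enhancement": "noun",
        "mass": "noun",
        "stable": "adjective",
    }

    def tokenize(sentence):
        """Identify single- and multiple-word terms; multi-word lexicon
        entries are joined first so they surface as single tokens."""
        s = sentence.lower()
        for term in MULTIWORD:
            s = s.replace(term, term.replace(" ", "_"))
        return [tok.replace("_", " ") for tok in re.findall(r"\w+", s)]

    def lexical_analysis(sentence):
        """Attach a part of speech and a semantic class to every token."""
        return [(tok, POS.get(tok, "other"), SEMANTIC.get(tok, "none"))
                for tok in tokenize(sentence)]

    print(lexical_analysis("Stable rim enhancement in the left frontal lobe."))
    # [('stable', 'adjective', 'finding.behavior'),
    #  ('rim enhancement', 'noun', 'finding.abnormality'),
    #  ('in', 'other', 'none'), ('the', 'other', 'none'),
    #  ('left frontal lobe', 'noun', 'anatomy.location')]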

Phrase analysis involves the identification of logically coherent, nonoverlapping sequences of words in a sentence. This process includes syntactic phrase chunking, which involves the identification of noun and verb phrases; and semantic phrase chunking, which involves the identification of named entity phrases (eg, problems, findings, symptoms, anatomic phrases) and relationship phrases (eg, spatial, temporal, causal, or existential relationships). Our system uses a maximum entropy classifier to estimate, for each word, the probability that it begins a phrase, continues a phrase, or falls outside any phrase. A Viterbi dynamic programming algorithm is then used to find the tag sequence with maximum probability. Identification of corresponding UMLS, American College of Radiology, and RadLex codes is also performed at this stage.
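A compact sketch of the Viterbi decoding step follows, using the begin/inside/outside (B/I/O) tagging convention; the probabilities are invented for illustration, whereas in the actual system the per-token probabilities come from the maximum entropy classifier.

    import math

    TAGS = ["B", "I", "O"]          # begin, inside, outside of a phrase
    NEG = -1e9                      # log 0: forbids "I" directly after "O"

    def viterbi(emission, transition):
        """Most probable tag sequence given per-token log probabilities
        (emission: list of {tag: logp}) and transition log probabilities
        (transition: {(prev_tag, tag): logp})."""
        best = [dict(emission[0])]
        back = [{}]
        for i in range(1, len(emission)):
            best.append({})
            back.append({})
            for t in TAGS:
                p = max(TAGS, key=lambda q: best[i - 1][q] + transition[(q, t)])
                best[i][t] = best[i - 1][p] + transition[(p, t)] + emission[i][t]
                back[i][t] = p
        tag = max(TAGS, key=lambda t: best[-1][t])
        path = [tag]
        for i in range(len(emission) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return path[::-1]

    # Toy run for "thick rim enhancement noted":
    trans = {(p, t): NEG if (p, t) == ("O", "I") else math.log(1 / 3)
             for p in TAGS for t in TAGS}
    emit = [{"B": math.log(.7), "I": math.log(.2), "O": math.log(.1)},
            {"B": math.log(.2), "I": math.log(.7), "O": math.log(.1)},
            {"B": math.log(.1), "I": math.log(.8), "O": math.log(.1)},
            {"B": math.log(.1), "I": math.log(.1), "O": math.log(.8)}]
    print(viterbi(emit, trans))     # ['B', 'I', 'I', 'O']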

The syntactic parser takes the output of the phrase analyzer and identifies syntactic dependency relations between tokens. A syntactic dependency relation is an asymmetric relation between two words conveying strong grammatical and semantic associations. The set of dependency relations defined in a sentence forms a “tree” known as the dependency tree. Identifying dependency relations in a sentence is an important step toward advanced language understanding. Our NLP system uses a novel chemistry-physics–inspired “field theory” parser to identify dependency relations (38).

Discourse Analyzer.—

The discourse analyzer links the description of a finding in serial studies to capture the “behavior” of the finding (eg, state changes). This task is referred to as coreference resolution, in which all descriptions of a single finding at different structural levels (including within a sentence, within a report, and across multiple reports) are linked. Coreference resolution consists of two stages. In the first stage, all instances of noun phrases that explicitly refer to findings are identified with semantic filtering; that is, all phrases that are tagged into one of a set of semantic categories (eg, “finding,” “finding.abnormality”) are identified. Tokens representing pronouns are identified with the appropriate part-of-speech label (“pron”), after which a set of finding-pronoun pairs is collected with use of a set of rules for evaluation of possible coreference. In the second stage, a probabilistic model is used to determine whether terms are related. A maximum entropy classifier is used to model the likelihood of two tokens sharing a coreference relationship. The system has several features, including the following:

  1. Linguistic cues. Words such as “a,” “the,” “another,” and “a new” indicate whether the current reference is new or was mentioned previously.

  2. Lexical class agreement, which helps decide whether two candidates are conceptually or numerically identical (eg, the findings for “solitary lesions” and “mass” are not identical due to numeric disagreement).

  3. Frame description compatibility, which helps decide whether two candidates are identical on the basis of attribute values (eg, “3-cm mass” and “2-cm mass” are not identical findings).

  4. Existence agreement, which helps decide whether existence descriptors are compatible (eg, “resolved” and “increasing in size” are incompatible existence descriptors for the same finding).

  5. Behavior agreement, which helps decide whether behavioral descriptors are compatible (eg, a finding cannot have opposing behaviors such as increasing and decreasing in size at the same time).

Further details about the coreference resolution component are provided in our earlier published article (39).
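The following sketch shows how a candidate pair of finding mentions might be turned into features for such a pairwise classifier; the field names, values, and the 0.5-cm size-compatibility threshold are our own illustrative assumptions.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Mention:
        head: str                    # eg, "mass"
        determiner: Optional[str]    # "a", "the", "another", ...
        number: str                  # "singular" or "plural"
        size_cm: Optional[float]     # frame attribute, if stated
        existence: Optional[str]     # eg, "resolved", "increasing in size"

    def pair_features(earlier, later):
        """Features for the maximum entropy coreference classifier."""
        return {
            # 1. linguistic cue: "the" suggests an old mention,
            #    "a"/"a new" suggests a new one
            "later_determiner": later.determiner or "none",
            # 2. lexical class agreement (eg, numeric agreement)
            "number_agree": earlier.number == later.number,
            # 3. frame description compatibility ("3-cm" vs "2-cm" mass)
            "size_compatible": (earlier.size_cm is None
                                or later.size_cm is None
                                or abs(earlier.size_cm - later.size_cm) < 0.5),
            # 4 and 5. existence/behavior agreement ("resolved" is not
            # compatible with "increasing in size")
            "existence_agree": (earlier.existence is None
                                or later.existence is None
                                or earlier.existence == later.existence),
            "same_head": earlier.head == later.head,
        }

    first = Mention("mass", "a", "singular", 3.0, "increasing in size")
    second = Mention("mass", "the", "singular", 3.2, None)
    print(pair_features(first, second))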

Figures 2 and 3 show screen shots from our NLP system. Although past work on the system has focused on radiology reports (36), we have modified the system to handle discharge summaries, since they are typically a comprehensive source for enumerated problems and findings. In doing so, we have added terms to our lexicon, including various treatment protocols (eg, chemotherapy) and new types of drugs. The system has been fine-tuned to target semantic phrases that occur frequently in medical text, such as temporal descriptions (eg, dates, time, duration), named entities (eg, names and addresses of patients and institutions), medication descriptions, spatial relations (eg, “is located just posterior to”), temporal relations (eg, “in comparison to”), causal relations, numbers, percentages, and others.

Figure 2. Screen shot shows the NLP back-end application, which parses clinical documents and extracts problems, findings, and related attributes.

Figure 3. Screen shot shows how the NLP system generates a problem list and associates identified problems with an anatomic location, American College of Radiology category, and existential status.

Information Organization

Once the desired concepts have been extracted by the NLP system, the output must be structured in a way that guides how this information is presented to the user. The primary outputs are the problems and findings identified from discharge summaries; however, additional attributes are also extracted that provide greater detail about these problems and findings. We have developed an organizational scheme that categorizes these attributes into semantically related groups. Conceptually, these groups may be thought of in terms of temporal, spatial, existential, and causal dimensions.

Temporal Dimension.—

Time is an essential component of making a complete and accurate clinical diagnosis. The date on which a particular problem or finding is present conveys temporal information that provides historical context for a given medical problem. With this information, clinicians can easily determine whether a particular disease is a recurrence or a newly discovered problem. The temporal dimension is represented as a grid whose rows represent unique problems or findings and whose columns represent the dates on which these findings occur. This representation facilitates visualization of temporal trends and allows users to easily identify when a problem occurred and in which document it was mentioned.
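A minimal sketch of this grid representation follows; the tuple fields and example values are illustrative.

    from collections import defaultdict

    # Each extracted finding reduced to (name, report date, existence,
    # source report); all values here are invented examples.
    findings = [
        ("enhancing mass", "2004-03-01", "definitely exists", "rpt-101"),
        ("enhancing mass", "2004-06-12", "likely", "rpt-114"),
        ("edema", "2004-03-01", "no evidence of", "rpt-101"),
    ]

    grid = defaultdict(dict)          # row (finding name) -> {date: cell}
    for name, date, existence, report in findings:
        grid[name][date] = {"existence": existence, "report": report}

    # Columns are the sorted set of report dates; empty cells stay blank.
    dates = sorted({d for row in grid.values() for d in row})
    for name, row in grid.items():
        print(name, [row.get(d, {}).get("existence", "-") for d in dates])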

Spatial Dimension.—

Anatomic (spatial) descriptions of findings are fundamental to disease understanding, since symptoms are often the result of changes to surrounding regions caused by the disease. Mapping anatomic descriptors extracted with NLP from clinical documents to spatial representations (eg, standardized atlases or anatomic frames of reference) provides both improved visualization of how the problems are distributed in a patient and a common frame of reference for facilitating spatial reasoning related to patient outcomes. Anatomic phrases identified in the patient record are mapped to standardized concepts found in controlled vocabularies. These concepts are then mapped to regions in the imaging data by registering patient imaging studies to normalized labeled atlases using a mutual information–based algorithm. Once registered, these labels provide a means of identifying relevant sections of a study given the extracted anatomic phrases. For instance, spatial information about each problem may be used to identify relevant sections from the patient’s complete set of imaging data by determining which sections of a study depict the specified anatomy. For example, the phrase “inferior portion of left occipital lobe” is mapped to a certain region in our atlas, and only the sections that contain this labeled region are presented to the user. In addition, registering all of the patient’s images to a common atlas generates a uniform view of anatomic landmarks on the image, which facilitates the detection of change and the assessment of significant imaging features. This provides a cognitive advantage to the physician, who is more likely to discern change when comparing two images if they are both registered to the same atlas.
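The following sketch shows the last step of that chain, ie, looking up which sections of a registered study carry the atlas label for an extracted anatomic phrase; both tables are invented stand-ins for the vocabulary mapping and the registration output.

    # Normalized anatomic phrase -> atlas label (hypothetical entries).
    ATLAS_LABELS = {
        "inferior portion of left occipital lobe": "L_occipital_inferior",
        "left frontal lobe": "L_frontal",
    }

    # For one registered study: atlas label -> axial section indices that
    # contain the labeled region (hypothetical registration output).
    LABELED_SECTIONS = {
        "L_occipital_inferior": range(12, 18),
        "L_frontal": range(30, 46),
    }

    def relevant_sections(anatomic_phrase):
        """Return only the sections of the study that depict the anatomy."""
        label = ATLAS_LABELS.get(anatomic_phrase.lower())
        return list(LABELED_SECTIONS.get(label, []))

    print(relevant_sections("Inferior portion of left occipital lobe"))
    # -> [12, 13, 14, 15, 16, 17]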

Existential Dimension.—

In medical reporting, physicians often qualitatively assess a level of certainty for the existence of a given problem. “Existence” refers to whether a problem is observed in the patient at a given time. Our NLP system is capable of characterizing the existence of a problem with one of the following values: “definitely exists,” “likely,” “possibly,” “less likely,” “cannot be ruled out,” “no evidence of,” and “does not exist.” These values can also be examined globally across various documents to determine how an instance of a problem relates to other occurrences. For example, given that a problem is mentioned in several clinical documents, each instance may be labeled as new, recurrent, old, or resolved. Ultimately, existence provides a clinician with information about the status of a problem or finding, which indicates whether a disease is present. Our infrastructure has also been set up as a basis for a more advanced decision support system that makes use of an organized list of problems or symptoms to suggest a diagnosis.
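One simple way to operationalize these values is to place them on an ordinal scale and derive the cross-document status labels from the resulting trajectory; the numeric scores and the catch-all “absent” label below are our own illustrative choices.

    EXISTENCE_SCALE = {
        "definitely exists": 1.0,
        "likely": 0.8,
        "possibly": 0.6,
        "cannot be ruled out": 0.5,
        "less likely": 0.3,
        "no evidence of": 0.1,
        "does not exist": 0.0,
    }

    def status_across_documents(values):
        """Label a finding's trajectory from its per-document existence
        values, earliest first: new, recurrent, old, or resolved."""
        present = [EXISTENCE_SCALE[v] >= 0.5 for v in values]
        if not present[-1]:
            return "resolved" if any(present) else "absent"
        if all(present):
            return "old" if len(present) > 1 else "new"
        return "recurrent"   # present now, but absent at an earlier point

    print(status_across_documents(
        ["definitely exists", "no evidence of", "likely"]))  # -> recurrent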

Causal Dimension.—

Causality and clinical medicine are inherently intertwined. Patient care is driven by causal considerations: symptoms manifest due to an underlying cause, which in turn is the result of some (abnormal) biologic phenomenon. An important aspect of organizing clinical information is to capture the cause-effect relationships among variables of interest, such as treatments, exposures, preconditions, and outcomes. In conjunction with the existential dimension, causality is the basis on which a physician can determine the existence of a disease; diagnosis is a conclusion that is reached by analyzing a patient’s symptoms or problems. Our NLP system identifies causal links between concepts (eg, problems to findings, medications to disease response) and can provide this information to a reasoning engine such as a Bayesian belief network for clinical decision support.

Information Visualization

Figure 4 depicts a screen capture of our interface that draws upon the different dimensions of information to populate an integrated display. The interface allows visualization of both textual information and imaging data. The system supports (a) multiple levels of detail (depth vs breadth) for efficient navigation; (b) several dimensions of information (temporal, spatial, existential, and causal); and (c) various data types (text reports, numeric information, and imaging data). The goal is to provide an interface that improves the understanding of underlying data patterns (eg, co-occurrence, confounding, spatial location, and change) in the four aforementioned information dimensions. A user interacts with the system as follows:

Figure 4. (a) Screen shot shows the application with its constituent components: case selected from the patient list by a user (a), hierarchic tree of problems and findings (b), list of selected findings (c), spatial information on selected findings (d), search box to filter results (e), color-coded cells representing existential information (f), and a pop-up window that summarizes related documents and images for each finding (g). Red = finding confirmed, blue = finding unresolved, green = no evidence of finding. (b) Screen shots show how users are able to interact with the interface.

  1. The patient case selector (Fig 4a, section a) provides the user with tools for finding and retrieving a patient case from a clinical information system or research database. Cases can be retrieved on the basis of patient identifier, demographics, or a set of attributes, allowing flexibility in finding all patient cases that match a set of criteria (eg, male patients with brain tumor).

  2. Selecting a patient case generates a hierarchic tree (Fig 4a, section b) of problems, findings, and anatomic locations extracted from the patient’s documents and presented in a top-down format. The current implementation uses American College of Radiology codes to organize problems and findings, beginning with broad anatomic categories (eg, “skull and contents [1]”) and continuing into more specific anatomic locations (eg, “occipital lobe [133]”). The goal is to aid clinicians in discerning relationships among problems, findings, and anatomic locations. Although other codifications may be used (eg, International Classification of Diseases codes [ICD-9]), the American College of Radiology system was chosen because it is familiar to radiologists, who commonly use it to provide context prior to examining a patient’s imaging studies.

  3. The user may select one or more elements from the hierarchic tree to populate the timeline display (Fig 4a, section c), a color-coded grid akin to depictions used for microarray data that provides an overview of extracted problems, findings, anatomic descriptors, and existential descriptors in relation to time. The user has the option of selecting individual findings, all findings related to a given problem, or all problems found in a particular anatomic region.

  4. Selected findings can be sorted on the basis of anatomic location (Fig 4a, section d). For instance, if the user wants to know what problems are currently affecting a particular organ (eg, brain), he or she can filter the list such that only problems and findings located in the brain are displayed. The interface also provides a search box for filtering the list of problems and findings on the basis of keywords (Fig 4a, section e). Keyword search provides an alternative way of exploring the patient record if the user has a specific concept in mind; the timeline is populated on the basis of specific keywords provided by the user, thereby obviating navigation of the hierarchic tree.

  5. Cells in the timeline are color coded to show existential information (Fig 4a, section f). A red cell represents a finding that has been confirmed with some study, a green cell indicates no evidence of the finding, and a blue cell indicates that the finding is still unresolved. The color coding provides users with a quick method of viewing and understanding what findings exist in documents and how findings change across documents and time.

When the user places the cursor over a cell in the timeline, a pop-up window is immediately displayed, showing the sentences describing a particular finding in a report within that time frame (Fig 4a, section g). The pop-up window is shown in greater detail in Figure 5. If an imaging study is available for the selected finding, the image corresponding to the anatomic description related to the finding appears. The pop-up window gives the user a quick means of summarizing relevant information about a given problem. The window includes buttons that allow the user to access the full report as well as the full imaging study. The imaging study is rendered in a “film-strip” format, which allows the user to browse through the entire set of images at a glance (Fig 6). Different imaging studies for a given medical problem can be viewed simultaneously, allowing the user to view changes in imaging features over time. Such a comparison is useful in evaluating whether a particular treatment is effective. In addition, the pop-up window provides a context-sensitive button that links the user to relevant external resources (eg, PubMed, Medline). This functionality is achieved with use of an “InfoButton-style” manager (40).

Figure 5. The pop-up window provides a summary of the selected finding by displaying the sentence from the clinical document that refers to the finding, along with links to the related imaging study (if available) (a), the full report (b), and an option to search an external resource (c).

Figure 6. Full imaging study displayed in a filmstrip format, which is accessed by clicking on the image in the pop-up window.

Implementation

A set of 44 records of patients with primary brain tumors (for which approval from the institutional review board was obtained) was identified from the UCLA (University of California at Los Angeles) Neuro-oncology Research Database as the test set for our system. We used the open source eXtensible Markup Language (XML) gateway, DataServer, to interface with our hospital’s database system (41). DataServer is “middleware” that sits between clients and traditional hospital information systems, radiology information systems, and picture archiving and communication systems, aggregating patient medical records and outputting this information as a structured XML representation. By using this interface, we retrieved all documents from the hospital information system for patients who were diagnosed with primary brain tumors. These documents included several types of medical reports, including radiology reports, pathology reports, discharge summaries, surgery reports, and radiation oncology reports. Because the entire medical record is retrieved, certain documents may be irrelevant, falling outside specific selection criteria (eg, documents more than 10 years old). To identify the relevant documents, a temporal filter was used to gather all the documents within a given time frame, with the remaining documents being discarded. On a second pass, we used a multinomial naïve Bayes classifier that was previously trained to identify brain tumor–related reports. This filtering helps further distinguish between brain tumor–related problems (eg, headache caused by the tumor) and problems mentioned during previous visits that are unrelated to the present condition (eg, headache caused by a sinus infection).
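A sketch of this two-pass filtering follows, with scikit-learn standing in for the classifier; the training snippets and labels are invented placeholders, and the n-gram setting mirrors the word- and phrase-level features mentioned below.

    from datetime import date
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Pass 1: temporal filter.
    def within_window(report_date: date, start: date, end: date) -> bool:
        return start <= report_date <= end

    # Pass 2: relevance filter, trained on labeled example reports.
    train_texts = [
        "enhancing mass in the left temporal lobe with surrounding edema",
        "glioblastoma multiforme status post resection",
        "opacity in the right lower lobe consistent with pneumonia",
        "degenerative changes of the lumbar spine",
    ]
    train_labels = [1, 1, 0, 0]            # 1 = brain tumor-related

    # Unigrams and bigrams stand in for word- and phrase-level features.
    clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
    clf.fit(train_texts, train_labels)

    print(clf.predict(["recurrent enhancing mass, left temporal lobe"]))
    # -> [1] with this toy training set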

Once the relevant reports are identified, they are input to the NLP system. The system parses a given sentence and provides (a) identification of a reference to a finding or disease process, (b) a location description of the finding, (c) an existence description of the finding, (d) temporal modifiers of the finding, and (e) associations with other findings (eg, causal, differential interpretations). The various individual modules of the NLP system have different performance accuracies, some of which have been previously evaluated (38,42). Specific to this application, we evaluated the Bayes filter, entity extractor, and discourse analyzer. The Bayes filter had precision and recall scores of 96% and 94%, respectively, for the task of identifying brain tumor–related reports from a collection of assorted reports when using a combination of word-level and phrase-level features (43). In identifying clinical abnormalities, the entity extractor had precision and recall scores of 87% and 96%, respectively. In identifying anatomic phrases, the system had precision and recall scores of 97.4% and 97.1%, respectively. For coreference resolution, the system had precision and recall scores of 72% and 63.1%, respectively (determined with use of a tenfold cross-validation design). These results constitute a preliminary evaluation of the back end of the system; a full-scale evaluation that examines the validity and relevancy of the extracted concepts will be performed in the future. The output of the NLP system is an XML file containing structured information that has been extracted by parsing the medical record. This XML file acts as an input to the visualization engine, which displays the final output as shown in Figure 4.
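For reference, precision and recall over extracted entities reduce to the following computation; the gold and predicted sets here are toy examples.

    def precision_recall(gold, predicted):
        """Precision = TP / |predicted|; recall = TP / |gold|, where TP
        is the number of predictions that match the gold standard."""
        tp = len(gold & predicted)
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(gold) if gold else 0.0
        return precision, recall

    gold = {"enhancing mass", "edema", "midline shift"}
    predicted = {"enhancing mass", "edema", "hydrocephalus"}
    print(precision_recall(gold, predicted))   # (0.67, 0.67), rounded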

The interface was developed as a Java application and tested in a limited research context at UCLA. We also demonstrated the system as an education exhibit at the 2007 Annual Meeting of the Radiological Society of North America. The implementation elicited positive feedback from users, who tried the system on a small set of deidentified patient reports. Users appreciated the concept of a real-time pop-up window that summarizes patient information and is accessed by pointing at a particular finding and date. Feedback from UCLA physicians concerning the information organization and information visualization modules was routinely obtained during the development phase. Formal usability studies and evaluation of the effectiveness of the system as an aid to clinical practice, teaching, and research are some of our next goals for this project.

Discussion

We faced challenges in each of the areas of summarizing clinical data. The most significant problem in information extraction was coreference resolution, a well-documented problem in NLP. Tracking a single finding across multiple paragraphs and documents required extensive work in classifier design to achieve a reasonable level of performance accuracy. In the area of information organization, we needed a method of classifying problems and findings into semantically related groups, which dictate where and how they are displayed to the user. We developed the groupings (ie, dimensions) on the basis of the types of information that would aid a clinician in developing a differential diagnosis. Finally, the challenges in the area of information visualization were primarily identifying the various information needs of the physician and prioritizing them so that the most important features would be readily displayed and additional features would be displayed upon request (eg, mouse clicking or hovering).

There are several areas for improvement in our current implementation. On the back end, the NLP system can be further refined by improving the accuracy of the individual modules. Improving feature selection by either (a) identifying a subset of features that are highly discriminating or (b) using a more powerful classifier such as support vector machines instead of the naïve Bayes classifier would enhance the performance of the extractor. In addition, we are continually exploring new approaches to improving the process of discourse analysis and temporal resolution. For instance, establishing a more robust set of rules to handle pronoun-verb coreferences (which appear as false-positive findings in our current implementation) as well as refining the characterization of location (by incorporating relative locations between findings whose general locations are identical) would improve the sensitivity of our system.

Although we have presented our system in the context of neuro-oncology, we are also evaluating the system in the domain of thoracic radiology. The core NLP back end was developed and trained with use of radiology reports from multiple domains; therefore, similar accuracies of performance are expected. In addition, the types of knowledge represented in the information model are not specific to radiology and may be adapted for other departments. Overall, the system can be generalized and applied to other clinical domains.

Because of the increased emphasis on the adoption of structured reporting, it is worthwhile to discuss the impact of structured reporting on our project. Current approaches to structured data entry incorporate (a) grammar normalization (ie, the form-based approach), (b) data representation (ie, the fields of the form are predefined), and (c) concept normalization (values for any particular form slot are precompiled and the choices limited). The design requirements for structured reporting were outlined 35 years ago (44). Some success has been achieved in limited domains, including radiology (45,46), pathology (47), and progress notes (48). The positive aspects of form-based reporting are threefold: (a) consistent coverage, since a form-based system can help enforce physician reporting consistency (47); (b) decision support, since forms can provide important reminders of possible states or interpretations for a finding; and (c) compliance, ensuring that the resulting reports meet regulatory or accreditation standards and institutional requirements. Despite these benefits, however, widespread adoption has not occurred, in part due to reporting bias (ie, pigeon-holing a diagnosis or description into an available selection choice), lack of expression (47), negligence in reporting (ie, a tendency not to report unusual conditions) (49), labor-intensive and time-consuming tasks (eg, traversing hierarchic menus), creation of new cognitive tasks, and problems with updates (eg, introduction of new vocabularies). In our system, an NLP engine is used primarily to process free-text reports and convert them into a structured format. Although our approach allows clinicians the flexibility to continue generating reports using their preferred method without having to adapt to predefined forms, adoption of structured reporting is likely to benefit our project, since the requirement for preprocessing of data would be reduced.

Summary

We have presented a system that addresses the issues currently limiting the use of computerized POMRs in clinical practice. Although previous studies have described methods of performing an individual task (eg, information extraction, organization, or visualization), we have demonstrated a single system that effectively performs all three steps. We developed our system with the goal of facilitating the clinician’s ability to better understand a patient’s clinical history and make optimal decisions about treatment.

At the core of the system is an NLP component that extracts clinical concepts such as medical problems, findings, and anatomic locations and their relationships from free-text documents. The system organizes these extracted concepts along four dimensions of information: spatial, temporal, existential, and causal. With use of the anatomic locations extracted from the text, relevant imaging studies are identified and linked to the clinical documents by determining which images in the study contain the anatomy of interest. The interface initially provides an overview of the patient record, but as the user interacts with individual elements of the display, the interface provides additional details about the selected component. For instance, users who are interested in a certain finding associated with a particular medical problem may discover where the finding is mentioned within the corpus of documents and view imaging sections that relate to that problem. The system can be integrated with existing clinical information systems, since it supports the ability to parse XML documents (eg, HL7, CDA) and DICOM (Digital Imaging and Communications in Medicine) images. The system is also modular; additional support for other standardized communication protocols may be added in the future. Given the growing amount of data being captured as a part of routine clinical care, physicians need tools that will assist them in effectively navigating and understanding this information in a short amount of time.

Acknowledgments

The authors would like to thank Suzie El-Saden, MD, for her valuable feedback, and Lewellyn Andrada, BS, for his assistance on this project.

Abbreviations

  • NLP = natural language processing

  • POMR = problem-oriented medical record

  • XML = eXtensible Markup Language

Funding: The research was supported in part by the National Institutes of Health [grant number P01-EB00216] and the National Library of Medicine [grant number T15-LM007356].

References

  • 1. Smith R. What clinical information do doctors need? BMJ 1996;313(7064):1062–1068.
  • 2. Fletcher RH, O’Malley MS, Fletcher SW, Earp JAL, Alexander JP. Measuring the continuity and coordination of medical care in a system involving multiple providers. Med Care 1984;22(5):403–411.
  • 3. Roberts M, Newhouse J, Gingrich N, Magaziner I. What will change American healthcare in the 21st century? I, II. Harvard Conference on American Healthcare, 2000.
  • 4. Merry P. Healthcare information: slow to learn. Health Serv J 1997;107(5563):28–29.
  • 5. Covell DG, Uman GC, Manning PR. Information needs in office practice: are they being met? Ann Intern Med 1985;103(4):596–599.
  • 6. Green ML, Ciampi MA, Ellis PJ. Residents’ medical information needs in clinic: are they being met? Am J Med 2000;109(3):218–223.
  • 7. Weed LL. Medical records that guide and teach. N Engl J Med 1968;278(11):593–600.
  • 8. Salmon P, Rappaport A, Bainbridge M, Hayes G, Williams J. Taking the problem oriented medical record forward. Proc AMIA Annu Fall Symp 1996:463–467.
  • 9. Bui AA, Taira RK, El-Saden S, Dordoni A, Aberle DR. Automated medical problem list generation: towards a patient timeline. Stud Health Technol Inform 2004;107(pt 1):587–591.
  • 10. Bossen C. Evaluation of a computerized problem-oriented medical record in a hospital department: does it support daily clinical practice? Int J Med Inform 2007;76(8):592–600.
  • 11. Shneiderman B. The eyes have it: a task by data type taxonomy for information visualizations. Proc IEEE Symp Vis Lang 1996:336–343.
  • 12. Jain NL, Friedman C. Identification of findings suspicious for breast cancer based on natural language processing of mammogram reports. Proc AMIA Annu Fall Symp 1997:829–833.
  • 13. Xu H, Anderson K, Grann VR, Friedman C. Facilitating cancer research using natural language processing of pathology reports. Stud Health Technol Inform 2004;107(pt 1):565–572.
  • 14. Mendonca EA, Haas J, Shagina L, Larson E, Friedman C. Extracting information on pneumonia in infants using natural language processing of radiology reports. J Biomed Inform 2005;38(4):314–321.
  • 15. Zhou L, Friedman C, Parsons S, Hripcsak G. System architecture for temporal information extraction, representation, and reasoning in clinical narrative reports. AMIA Annu Symp Proc 2005:869–873.
  • 16. Meystre S, Haug P. Automation of a problem list using natural language processing. BMC Med Inform Decis Mak 2005;5(1):30.
  • 17. Huang Y, Lowe HJ, Klein D, Cucina RJ. Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS Specialist lexicon. J Am Med Inform Assoc 2005;12(3):275–285.
  • 18. Sneiderman CA, Rindflesch TC, Bean CA. Identification of anatomical terminology in medical text. Proc AMIA Symp 1998:428–432.
  • 19. Taira RK, Bui AA, Kangarloo H. Identification of patient name references within medical documents using semantic selectional restrictions. Proc AMIA Symp 2002:757–761.
  • 20. Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. Evaluation of negation phrases in narrative clinical reports. Proc AMIA Symp 2001:105–109.
  • 21. Mutalik PG, Deshpande A, Nadkarni PM. Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS. J Am Med Inform Assoc 2001;8(6):598–609.
  • 22. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp 2001:17–21.
  • 23. Bashyam V, Divita G, Bennett DB, Browne AC, Taira RK. A normalized lexical lookup approach to identifying UMLS concepts in free text. Stud Health Technol Inform 2007;129(pt 1):545–549.
  • 24. Zeng Q, Cimino JJ. A knowledge-based, concept-oriented view generation system for clinical data. J Biomed Inform 2001;34(2):112–128.
  • 25. Portoni L, Combi C, Pinciroli F. User-oriented views in health care information systems. IEEE Trans Biomed Eng 2002;49(12):1387–1398.
  • 26. De Clercq E. Problem-oriented patient record model as a conceptual foundation for a multi-professional electronic patient record. Int J Med Inform 2008;77(9):565–575.
  • 27. Chittaro L. Information visualization and its application to medicine. Artif Intell Med 2001;22(2):81–88.
  • 28. Card SK, Mackinlay JD, Shneiderman B. Readings in information visualization: using vision to think. San Francisco, Calif: Morgan Kaufmann, 1999.
  • 29. Kosara R, Miksch S. Visualization methods for data analysis and planning in medical applications. Int J Med Inform 2002;68(1–3):141–153.
  • 30. Lavrac N, Bohanec M, Pur A, Cestnik B, Debeljak M, Kobler A. Data mining and visualization for decision support and modeling of public healthcare resources. J Biomed Inform 2007;40(4):438–447.
  • 31. Cousins SB, Kahn MG. The visual display of temporal information. Artif Intell Med 1991;3(6):341–357.
  • 32. Plaisant C, Milash B, Rose A, Widoff S, Shneiderman B. LifeLines: visualizing personal histories. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, 1996:221–227.
  • 33. Spitzer VM, Whitlock DG. The Visible Human Dataset: the anatomical platform for human simulation. Anat Rec 1998;253(2):49–57.
  • 34. IBM Zurich Research Labs. IBM Research unveils 3-D avatar to help doctors visualize patient records and improve care. IBM Press Release, 26 September 2007. Available at: http://www.zurich.ibm.com/news/07/asme.html. Accessed April 15, 2008.
  • 35. Bui AA, Aberle DR, Kangarloo H. TimeLine: visualizing integrated patient records. IEEE Trans Inf Technol Biomed 2007;11(4):462–473.
  • 36. Taira RK, Soderland SG. A statistical natural language processor for medical reports. Proc AMIA Symp 1999:970–974.
  • 37. Taira RK, Soderland SG, Jakobovits RM. Automatic structuring of radiology free-text reports. RadioGraphics 2001;21(1):237–245.
  • 38. Taira RK, Bashyam V, Kangarloo H. A field theory for medical natural language processing. IEEE Trans Inf Technol Biomed 2007;11(4):364–375.
  • 39. Son RY, Taira RK, Kangarloo H, Cárdenas AF. Context-sensitive correlation of implicitly related data: an episode creation methodology. IEEE Trans Inf Technol Biomed 2008;12(5):549–560.
  • 40. Cimino JJ. An integrated approach to computer-based decision support at the point of care. Trans Am Clin Climatol Assoc 2007;118:273–288.
  • 41. Bui AA, Dionisio JD, Morioka CA, Sinha U, Taira RK, Kangarloo H. DataServer: an infrastructure to support evidence-based radiology. Acad Radiol 2002;9(6):670–678.
  • 42. Cho PS, Taira RK, Kangarloo H. Automatic section segmentation of medical reports. AMIA Annu Symp Proc 2003:155–159.
  • 43. Bashyam V, Morioka C, El-Saden S, Bui AA, Taira RK. Identifying relevant medical reports from an assorted report collection using the naive Bayes classifier and the UMLS. Indian J Med Inform 2007;2(1):1–8.
  • 44. Mani RL, Jones MD. MSF: a computer-assisted radiologic reporting system. I. Conceptual framework. Radiology 1973;108(3):587–596.
  • 45. Kopans DB. Standardized mammography reporting. Radiol Clin North Am 1992;30(1):257–264.
  • 46. Bidgood WD. Clinical importance of the DICOM structured reporting standard. Int J Card Imaging 1998;14(5):307–315.
  • 47. Branston LK, Greening S, Newcombe RG, et al. The implementation of guidelines and computerised forms improves the completeness of cancer pathology reporting: the CROPS project—a randomised controlled trial in pathology. Eur J Cancer 2002;38(6):764–772.
  • 48. Weir CR, Hurdle JF, Felgar MA, Hoffman JM, Roth B, Nebeker JR. Direct text entry in electronic progress notes. Methods Inf Med 2003;42(1):61–67.
  • 49. Moorman PW, van Ginneken AM, Siersema PD, van der Lei J, van Bemmel JH. Evaluation of reporting based on descriptional knowledge. J Am Med Inform Assoc 1995;2(6):365–373.
