Abstract
The Clinical Outcomes Assessment Toolkit (COAT) was created through a collaboration between the University of California, Los Angeles and Brigham and Women's Hospital to address the challenge of gathering, formatting, and abstracting data for clinical outcomes and performance measurement research. COAT provides a framework for the development of information pipelines to transform clinical data from its original structured, semi-structured, and unstructured forms to a standardized format amenable to statistical analysis. This system includes a collection of clinical data structures, reusable utilities for information analysis and transformation, and a graphical user interface through which pipelines can be controlled and their results audited by nontechnical users. The COAT architecture is presented, as well as two case studies of current implementations in the domain of prostate cancer outcomes assessment.
Introduction
The release of the Institute of Medicine's (IOM) series of reports on disparities in the quality of health care and their effects 1,2 has led to a greater emphasis on the analysis of key quality measures in both national 3 and individual studies. 2,4 Because of availability, the most common source of quality measures is data abstracted from existing medical records. 5–7 Despite unanswered questions regarding the quality of medical record data and the consequent effect on efforts to measure the quality of care, 8–11 the use of the medical record as a source of quality measures is expected to increase. 12,13 As a result, so too will the need for automated systems capable of supplementing, if not replacing, traditional methods of manual chart review and abstraction. 14
Although several automated approaches have proven capable of extracting individual values from clinical records, formidable challenges remain in realizing reliable and quick access to current clinical outcomes and performance measurement data. First, some degree of customization is often required to account for the sublanguage and documentation practices of specific medical subdomains. This can be a costly and resource-intensive endeavor. Both the cost and the technical complexity of these tasks can increase considerably in attempts to extract data from the records of multiple institutions. Beyond abstracting information from the record, clinical records-based research poses logistical challenges, including the import of data from multiple sources into a single repository, the standardization of that data for statistical analysis, and the auditing of extracted results to guarantee data integrity.
The Clinical Outcomes Assessment Toolkit (COAT) was designed to explore the challenges related to the use of automation to facilitate medical records–based clinical outcomes assessment and performance measurement research. Created as part of an ongoing collaboration between the University of California, Los Angeles (UCLA) and the Center for Surgery and Public Health at Brigham and Women's Hospital (BWH), COAT provides developers a collection of clinical data structures and reusable functionalities that can be rapidly assembled to create information pipelines for importing, extracting, standardizing, and analyzing clinical data. The use of a consistent set of clinical data structures enables integration of assembled pipelines with a user interface, allowing nontechnical end users to control, display, and audit the results of instantiated pipelines. This article presents the COAT architecture, the details of its implementation, and an overview of two ongoing clinical outcomes assessment efforts supported by COAT.
Background
Nearly all healthcare stakeholders have expressed an increased need for measures to assess the quality of delivered care. Patients seeking to make informed decisions and accustomed to the instant information delivery of the World Wide Web expect provider-specific clinical outcomes data. 15 Major third-party health care reimbursement organizations are requiring outcomes information to ensure that their standards of quality and cost-effectiveness are met. 16 In some cases, hospital administrators are proactively gathering clinical data to demonstrate to insurers the appropriateness of the care delivered. In the United Kingdom, a recently deployed physician reimbursement system will tie 30% of physicians' salaries to their performance as defined by 130 quality indicators. 13 Indications from the IOM and United States–based insurers are that clinicians will soon see more of their compensation similarly tied to such quality measures. 17,18 Currently, Medicare publishes information for the public to assess hospital performance, 19 and a recent court ruling in the United States will require the U.S. Department of Health and Human Services to make public physician-specific information as well. 20 This growing emphasis on health care quality measurement has placed a premium on systems capable of identifying and extracting these measures from the clinical record.
The complexity and heterogeneity of medical information have led to the application of a wide variety of approaches for automated information extraction. In cases in which targeted diagnoses appear within narrative free text, statistical approaches such as support vector machines and entropy-based techniques have proven useful. 21,22 In attempting to extract values that appear in predictable patterns, such as patient demographics and certain laboratory results, researchers have had success in using predetermined patterns of strings or regular expressions. 23,24 In several cases, robust rules and grammar-based natural language processing (NLP) systems such as MedLEE 25 and the National Library of Medicine's MetaMap Transfer (MMTx) 26 have been used as the foundation for clinical information extraction efforts. 27,28 Despite an increase in research focused on developing and evaluating techniques for processing clinical free text, the heterogeneity of the clinical data environment and the diverse needs of clinical researchers have precluded methodological consensus. Indeed, the recently published results of the Informatics for Integrating Biology and the Bedside (i2b2) shared NLP challenge to extract smoking status from discharge reports show the potential of a variety of different approaches for a given extraction task. 29
A useful development to support the experimentation and evaluation of text processing techniques is the growing availability of downloadable toolkits. These free and often open-source systems make possible the integration of existing information processing functionalities to create custom pipelines of components. This approach to the design of clinical text processing systems allows developers the convenience and flexibility of choosing the most appropriate technique for the clinical information problem at hand without developing such functionalities from scratch. For example, systems such as HITEx 30 and caTIES 31 integrate third-party software to accomplish certain tasks, including the General Architecture for Text Engineering (GATE) for NLP, 32 Weka for machine learning, 33 and MetaMap and/or the Unified Medical Language System (UMLS) for concept mapping. In consideration of the complexity of clinical data 34 and the growing list of targeted quality measures, 35 this hybrid approach represents an important step toward striking a balance between reusability and achieving acceptable levels of performance through customization.
The task of extracting relevant information from free text, although fundamental, is only one prerequisite for the conduct of record-based clinical research. The process of gathering data, formatting it to a standard representation, and stratifying it into samples for analysis can consume considerable resources in medical records–based research. Another important and persistent issue in clinical record–based research is data quality. 8,36,37 With attempts to automate the identification of variables, an important layer of quality assurance provided by the abstractor is removed. These considerations take on additional importance when clinical questions are answered using data from multiple institutions and when the questions must be re-answered on a periodic basis, as is often the case with physician and hospital performance measures that affect payment and accreditation.
System Description
The COAT Framework
COAT offers a framework with which developers can arrange and rearrange a series of components to create pipelines to import, extract, standardize, and analyze clinical data in its various formats. COAT is not an off-the-shelf application designed to implement a specific approach to extracting and structuring information from clinical free text. Instead, the COAT framework is intended to support the rapid development, experimentation, and evaluation of a variety of approaches, including natural language processing, machine learning, or fixed pattern-based approaches (e.g., regular expressions), alone or in combination with one another, for specific clinical information extraction tasks.
The term framework is used here to describe the three main aspects of COAT that are used in combination to support information extraction and analysis: (1) the clinical data objects, (2) pipelines of assembled components designed to advance data from its various locations and formats to a standardized representation for analysis, and (3) a user interface extending control of pipelines to nontechnical end users. The design of COAT and the relationship between these three framework components is illustrated through an example drawn from an ongoing effort to measure the quality of prostate cancer surgical care.
The typical research project designed to utilize extant clinical data begins with a list of values targeted for analysis. In the case of prostate cancer surgical outcomes assessment, researchers may require basic patient demographics (height, weight, age, etc.), as well as measures more specific to the clinical question or questions at hand (tumor stage, Gleason score, margin status). These values often exist in a number of clinical documents and in a variety of formats (e.g., structured demographic data, semistructured pathology data, etc.). The project's medical record abstractors are responsible for reviewing a list of predetermined documents for each patient, extracting the targeted values from each document, and recording each value in a standardized format in a manner that maintains the association between the value and the patient (e.g., a case report form, database, spreadsheet, etc.). The clinical data objects, information pipelines, and user interface that compose the COAT framework were designed to support the process of gathering, abstracting, and auditing targeted clinical values by automating many of its aspects.
First, the clinical data objects represent the information required for records-based research. The three abstract object types that manage most of the data in a COAT-developed application are the Patient, Document, and Extraction objects. As in the clinical research environment, each Patient contains one or more Documents. In turn, each Document has the potential to contain one or more targeted Extractions. It is the role of the system, like the role of the abstractor, to identify Extraction objects within each Document object, convert them to a standardized format, and associate them with the appropriate Patient object.
Extending this environment to the example of prostate cancer surgical care, the Patient object in COAT is used to represent patients having undergone a prostatectomy. The patient's pathology report represents one type of Document object that contains values of interest for the study. One specific targeted value in such a study is the patient's tumor stage information. Extracted from the document, tumor stage is represented as an Extraction object. Figure 1 uses the example of prostate cancer surgical outcomes assessment to illustrate the use of the clinical data objects (i.e., Patient, Document, and Extraction).
Figure 1.
The COAT clinical data objects.
At the start of each record-based research project using COAT, the list of targeted values is translated by a software developer to a list of attributes of the Patient object. COAT information pipelines are custom assemblies of functionalities intended to process the Document objects of each Patient object, populating Extraction objects with targeted values (e.g., tumor stage). Each Extraction is then transformed to a predetermined standardized format, presented to the user for review, and then loaded as the value associated with the attribute of the Patient object. The result of a COAT pipeline is a set of Patient objects with values associated with each of their attributes (e.g., tumor stage = T2bNxMx). This set of patients and their associated values are then stored in a relational database for statistical analysis.
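A minimal sketch of this object model in Java follows. The Patient, Document, and Extraction class names come from COAT, but every field and method shown here is an illustrative assumption rather than the toolkit's actual API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the three COAT clinical data objects. Class names are from the
// article; fields and methods are assumptions made for illustration.
class Extraction {
    String attribute;          // targeted value name, e.g., "tumorStage"
    String rawText;            // text as it appeared in the document
    String standardizedValue;  // formatted value, e.g., "T2bNxMx"
    Extraction(String attribute, String rawText, String standardizedValue) {
        this.attribute = attribute;
        this.rawText = rawText;
        this.standardizedValue = standardizedValue;
    }
}

class Document {
    String text;
    List<Extraction> extractions = new ArrayList<>();
    Document(String text) { this.text = text; }
}

class Patient {
    String id;
    List<Document> documents = new ArrayList<>();
    Map<String, String> attributes = new HashMap<>();  // committed values
    Patient(String id) { this.id = id; }

    // Like the abstractor, associate each reviewed Extraction with the patient.
    void commitExtractions() {
        for (Document d : documents)
            for (Extraction e : d.extractions)
                attributes.put(e.attribute, e.standardizedValue);
    }
}
```

After a pipeline populates each Document's Extractions and the user reviews them, committing the extractions yields a Patient whose attributes (e.g., tumorStage = T2bNxMx) are ready for loading into the research database.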
COAT offers several prefabricated connectors, functions, and utilities with which to assemble information pipelines. To capitalize on the existence of several useful open-source applications, COAT components include connector objects that ease integration with packages such as MMTx for mapping free text to clinical concepts using natural language processing, and Weka for data mining and machine learning tasks. The use of a consistent set of clinical data objects allows pipelines to be combined, split, extended, and reused, thereby minimizing custom development. Developers can also incorporate their own custom-built components into information pipelines by designing them to process COAT clinical data objects. Figure 2 provides an example of an information pipeline assembled from COAT's library of components to extract quality measures from the pathology reports of patients who have undergone prostatectomies.
Figure 2.
The components that compose a COAT-developed application.
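The compositional pipeline design described above might be sketched as follows. This is a hypothetical illustration of how pipeline subsections could be chained and reused, not COAT's actual Pipeline package code; in COAT the payload flowing between components would be the clinical data objects, while a generic type parameter stands in here to keep the sketch self-contained:

```java
import java.util.function.UnaryOperator;

// Hypothetical illustration of compositional pipelines: each stage transforms
// a payload, and stages (or whole subsections of pipelines) chain for reuse.
class Pipeline<T> {
    private final UnaryOperator<T> stage;
    Pipeline(UnaryOperator<T> stage) { this.stage = stage; }

    // Combine this pipeline (or a subsection of one) with another.
    Pipeline<T> then(Pipeline<T> next) {
        return new Pipeline<T>(x -> next.stage.apply(this.stage.apply(x)));
    }

    T run(T input) { return stage.apply(input); }
}
```

An import stage and an extraction stage built this way can be spliced into other pipelines without modification, which is the reuse property the consistent clinical data objects are meant to enable.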
The flow of clinical data objects through information pipelines is managed by nontechnical users through the COAT user interface. Users are presented with button-click summaries of extraction results, including lists of patients for which no extraction was made. Record-level views of Patient, Document, and Extraction data are exposed to the user through drill-down tables that can be sorted by columns. Once satisfied with the presented results, users can commit the formatted values to their associated Patient objects, and then load the results to a database for future analysis. The use of a consistent representation of clinical data objects (i.e., Patient, Document, and Extraction) offers two advantages to the design and implementation of the COAT user interface. First, users are provided with a consistent interface to control and review the results of extraction efforts, regardless of the clinical extraction task. Second, the consistent data model allows developers to incorporate an existing user interface and collection of prefabricated views, controls, and reporting features.
Technical Implementation
COAT is implemented through five main packages and integrated with a relational database. Figure 2 illustrates the role of each package and a relational database in a typical COAT-developed application. Each package and the database have been numbered for convenience of reference. The Clinical package (1) contains the clinical data structures. The components used to create pipelines are organized into two packages, the Utility package (2) and the Text package (3). The Pipeline package (4) stores finished pipelines. The COAT user interface is implemented through Java Swing-based classes in the Interface package (5). The packages and some of the classes they feature as well as the design of the research database (6) are discussed in greater detail in the next section.
The Clinical Package
The Clinical package contains the data structures used to store and carry clinical data through assembled pipelines. The three previously described high-level classes of the Clinical package are Patient, Document, and Extraction. For traditional hypothesis-driven research featuring a finite number of variables, developers can extend the Patient object by adding the necessary attributes (e.g., blood pressure, a laboratory value, etc.). For more complex studies, such as hypothesis-generating data mining efforts in which volumes of different types of information are searched for patterns, COAT features a collection of subclasses of Patient and Document, including PatientPathology, PatientChemistry, PatientMicrobiology, DocumentPathology, and DocumentOperativeReport objects. More granular control over extracted information is provided through several subclasses of the Extraction object, including the SectionExtraction, SentenceExtraction, and WordExtraction objects. Additional custom subclasses can be added as child classes of Patient, Document, and Extraction objects. Data stored in subclasses of the clinical data objects (e.g., PatientChemistry) must be mapped back to attributes of their original parent objects (Patient, Document, Extraction) to store their values in the associated research database and to display values in the interface without custom development. Alternatively, because the interface is developed entirely from Java components and the database is a simple relational database, both can be extended to support subclasses should there be an advantage to storing and presenting data in more granularly defined representations.
The Utility Package
Both the Utility and the Text packages are collections of tools and data structures that can be used by developers to create information pipelines. For organizational purposes, the Utility package houses tools used to import and export data across different sources such as files, databases, and third-party applications. Examples of utilities offered in this package and their intended uses are provided in this section.
The FileUtil class facilitates the import of data from various file formats. The DataAccess class manages database connectivity. The Database class communicates with the associated database and contains standard select, insert, and update statements for tables representing the clinical data structures (i.e., Patient, Document, Extraction). It also serves to encapsulate any custom queries written to support the needs of individual research projects. System progress and error logging are important aspects of data quality assurance provided by the ReportLog and ErrorLog classes. Their methods can be called to record any potential errors or missed values at any stage of a pipeline to the user interface or to a text file. The Utility package also provides classes to integrate and communicate COAT data structures with third-party applications such as Weka and MetaMap. For example, the ClassifierUtil class has methods that take clinical data objects as input, convert them to Weka data structures, apply Weka's machine-learning algorithms, and return the results as Extraction objects. The ClassifierUtil class also provides methods for training, updating, and storing machine-learning models as serialized objects for future use.
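A sketch of what a ReportLog-style utility might look like follows. The class name appears in COAT, but the interface shown here is an assumption made for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a ReportLog-style utility; the class name is from
// the article, but this interface is assumed rather than documented.
class ReportLog {
    private final List<String> entries = new ArrayList<>();

    // Called at any pipeline stage to record progress or a missed value.
    void log(String stage, String message) {
        entries.add("[" + stage + "] " + message);
    }

    // Render the accumulated log for the user interface or a text file.
    String render() {
        return String.join(System.lineSeparator(), entries);
    }
}
```

Recording a missed extraction at the stage where it occurred gives the end user an audit trail to review alongside the extracted values.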
The Text Package
The Text package contains a series of classes that can be incorporated by developers to manipulate text data. Its classes feature data structures and functionalities to support text extraction and mining including string tokenizers, sentence/section boundary detectors designed for clinical documents, regular expression matching functions, and reusable regular expressions commonly used in the clinical domain. Data structures contained in the Text package include Word, Sentence, and Section objects to facilitate more granular analysis of document texts. In addition, collections of Document objects can be passed to the Text package's Dictionary and Histogram classes to establish lists of words and the numbers of their appearances per document and per collection. These classes and functions provide the foundations of term frequency-based approaches used by information-retrieval techniques, sequence-based statistical models (e.g., Markovian models), and n-gram approaches for text analysis and classification. Examples of regular expressions currently featured in COAT include an implementation of a semistructured sentence boundary detector for parsing pathology reports, and expressions representing tumor stage and Gleason score values. The RegexUtil class also provides the ability to test regular expressions through a simple interface.
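Patterns of the kind the Text package catalogs for Gleason score and tumor stage might look like the following sketch. These regular expressions are rough approximations written for illustration, not COAT's actual expressions:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative clinical regular expressions; approximations of the kind of
// reusable patterns the Text package catalogs, not COAT's actual expressions.
class ClinicalPatterns {
    // e.g., "Gleason score 3+4=7" or "Gleason 4 + 3"
    static final Pattern GLEASON = Pattern.compile(
        "Gleason\\s+(?:score|grade)?\\s*(\\d)\\s*\\+\\s*(\\d)(?:\\s*=\\s*(\\d{1,2}))?",
        Pattern.CASE_INSENSITIVE);

    // e.g., "pT2bNxMx" or "T3a N0 M0" (TNM staging)
    static final Pattern TUMOR_STAGE = Pattern.compile(
        "p?T([0-4][a-c]?)\\s*N([0-3x])\\s*M([01x])",
        Pattern.CASE_INSENSITIVE);

    // Return the normalized score (e.g., "3+4=7") or null if no match.
    static String firstGleason(String text) {
        Matcher m = GLEASON.matcher(text);
        if (!m.find()) return null;
        String sum = m.group(3) != null ? m.group(3)
            : String.valueOf(Integer.parseInt(m.group(1)) + Integer.parseInt(m.group(2)));
        return m.group(1) + "+" + m.group(2) + "=" + sum;
    }

    // Return the normalized stage (e.g., "T2bNxMx") or null if no match.
    static String firstStage(String text) {
        Matcher m = TUMOR_STAGE.matcher(text);
        if (!m.find()) return null;
        return "T" + m.group(1) + "N" + m.group(2) + "M" + m.group(3);
    }
}
```

For example, firstStage("Pathologic stage: pT2b Nx Mx") yields "T2bNxMx", the standardized form a pipeline would load into an Extraction object.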
The Interface Package
The Interface package contains predefined Java Swing panels that can be inserted into the COAT user interface. In its current implementation, the user interface consists of two windows: the Dashboard for viewing data and system progress and the Control Panel for interacting with assembled pipelines. Figure 3 shows a configuration of the COAT user interface with custom subpanels added to the Control Panel to extract prostate cancer quality measures from pathology reports.
Figure 3.
The COAT user interface.
The Dashboard was designed to present a consistent view of the data, regardless of the specifics of the clinical research problem. This consistency is achieved through the use of the framework's clinical data objects (i.e., Patient, Document, Extraction). The three basic panels featured on the Dashboard for viewing and manipulating data are the Summary Panel, Progress Panel, and a Data Views Panel. The Summary Panel is used to display and print original documents, extracted patient data, and summary reports. The Progress Panel is a smaller read-only text area that can be called by the developer in pipelines to send system progress and error messages to the user. The Data Views Panel is a tabbed pane with three tabs. Each of the tabs contains a table featuring collections of one of the three clinical data objects: Patient, Document, and Extraction. Tables can be sorted by column, and a record-level review of each object is sent to the Summary Panel by clicking on any of the tables' rows. Many of the panels can be reused across most research efforts, such as the Sample Panel for loading and saving samples and the Classifier Panel for building, updating, applying, and evaluating machine-learning models. As new functionalities are added to pipelines, custom subpanels are added to the Control Panel for the end user. Figure 4 shows the custom subpanels added to support the extraction of pathology values for prostatectomy quality measure extraction.
Figure 4.
Custom subpanels added for prostatectomy quality measure extraction.
The Pipeline Package
The Pipeline package stores assembled pipelines. Specifying a single package to store pipelines simplifies code reuse because created pipelines and subsections of pipelines can be reused and combined with other pipelines. This design also isolates custom pipeline development in a single location.
Research Database
The COAT research database can be implemented in any relational database software. In two ongoing studies, the COAT research database is implemented in Microsoft Access. Tables in the research database are designed to mirror the COAT clinical data structures. In the current implementation, the tables feature naming conventions to indicate the version of their contents. For example, original documents and patient information are stored in master Document and Patient tables (e.g., PATIENT_MASTER). For each individual study, subsamples of patients and their respective documents are drawn from the master tables and loaded into study-specific Patient and Document tables (e.g., PATIENT_STUDY1, DOCUMENT_STUDY1). The finished product of a pipeline is a populated Patient table with standardized values ready for export to a spreadsheet or statistical package for further analysis.
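The table-naming convention can be illustrated with a small helper of the sort the Database class might encapsulate. The method names and SQL shown here are assumptions for illustration, not COAT code:

```java
// Illustrative sketch of the study-table naming convention; these helper
// methods and the SQL string are assumptions, not COAT's actual Database class.
class StudyTables {
    // PATIENT -> PATIENT_STUDY1, DOCUMENT -> DOCUMENT_STUDY1, etc.
    static String studyTable(String entity, int studyNumber) {
        return entity.toUpperCase() + "_STUDY" + studyNumber;
    }

    // A typical subsampling statement drawing a study cohort from the master table.
    static String subsampleSql(String entity, int studyNumber, String whereClause) {
        return "INSERT INTO " + studyTable(entity, studyNumber)
             + " SELECT * FROM " + entity.toUpperCase() + "_MASTER"
             + " WHERE " + whereClause;
    }
}
```

Keeping master and study-specific tables distinct lets each study draw its own subsample while the original records remain untouched.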
Although COAT has been designed to reduce development time, the nature of the toolkit approach requires a degree of programmatic assembly and customization. First, at the start of any study, targeted values must be translated into a list of attributes in the Patient objects and the associated columns must be added to the Patient table in the research database. Although many text processing components exist to support the creation of pipelines, each pipeline must be assembled to meet the specific extraction needs of a given project. This requires programmers with some familiarity with text processing techniques. Control of user-launched functions in the pipeline must be extended to the user through the addition of a button, or in the case of several new functions, a custom subpanel of buttons. Depending on the class and its functions, customization of some utility objects may also be required. For example, developers must specify a connection string in the DataAccess class to communicate with their research database. In such cases, the objects provided by COAT offer a scaffold onto which custom calls, parameters, and attributes can be added, encapsulating functionality and reducing from-scratch development of data structures and methods common to clinical data text processing.
Case Studies
COAT-developed applications are currently in use for two research efforts, both in the domain of prostate cancer surgical outcomes assessment. The first project is an evaluation of the use of automated methods to gather key quality measures from surgical pathology reports. 38 The three measures targeted are Gleason score, tumor stage, and surgical margin status, the three measures designated as Category 1 quality measures by the American College of Surgeons for their empirical correlation to outcomes. 39,40 The FileUtil class from the COAT Utility package was used to import different formats of medical record data from two hospitals. As the data from one of the hospitals was stored as XML documents, the XMLParser utility was customized to process the hospital's data model and load the values into COAT clinical data objects. Three fields representing the targeted variables were added to the Patient object and related Patient table in the research database, and three custom subpanels were added to the user interface as seen in Figure 4. Two different extraction approaches were used: one using regular expressions implemented in the RegexUtil and another combining regular expressions with support vector machines through the ClassifierUtil's integration with Weka. Extracted values were loaded into Extraction objects, presented to the user for review, standardized for statistical analysis, and loaded as attributes of their associated Patient objects. Documents with conflicting Extraction objects, or with none, were automatically logged, presented, and manually reviewed through the user interface. A gold standard was created, and a custom functionality was added to the interface to compare the gold standard to the results of the created pipeline. Overall accuracies for extraction of Gleason score, tumor stage, and margin status were 99.7%, 99.1%, and 97.2%, respectively.
The second project using a COAT-developed application is in response to the growing demand for outcomes assessment research methods that shift the focus from the traditional use of indirect correlations, such as hospital or departmental volume, toward the identification and dissemination of the processes that lead to better outcomes. 41–45 Unlike most efforts to assess health care quality that begin with a list of known clinical values to target from reports, this experimental study used an inductive term frequency-based approach to identify correlates to outcome from within free text narrative operative reports. The results of the previously described pathology extraction pipeline were used to capture before-and-after data from the pathology reports. Similarly, the import functionality of the previous pipeline was used to import surgical operative reports from the two hospitals. Several classes from the Text package were used, including the Dictionary and Word objects, to create histograms of word appearances and to calculate term weights based on appearance. Ranking algorithms were implemented using the attributes of Word objects to calculate relevance. Sentences containing highly ranked patterns were presented through the user interface as Extraction objects.
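A minimal sketch of this term-frequency computation follows. The Dictionary and Histogram class names come from COAT, but the interface here is an assumption made for illustration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of a Dictionary/Histogram-style term-frequency
// computation; the actual COAT class interfaces are assumed, not documented.
class TermHistogram {
    // term -> number of documents in the collection containing it
    static Map<String, Integer> documentFrequencies(String[] documents) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : documents) {
            // Count each term once per document.
            Set<String> terms = new HashSet<>(
                Arrays.asList(doc.toLowerCase().split("\\W+")));
            for (String term : terms)
                if (!term.isEmpty()) df.merge(term, 1, Integer::sum);
        }
        return df;
    }

    // Simple inverse-document-frequency weight: rarer terms rank higher.
    static double weight(Map<String, Integer> df, int nDocs, String term) {
        Integer n = df.get(term.toLowerCase());
        return n == null ? 0.0 : Math.log((double) nDocs / n);
    }
}
```

Ranking terms by such appearance-based weights is one simple way sentences containing unusual patterns could be surfaced for user review.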
Several hypotheses were generated regarding process-based correlates to outcome, and unexpected differences in the surgical practices of urologists were discovered. Significant variations among the practices of three surgeons included the number of bladder neck reconstructions required (p < 0.00), the nerve-sparing approach used (i.e., bilateral, unilateral, partial, or non-nerve-sparing) (p < 0.00), and the ordering of intraoperative frozen section specimen analyses (p < 0.01). The choice of nerve-sparing approach was correlated with the outcome of positive surgical margins (p < 0.01 for partial, p < 0.02 for bilateral). Evaluating the clinical significance of these findings requires additional analysis. However, the ability to automatically identify patterns of interest in operative reports, the only process-based documentation widely available, is promising.
Conclusion
As increasing emphasis is placed on delivery of timely and accurate information useful for demonstrating the quality of health care, it will become necessary to use automated methods to produce this information. The COAT framework provides a workflow, collection of tools, and graphical user interface with which developers can create applications designed to support clinical records-based research. This toolkit and framework approach, rather than the adoption of a single method for information extraction, is intended to accommodate the heterogeneity of both clinical data and the information extraction needs of health services researchers while minimizing the development efforts of programmers.
There are limitations to the outlined approach. The decision to address the challenge of information extraction with a toolkit, rather than with a self-contained application, entails some custom development. In addition, the system's design implies that no single approach to clinical information extraction is capable of meeting users' requirements, a hypothesis that may yet be disproved. Fortunately, as novel methods of extraction are introduced, COAT's modular format will support their incorporation.
The strategy outlined here has shown potential in applications to two record-based clinical research efforts using different extraction and analysis techniques for different types of clinical records originating from two different hospitals. The interoperability of the system's architecture implies that as COAT is extended to different medical subdomains, the number of existing functionalities, connectors, and data structures will grow, increasing the system's overall value to developers. Continued development of COAT is focused on refining the modularity and reusability of existing objects while adding and cataloging additional tools and functionalities.
Acknowledgments
The authors thank Drs. David Miller and Mark Litwin of UCLA and Drs. Jim Hu and Selwyn Rogers of Brigham and Women's Hospital for their clinical expertise.
Footnotes
This work was supported in part by National Library of Medicine Medical Informatics Training Grant LM07356 and National Institutes of Health grant R01 EB00362.
References
- 1. Corrigan J, Kohn L, Donaldson M, editors. To Err is Human: Building a Safer Health System. Washington, DC: National Academies Press; 1999.
- 2. Chassin M, Galvin R. The urgent need to improve health care quality: Institute of Medicine National Roundtable on Health Care Quality. JAMA 1998;280:1000-1005.
- 3. Mardon R, Shih S, Mierzejewski R, Halim S, Gwet P, Bost JE; National Committee for Quality Assurance. The State of Health Care Quality. Washington, DC: NCQA, Research and Analysis; 2002.
- 4. Jencks S, Cuerdon T, Burwen D, et al. Quality of medical care delivered to Medicare beneficiaries: a profile at state and national levels. JAMA 2000;284:1670-1676.
- 5. McDonald C. Quality measures and electronic medical systems. JAMA 1999;282:1181-1182.
- 6. Institute of Medicine, Committee on Improving the Patient Record. Detmer D, Steen E, Dick R, editors. The Computer-Based Patient Record: An Essential Technology for Health Care. Washington, DC: National Academies Press; 1997.
- 7. Gilbert E, Lowenstein S, Koziol-McLain J, et al. Chart reviews in emergency medicine research: where are the methods? Ann Emerg Med 1996;27:305-308.
- 8. Luck J, Peabody J, Dresselhaus T, Lee M, Glassman P. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. Am J Med 2000;108:642-649.
- 9. Musen M. The strained quality of medical data. Methods Inform Med 1989;28:123-125.
- 10. Peabody J, Luck J, Jain S, Bertenthal D, Glassman P. Assessing the accuracy of administrative data in health information systems. Med Care 2005;42:1066-1072.
- 11. Iezzoni L. Assessing quality using administrative data. Ann Intern Med 1997;127:666-674.
- 12. Institute of Medicine. Rewarding Provider Performance: Aligning Incentives in Medicare. Washington, DC: National Academies Press; 2006.
- 13. Shekelle P. New contract for general practitioners. Br Med J 2003;326:457-458.
- 14. Benin A, Vitkauskas G, Thornquist E, et al. Validity of using an electronic medical record for assessing quality of care in an outpatient setting. Med Care 2005;43:691-698.
- 15. Simmons B, Swiontkowski M, Evans R, Amadio P, Cats-Baril W. Outcomes assessment in the information age: available instruments, data collection, and utilization of data. Instructional Course Lectures 1999:667-685.
- 16. Ritter MA. Overview: maintaining outcomes for total hip arthroplasty: the past, present, and future. Clin Orthop Rel Res 1997;344:81-87.
- 17. Institute of Medicine. Leadership by Example: Coordinating Government Roles in Improving Health Care Quality. Washington, DC: National Academies Press; 2002.
- 18. Pear R. Study tells U.S. to pay more for best medical care. New York Times. 2002 October 31:A21.
- 19. Department of Health and Human Services. Hospital Compare. www.hospitalcompare.hhs.gov. Accessed Sept. 17, 2007.
- 20. Lee C. Medicare to reveal data about doctors. Washington Post. 2007 Sept 1:A03.
- 21. Bashyam V, Taira RK. Identifying anatomical phrases in clinical reports by shallow semantic parsing methods. IEEE Symposium on Computational Intelligence and Data Mining, Honolulu, HI; 2007:210-214.
- 22. Dreyer K, Mannudeep K, Hurier A, et al. Application of a recently developed algorithm for automatic classification of unstructured radiology reports: validation study. Radiology 2005;234:323-329.
- 23. Voorham J, Denig P. Computerized extraction of information on the quality of diabetes care from free text in electronic patient records of general practitioners. J Am Med Inform Assoc 2007;14:349-354.
- 24. Turchin A, Kolatkar N, Grant R, Makhni E, Pendergrass M, Einbinder J. Using regular expressions to abstract blood pressure and treatment intensification information from the text of physician notes. J Am Med Inform Assoc 2006;13:691-698.
- 25. Friedman C, Alderson P, Austin J, Cimino J, Johnson S. A general natural language text processor for clinical radiology. J Am Med Inform Assoc 1994;1:161-174.
- 26. Aronson A. Effective mapping of biomedical text to the UMLS metathesaurus: the MetaMap program. AMIA Symp 2001:17-21.
- 27. Xu H, Anderson K, Grann VR, Friedman C. Facilitating cancer research using natural language processing of pathology reports. MedInfo 2004:565-572.
- 28. Chapman WW, Fiszman M, Dowling JN, Chapman BE, Rindflesch TC. Identifying respiratory findings in emergency department reports for biosurveillance using MetaMap. MedInfo 2004:491.
- 29. Uzuner O, Goldstein I, Luo Y, Kohane I. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc 2008;15:14-24.
- 30. Zeng Q, Goryachev S, Weiss S, Sordo M, Murphy S, Lazarus R. Extracting principal diagnosis, co-morbidity, and smoking status for asthma research: evaluation of a natural language processing system. BMC Med Inform Decis Mak 2006;6:30.
- 31. Cancer Biomedical Informatics Grid. About caTIES. http://caties.cabig.upmc.edu/overview.html. Accessed March 10, 2007.
- 32. Cunningham H. GATE, a general architecture for text engineering. Comput Human 2004;36:223-254.
- 33. Witten I, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. San Francisco: Morgan Kaufmann; 2005.
- 34. Cios K, Moore G. Uniqueness of medical data mining. Artif Intell Med 2002;26:1-24.
- 35. Tang P, Ralston M, Fernandez Arrigotti M, Qureshi L, Graham J. Comparison of methodologies for calculating quality measures based on administrative data versus clinical data from an electronic health record system: implications for performance measures. J Am Med Inform Assoc 2006;14:10-15.
- 36. Wu L, Ashton C. Chart review, a need for reappraisal. Eval Health Prof 1997;20:146-163.
- 37. Berg M, Goorman E. The contextual nature of medical information. Int J Med Inform 1999;56:51-60.
- 38. D'Avolio L, Litwin M, Rogers S, Bui A. Facilitating clinical outcomes assessment through the automated identification of quality measures for prostate cancer surgery. J Am Med Inform Assoc 2008;15:341-348.
- 39. Henson D, Hutter R, Farrow G; Task Force on the Examination of Specimens Removed from Patients with Prostate Cancer. Practice protocol for the examination of specimens removed from patients with carcinoma of the prostate gland. A publication of the Cancer Committee, College of American Pathologists. Arch Pathol Lab Med 1994;118:779-783.
- 40. Srigley J, Amin M, Bostwick D, et al. Updated protocol for the examination of specimens from patients with carcinomas of the prostate gland: a basis for checklists. Arch Pathol Lab Med 2000;124:1034-1039.
- 41. Hannan E. The relation between volume and outcome in health care. N Engl J Med 1999;340:1677-1679.
- 42. Birkmeyer J. Understanding surgeon performance and improving patient outcomes. J Clin Oncol 2004;22:2765-2766.
- 43. Daly J. Invited commentary: quality of care and the volume-outcome relationship—what's next for surgery? Surgery 2002;131:16-18.
- 44. Hannan E, Wu C, Ryan T, et al. Do hospitals and surgeons with higher coronary artery bypass graft surgeries still have lower risk-adjusted mortality rates? Circulation 2003;108:795-801.
- 45. Berger D, Ko C, Spain D. Society of University Surgeons position statement on the volume-outcome relationship for surgical procedures. Surgery 2003;134:34-40.