Abstract
Data in computer-based patient records (CPRs) have many uses beyond their primary role in patient care, including research and health-system management. Although the accuracy of CPR data directly affects these applications, there has been only sporadic interest in, and no previous review of, data accuracy in CPRs. This paper reviews the published studies of data accuracy in CPRs. These studies report highly variable levels of accuracy. This variability stems from differences in study design, in types of data studied, and in the CPRs themselves. These differences confound interpretation of this literature. We conclude that our knowledge of data accuracy in CPRs is not commensurate with its importance and further studies are needed. We propose methodological guidelines for studying accuracy that address shortcomings of the current literature. As CPR data are used increasingly for research, methods used in research databases to continuously monitor and improve accuracy should be applied to CPRs.
Data in computer-based patient records (CPRs) are used in patient care, clinical research, health-system management, health-services planning, total quality improvement, billing, risk management, and government reporting. The accuracy of these data is therefore of great importance. On the basis of inaccurate data, clinicians may make treatment errors,1 researchers may underestimate disease prevalence,2 health-system managers may underestimate compliance with standards of care such as vaccination guidelines,3 and alerting systems may send false alarms to physicians.4 It is therefore surprising that the amount of research devoted to measuring data accuracy in CPRs has been relatively small.
In contrast, there is an extensive literature on data accuracy in paper-based records, disease registries, and clinical trial databases.5,6,7,8,9,10,11,12,13 This body of work provides a well-developed framework for the analysis of data accuracy that we can apply to CPRs. For example, Komaroff provides a comprehensive review of the complex processes by which different types of medical data are recorded in traditional paper-based medical records (and how error may be introduced at each step).5 His description of these processes can be expanded to describe how CPRs capture data (Fig. 1), thereby providing a model for the study of causes of inaccuracy. The literature on data accuracy in computer-based registries and clinical trial databases provides standard methods for the study of data accuracy,10,11,14 which are applicable to CPRs. In this literature, accuracy is calculated using two measures—one that measures the proportion of recorded observations in the system that are correct (correctness)* and a second that measures the proportion of observations that are actually recorded in the system (completeness) (Fig. 2). These measures are viewed as complementary; both are necessary for a complete understanding of accuracy in a system. From Figure 2, we can see that completeness decreases as the number of false negatives (cell c) increases, and correctness decreases as the number of false positives (cell b) increases. The false positives and negatives are largely independent of one another; thus, both measures provide valuable information about the accuracy of data in any system. Finally, previous work also contributes the understanding that the ideal gold standard for data accuracy is the true state of the patient (or, more generally, whatever aspect of the world the data represent).10,14 This ideal is usually difficult, if not impossible, to achieve, and researchers have devised methods for approximating it. Different methods are preferred depending on the type of data under study and the resources available to researchers. For example, in cancer registries, biopsy data are generally preferred to reports of radiographic studies or data abstracted from the paper-based patient record as a gold standard for measuring data accuracy.10
Figure 1.
The variety of mechanisms by which historical facts, observations, and measurements flow into a CPR. Error can be introduced at any step.
Figure 2.
Correctness is the proportion of CPR observations that correctly represent the true state of the world and is calculated as a/(a + b)—equivalent to positive predictive value. Completeness is the proportion of observations made about the world that were recorded in the CPR and is calculated as a/(a + c)—equivalent to sensitivity. Negative predictive value and specificity can also be derived from this table, but they are not used as measures of data accuracy because the "d cell" may be infinitely large (it is not feasible to count the observations that were not made and should not have been made).
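To make the two measures concrete, consider a small worked example (our own illustration with hypothetical counts; a, b, and c refer to the cells of Figure 2):

```latex
% Hypothetical counts for illustration only; not data from any reviewed study.
% a = observations recorded in the CPR that are true of the patient
% b = observations recorded in the CPR that are not true (false positives)
% c = true observations never recorded in the CPR (false negatives)
\[
\mathrm{correctness} = \frac{a}{a+b}, \qquad
\mathrm{completeness} = \frac{a}{a+c}
\]
\[
a = 90,\; b = 10,\; c = 30 \;\Longrightarrow\;
\mathrm{correctness} = \frac{90}{100} = 90\%, \qquad
\mathrm{completeness} = \frac{90}{120} = 75\%
\]
```

Note that a system could achieve nearly perfect correctness simply by recording only observations of which it is certain, at the cost of leaving many observations unrecorded; this is why the two measures must be reported together.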
We had three major objectives in conducting this review. First, we wanted to determine the quality of the literature on data accuracy in CPRs. Second, we wanted to form a synthesis of the results reported by this literature to answer the following open questions about data accuracy in CPRs:
How accurate are data contained in CPRs?
What are the causes of inaccurate data?
Which CPR characteristics influence data accuracy, and does direct clinician entry of data into the CPR result in higher rates of correctness and completeness than entry of data by third parties?
How can we improve data accuracy in CPRs?
Is the accuracy of CPR data higher than the accuracy of data in paper-based records?
Third, we wanted to provide methodological guidelines for researchers, quality improvement teams, and users of CPR data who are interested in performing and critiquing future studies of data accuracy.
Methods
Study Identification
In February 1996, we searched for published studies on data accuracy in CPRs using MEDLINE and CURRENT CONTENTS, conference proceedings, a citation index (SCISEARCH), and the reference sections of retrieved articles.
Because MEDLINE has no Medical Subject Heading (MeSH) for the concept of data accuracy, we constructed a textword search that retrieved citations containing at least one of the following words related to the concept of accuracy: accuracy, accurate, inaccuracy, inaccuracies, inaccurate, reliability, reliable, unreliability, unreliable, valid, validity, invalid, invalidity, correct, correctness, incorrect, incorrectness, complete, completeness, incomplete, incompleteness, error, erroneous, quality. We generated this list of words iteratively by performing a search, adding words that we found in citations, then repeating the search. We also required that articles be indexed under the MeSH term INFORMATION SYSTEMS.† We employed a second MEDLINE strategy to retrieve articles not indexed under INFORMATION SYSTEMS. This search retrieved articles containing at least one of the following phrases: data accuracy, accuracy data, data inaccuracy, inaccuracy data, inaccuracies data, data quality, quality data, data error, data errors, and erroneous data. We also searched CURRENT CONTENTS from October 1995 to February 1996 to identify articles not yet indexed by MEDLINE. We used our first MEDLINE strategy without the INFORMATION SYSTEMS restriction. Finally, we performed a citation search using SCISEARCH to identify articles that referenced an early review of the accuracy of medical data.5
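For illustration, the following fragment (our own sketch, not a script the authors report using) assembles the first strategy's Ovid-style textword query from the word list above; the syntax follows the search logic given in the second footnote:

```python
# Sketch only: reconstructs the first MEDLINE strategy from the word list in
# the text, using the Ovid-style ".tw." syntax shown in the footnote.
words = [
    "accuracy", "accurate", "inaccuracy", "inaccuracies", "inaccurate",
    "reliability", "reliable", "unreliability", "unreliable",
    "valid", "validity", "invalid", "invalidity",
    "correct", "correctness", "incorrect", "incorrectness",
    "complete", "completeness", "incomplete", "incompleteness",
    "error", "erroneous", "quality",
]
textword_clause = " OR ".join(f"{word}.tw." for word in words)
query = f"(({textword_clause}) AND (exp INFORMATION SYSTEMS))"
print(query)  # ((accuracy.tw. OR accurate.tw. OR ... ) AND (exp INFORMATION SYSTEMS))
```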
One author (WRH) reviewed the tables of contents of all Proceedings of the Annual Symposium on Computer Applications in Medical Care (1977-1995) and the American Association for Medical Systems and Informatics Congress (1982-1989).
This same author (WRH) reviewed the titles and abstracts of citations retrieved by the previously mentioned searches, excluded citations obviously not relevant to data accuracy in clinical information systems (e.g., articles that presented data on air quality), and obtained copies of all remaining articles. WRH reviewed the references cited by these articles for additional articles.
These searches retrieved 2,443 citations from MEDLINE, approximately 800 citations from CURRENT CONTENTS, 35 citations from the citation search, approximately 2,500 citations from conference proceedings (citations of all papers from all conference proceedings were reviewed), and approximately 500 citations from reviewing references of retrieved articles. Manual review of the titles and abstracts of approximately 6,278 citations yielded 235 articles potentially relevant to data accuracy in CPRs.
Study Selection
We obtained the 235 articles and reviewed them to exclude articles that did not satisfy the following three criteria: (1) a CPR was the object of study, which we defined as a computer-based system that contains primary patient records, defined by the Institute of Medicine (IOM) as records "... used by health care professionals while providing patient care services to review patient data or document their own observations"15; (2) a gold standard to which computer records were being compared was stated; and (3) correctness or completeness (or data from which we could compute at least one of them) was reported for at least one type of data. If an article described multiple studies, we included only those studies that met these criteria. Each author reviewed each of the 235 articles independently using a structured form. Inter-observer agreement was 92% for whether to include a given article. Differences of opinion were resolved by subsequent joint review of the article. Twenty articles satisfied our criteria for inclusion in this review.
Study Evaluation
We could find no standard method in the literature for critiquing studies about data accuracy. Thus, we developed an ad hoc scoring scheme to rate the articles.
Three of the journal articles reported the results of multiple studies of data accuracy; we scored each study from the 20 articles individually. There was a total of 26 different studies. The two authors independently scored each of these 26 studies on a scale of 0 to 18, using the scoring system described in the following paragraphs:
CPR description: 5 points. We awarded one point each for a description of methods of data capture, scope (e.g., how many clinics the CPR was operating in), general data content (i.e., what types of data the CPR contained), accessibility (e.g., from which locations a user can access the CPR and how much "down" time the CPR experiences), and whether the CPR constituted the official patient record.
The reason for scoring studies based on the description of the CPR is that certain CPR characteristics may influence data accuracy; knowledge of these characteristics is therefore necessary for the interpretation of results about data accuracy. For example, the method of data capture may influence accuracy—data captured on structured encounter forms or by direct clinician input may be more accurate than transcriptions of clinicians' unstructured, handwritten, or dictated notes. The scope of the CPR might influence accuracy because patients often visit other clinics or health care providers who may not use the CPR, and thus the care rendered to them goes unrecorded in the CPR. The types of data contained in the CPR might influence rates of accuracy because certain types of data, such as demographics and medication data, are likely to be more accurate than other types, such as diagnoses or problem lists. Accessibility may influence accuracy because the times and locations from which data can be entered may be inconvenient or limited in number, and thus people may defer data entry rather than disrupt their work routine. This deferral may result in data either not being recorded or being recorded less accurately. Finally, if the CPR constitutes the official patient record, it is conceivable that those responsible for data entry will take more care when recording data.
Methodology: 12 points. We awarded two points for unbiased sampling techniques, including random selection or contiguous selection (e.g., all patients who visited the clinic during a predefined study period). We awarded four points if the members of the research team who were responsible for determining the gold standard were blinded to both the purpose of the study and the CPR data, and we awarded two points if they were blinded to either the study purpose or the CPR data. We awarded four points for gold standards that most closely approximated the true state of the patient, i.e., those determined by interview, examination, observation of patients, or an objective measurement; we awarded two points for gold standards determined by review of other patient data (e.g., the paper record). Finally, we awarded two points if both measures of accuracy—correctness and completeness—were measured.
Study Objective: 1 point. We awarded one point if the primary study objective was the measurement of data accuracy.
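As an illustration of how the three components combine, the following sketch (our own rendering of the rubric, not software used in the review) computes a study's total score from the criteria just described:

```python
def score_study(
    description_points: int,  # 0-5: one point per CPR characteristic described
    unbiased_sampling: bool,  # random or contiguous selection
    blinding: str,            # "both", "one", or "none"
    gold_standard: str,       # "patient" (interview/exam/observation/measurement)
                              # or "record" (review of other patient data)
    both_measures: bool,      # correctness AND completeness reported
    accuracy_primary: bool,   # accuracy measurement was the primary objective
) -> int:
    """Return a study's total score on the 0-to-18 scale described above."""
    methods = 0
    methods += 2 if unbiased_sampling else 0
    methods += {"both": 4, "one": 2, "none": 0}[blinding]
    methods += {"patient": 4, "record": 2}.get(gold_standard, 0)
    methods += 2 if both_measures else 0
    objective = 1 if accuracy_primary else 0
    return description_points + methods + objective

# Example: a 4-point CPR description, random sampling, no blinding, a
# paper-record gold standard, both measures reported, and accuracy as the
# primary objective scores 4 + (2 + 0 + 2 + 2) + 1 = 11.
print(score_study(4, True, "none", "record", True, True))  # -> 11
```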
We resolved differences in the two scores for studies according to the following procedure: If our scores for a study differed by one point, we took the higher of the two. If our scores differed by two or more points, we reabstracted the information in question and, if necessary, jointly reviewed the article to resolve our differences.
Analysis
We expected significant variability in the CPRs studied and methods used to measure accuracy. Because of this variability, pooling of results would not be possible, and the techniques of formal meta-analysis would not be valid. Instead, we abstracted from each study (1) CPR characteristics, (2) methods, and (3) results about accuracy. We calculated methodological and description scores for the 26 studies and investigated the CPR characteristics and other results that pertained to the second objective of the review—answering questions about data accuracy.
Results
We found 20 articles that reported the results of 26 studies of accuracy in 19 unique CPRs (two articles described studies done in the same CPR). The 20 articles were published over a period spanning 18 years. Fourteen (70%) were published in the 5 years preceding this review. The primary objective of all but one study was the measurement of data accuracy in a CPR (Table 1).
Table 1.
Methodological Comparison of Studies
Study | Sampling | Gold Standard | n^a | Description | Methods | Objective | Total |
---|---|---|---|---|---|---|---|
Jelovsek/Hammond, 197816 | All for a 3-year period | Computer algorithm to identify blank fields | 7717 | 5 | 6 | 1 | 12 |
Fortinsky/Gutman, 198117 | Random^b | Paper records | 109 | 4 | 6 | 1 | 11 |
Jones/Hedley, 198618 | All for a 1-year period | Computer algorithm to identify blank fields | 1307 | 3 | 6 | 1 | 10 |
Maresh et al., 198619 | Consecutive deliveries of infants | Paper records | 253 | 4 | 4 | 1 | 9 |
Dambro/Weiss, 198820 | Random sample taken weekly | Committee review of paper records^c | — | 4 | 4 | 1 | 9 |
Block/Brennan, 198921 | Study (1): random^d | (1) Paper records | (1) 388, 405 | 3 | 6 | 1 | 10 |
 | Study (2): all records for certain diagnoses | (2) Laboratory data^e | (2) Not stated | 3 | 10 | 1 | 14 |
Gouveia-Oliveira et al., 199122 | All diagnostic reports for certain diagnoses^f | No missing reports and no missing descriptors | 1925, 1565 | 4 | 6 | 1 | 11 |
Johnson et al., 19912 | All patients from selected practices^g | A well-established manual influenza surveillance system | — | 2 | 6 | 0 | 8 |
Kuhn et al., 199123 | All or random reports for certain diagnoses^h | No missing descriptors | 210-642 | 4 | 2 | 1 | 7 |
Barrie/Marsh, 199224 | Random from 18-month period | Paper records | 200 | 4 | 8 | 1 | 13 |
Kuhn et al., 199225 | All or random reports for certain diagnoses^i | No missing descriptors | 50, 52 | 3 | 4 | 1 | 8 |
Edsall et al., 199326 | Consecutive knee arthroscopies under general anesthesia | No missing observations for 46 data items | 5 | 4 | 4 | 1 | 9 |
Payne et al., 199327 | Study (1): not stated | (1) Patient observation | (1) 234 | 4 | 6 | 1 | 11 |
 | Study (2): all children of a certain age | (2) Paper records | (2) 218 | 4 | 6 | 1 | 11 |
 | Study (3): all children of a certain age | (3) Paper records | (3) 104, 542 | 4 | 4 | 1 | 9 |
Ricketts et al., 199328 | Sequential admissions | Paper records | 100 | 4 | 8 | 1 | 13 |
Barlow et al., 199428 | Random from 24-month period | Paper records | 200 | 3 | 8 | 1 | 12 |
Hohnloser et al., 199429 | All for 18-month period | Recording of original free text report | 1219 | 4 | 6 | 1 | 11 |
Wilton/Pennisi, 19943 | Consecutive visits of children <2 years old | Paper records | 2098 | 4 | 4 | 1 | 9 |
Pringle et al., 199530 | Study (1): all records for certain diagnoses^j | (1) Paper chart plus CPR medication list | (1) Not given | 2 | 6 | 1 | 9 |
 | Study (2): all records^j | (2) No missing fields | (2) Not given | 2 | 6 | 1 | 9 |
 | Study (3): not stated^j | (3) No missing fields | (3) 1000 | 2 | 6 | 1 | 9 |
 | Study (4): consecutive visits^j | (4) Review of videotape of doctor taking history | (4) 50 | 2 | 6 | 1 | 9 |
Yarnall et al., 199531 | Random from 2-year period | Paper records | 300 | 4 | 8 | 1 | 13 |
Wagner/Hogan, 199632 | All records for 3-week study period | Clinician interpretation of medication history + paper | 117 | 4 | 8 | 1 | 13 |

a. Number of patient records that were reviewed.
b. Studied data accuracy before and after an intervention to improve the accuracy of data.
c. Study only evaluated data-entry error.
d. Every 20th chart in an alphabetical arrangement of paper charts.
e. A laboratory log book was used here as the gold standard for presence or absence of a disease.
f. Four types of lesions seen on endoscopy.
g. Study included only those practices that could provide the total number of monthly visits and recorded at least one respiratory illness.
h. Six diagnoses were evaluated.
i. Two diagnoses were evaluated.
j. Study included four practices known to be among the best in recording CPR data.
The 26 studies varied in the quality of the description of the CPR and the quality of the methods used (overall score 7 to 14, mean 10.3 of 18; see Table 1). The descriptions of the CPRs were generally adequate (description score 2 to 5, mean 3.5 of 5); however, only one study reported all five CPR characteristics that we identified at the beginning of the review as potentially influencing data accuracy. The CPR characteristics most frequently reported were the scope and content of the CPR (26 studies) and methods of data capture (22 studies). Only 16 of the 26 studies provided the name of the CPR or any other information about its hardware components or software versions. Finally, only three studies indicated whether the CPR data constituted the official medical record.
Many studies had significant methodological weaknesses (methodological score 2 to 10, mean 5.9 of 12). Common weaknesses were reporting only one measure of accuracy (14 studies) and failure to blind the members of the research team responsible for determining the gold standard to the purpose of the study or the CPR data (15 studies). The number of patient records sampled ranged from 5 to 7,717. Approaches to sampling were inclusion of all CPR records or all records for single diagnoses (nine studies), random sampling (eight studies), and consecutive patients (nine studies). Most studies used inadequate gold standards. Thirteen studies employed unblinded review of paper-based records. In eight studies, investigators merely checked for blank fields or missing data elements rather than determining a gold standard that approximated the true state of the patient. Only three studies employed examination, interview, or observation of patients in the determination of the gold standard.
The 26 studies reported rates of correctness, completeness, or both for 35 types of data (Table 2). Completeness was reported for all 35 types. Correctness was reported for only 13 (37%) types. The rates of correctness and completeness showed high variability, even within types of data. Correctness and completeness of diagnoses—the most frequently studied type of data (13 studies)—ranged from 67 to 100% and from 30.7 to 100%, respectively. Certain types of data, such as immunization status, medications, and demographics, tended to be more accurate than other types, such as problem lists and complications of surgical procedures, but the lowest rates of accuracy for the “more” accurate types of data overlapped with the highest rates of accuracy for the “less” accurate types.
Table 2.
Correctness and Completeness for Various Data Types
Data Type | Study | Correctness (%) | Completeness (%) |
---|---|---|---|
Diagnoses/problem list | | | |
Overall | Fortinsky/Gutman, 1981^a | 94.4, 95.2 | 84.9, 89.8 |
 | Pringle et al., 1995 (Study 3) | — | 81.8 |
 | Pringle et al., 1995 (Study 4) | — | 100 |
Proteinuria | Maresh et al., 1986 | 95 | — |
 | Jones/Hedley, 1986 | — | 90.3 |
Smoking status | Block/Brennan, 1989 (Study 1) | 90.9 | 30.7 |
 | Pringle et al., 1995 (Study 2) | — | 52.1 |
Pregnancy | Block/Brennan, 1989 (Study 1) | — | 92.3 |
Anemia—adults | Block/Brennan, 1989 (Study 2) | — | 54.0^b |
Anemia—children | Block/Brennan, 1989 (Study 2) | — | 35.0^b |
Urinary tract infection | Block/Brennan, 1989 (Study 2) | — | 54.8^b |
Orthopedic diagnoses | Barrie/Marsh, 1992 | 96.5 | 58.7 |
 | Ricketts et al., 1993 | 67, 91 | 53, 74 |
Codes for diagnoses | Fortinsky/Gutman, 1981^a | 90.5, 91.9 | — |
 | Yarnall et al., 1995^a | 77, 88 | 62, 82 |
Hematology diagnoses | Hohnloser et al., 1994 | 74.1 | 54.5 |
Diabetes mellitus | Pringle et al., 1995 (Study 1) | 100 | 96.7 |
Glaucoma | Pringle et al., 1995 (Study 1) | 100 | 92.3 |
Asthma | Pringle et al., 1995 (Study 1) | — | 65.1 |
Coronary artery disease | Pringle et al., 1995 (Study 1) | — | 59.0 |
Medications/prescriptions | Pringle et al., 1995 (Study 3, 4) | — | 100 |
 | Wagner/Hogan, 1996 | 83 | 93 |
Procedures/operations | | | |
Overall | — | | |
Orthopedic | Barrie/Marsh, 1992 | 97.8 | 82.0 |
 | Ricketts et al., 1993 | 44, 86^c | 43, 80^c |
 | Barlow et al., 1994 | 98 | 92.5 |
Complications of procedures | | | |
Overall | — | | |
Orthopedic | Barrie/Marsh, 1992 | 92.9 | 45.9 |
 | Ricketts et al., 1993 | 50, 77^c | 17, 66^c |
Demographic data | | | |
Overall | Jelovsek/Hammond, 1978 | — | 90.5^d |
Occupation | Jelovsek/Hammond, 1978 | — | 34.4 |
Marital status | Jones/Hedley, 1986 | — | 96.6 |
Date of birth | Jones/Hedley, 1986 | — | 100 |
Immunization status | | | |
All vaccines | Payne et al., 1993 (Study 1) | 91.6 | 99.1 |
 | Payne et al., 1993 (Study 2) | 87.7 | 93.6 |
 | Wilton/Pennisi, 1994 | 89.8 | 88.4 |
MMR vaccine | Payne et al., 1993 (Study 3) | — | 86.0 |
Hib vaccine | Payne et al., 1993 (Study 3) | — | 90.2 |
Miscellaneous data types | | | |
Historical data | Jelovsek/Hammond, 1978 | — | 78.7-99.1 |
 | Payne et al., 1993 (Study 3) | — | 20.7-40.5 |
 | Payne et al., 1993 (Study 4) | — | 1.1-5.6 |
Laboratory data | Jelovsek/Hammond, 1978 | — | 31.2-80.8 |
Influenza rates | Johnson et al., 1991 | — | 28.2^e |
Endoscopy reports | Gouveia-Oliveira et al., 1991 | — | 81.3 |
Modifiers of pathologic findings | Gouveia-Oliveira et al., 1991 | — | 81.9 |
 | Kuhn et al., 1991 | — | 90.7 |
 | Kuhn et al., 1992 | — | 100 |
Exam | Pringle et al., 1995 (Study 3) | — | 27.2^f |
Lower extremity exam | Jones/Hedley, 1986 | — | 97.9-98.7 |
Anesthesia record | Edsall et al., 1993 | — | 87 |
Vital signs | Edsall et al., 1993 | — | 100 |
Alcohol use | Pringle et al., 1995 (Study 2) | — | 37.5 |

a. Two studies of data accuracy, one before and one after an intervention to improve the accuracy of data.
b. This diagnosis had laboratory criteria as the gold standard.
c. Rates for two separate hospitals using the same software package.
d. Median; the range of completeness for demographic data is 90.3-100%.
e. Completeness of peak influenza rate, measured by taking the ratio of the influenza rate derived from CPR data to the influenza rate derived from the gold standard.
f. Not specified what criteria were used, such as whether only one or all components of the exam had to be present.
A study by the authors of this paper was the only one that investigated causes of errors as a primary study objective.32 In this study of medication data in an outpatient CPR, the most common cause of inaccuracy was the patient, who either provided incorrect information or created a discrepancy between the true state and the CPR data by changing medications without a clinician's instruction (36% of inaccuracies). The second most common cause was failure to capture medication changes made by clinicians who were not part of the clinic (26%), a result of the CPR's scope being limited to one clinic. Other causes included clinicians recording medication changes on paper but not in the CPR (13%) and clinic physicians making changes while outside the clinic and therefore recording them in neither the paper record nor the CPR (9%). Surprisingly, transcription error was a minor cause of inaccuracy (8%). The cause of 8% of inaccuracies could not be determined.
The 19 CPRs varied in ways that may influence data accuracy (Table 3). Data-capture mechanisms included transcription of data from paper-based records (10 CPRs), direct data entry by clinicians during patient care (6 CPRs), transcription of clinician dictation into the CPR (4 CPRs), and automatic capture of data from electronic patient-monitoring systems (1 CPR). Two CPRs captured data using more than one method. The types of data captured by the CPRs ranged from only a single type, such as endoscopy reports, to a comprehensive set of patient data. The scope of the CPRs also varied widely, ranging from a single clinic to a large health-maintenance organization.
Table 3.
Summary of CPRs
Study | Name of System | Time in Use^a | Data-entry Mechanism | Scope of System | Data Content |
---|---|---|---|---|---|
Jelovsek/Hammond, 197816 | Computerized Obstetric Medical Record (COMR) | 7 years | Encounter forms and patient questionnaires | University hospital obstetric clinics | Demographics, history, physical, laboratory, medications, diagnoses |
Fortinsky/Gutman, 198117 | —^b | 2 years | Encounter forms | University hospital family practice clinic | Demographics, diagnoses, diagnostic codes |
Jones/Hedley, 198618 | — | 3 years | Encounter forms | University hospital diabetes clinic | Demographics, history, physical, laboratory, complications^c |
Maresh et al., 198619 | — | 2 years | Encounter forms | University hospital dept. of obstetrics | Demographics, history, physical, discharge summaries, laboratory, diagnostic codes |
Dambro/Weiss, 198820 | COSTAR | 3 weeks | Encounter forms | University hospital family practice clinics | Demographics, history, physical, laboratory, medications, diagnoses, treatments |
Block/Brennan, 198921 | MediData Medical Information System | 4 years | Physician dictation | Urban hospital family practice clinics | Demographics, history, diagnoses, diagnostic codes |
Gouveia-Oliveira et al., 199122 | SISCOPE | 1 year | Direct physician entry | University hospital dept. of gastroenterology | Endoscopy reports |
Johnson et al., 19912 | AAH Meditel | — | Direct physician entry | Numerous general practices^d | Medications, diagnoses |
Kuhn et al., 199123 | — | 6 years | Physician dictation | University hospital dept. of gastroenterology | Endoscopy and abdominal ultrasound reports |
Barrie/Marsh, 199224 | Manchester Orthopaedic Database | 1.5 years | Physician dictation | Community hospital dept. of orthopedics | Demographics and orthopedic diagnoses, procedures, and complications of procedures |
Kuhn et al., 199225 | — | 21 weeks | Direct physician entry | University hospital dept. of gastroenterology | Endoscopy and abdominal ultrasound reports |
Edsall et al., 199326 | ARKIVE | — | Automatic capture of vital signs and direct physician entry | University hospital dept. of anesthesiology | All components of anesthetic record |
Payne et al., 199327 | — | 8 months | Transcription of paper-based records | Large HMO | Demographics, immunization records |
Ricketts et al., 199328 | Manchester Orthopaedic Database | 1 year | Physician dictation | Community hospital dept. of orthopedics | Demographics and orthopedic diagnoses, procedures, and complications of procedures |
Barlow et al., 199428 | Basingstoke Orthopaedic Database | 2.5 years | Direct physician entry | Community hospital dept. of orthopedics | Demographics and orthopedic diagnoses, procedures, and complications of procedures |
Hohnloser et al., 199429 | — | 1.5 years | Entry of data by laboratory staff | University hospital dept. of pathology | Hematology biopsy reports, diagnostic codes |
Wilton/Pennisi, 19943 | — | — | Transcription of paper-based records | University hospital pediatric clinics | Demographics, immunization records |
Pringle et al., 199530 | EMIS | — | — | Four general practices^e | Demographics, history, physical, laboratory, medications, diagnoses, treatments, referrals |
Yarnall et al., 199531 | The Medical Record (TMR) | 10 years | Encounter forms | University hospital family practice clinic | Demographics, laboratory, medications, diagnoses, diagnostic codes, x-ray reports |
Wagner/Hogan, 199632 | BGC EMR | 1.5 years | Encounter forms and direct clinician entry | University hospital geriatrics clinic | Demographics, history, physical, laboratory, medications, diagnoses, referrals |
a. Refers to how long the system had been in use when the data in question were entered; times are approximate.
b. Empty cells signify that the information was not available from the article.
c. As they pertain to diabetes and its complications.
d. A total of 433 general practices linked to a mainframe were included. The system was commercial, and it is not clear whether the practices share data.
e. The authors selected practices that had a history of high rates of recording patient data.
Because of the variability in methods and CPRs, and because of the small number of these studies, we could not detect relationships between data accuracy and CPR characteristics across studies. However, several individual studies provided results that are informative about the relationship between CPR characteristics and data accuracy. Three studies suggest that broadening the scope of the CPR may improve completeness of CPR data.3,27,32 One study measured the accuracy of data entered directly by clinicians versus data entered by data-entry personnel in the same version of the CPR at the same time; it found no significant difference between the accuracy of data entered directly by clinicians and the accuracy of data entered from encounter forms by licensed nurses' aides.32 However, this study lacked the statistical power to demonstrate a small difference. Kuhn and associates25 measured data accuracy in a new version of their CPR that used direct physician entry of data into structured, electronic forms and compared it with the accuracy of an earlier version of the CPR based on physicians' dictation of unstructured reports. They showed a significant improvement in accuracy with direct physician entry; however, this result is confounded by a potential checklist effect of the structured form introduced in the new version of the CPR and by the use of a historical control.
Several studies investigated interventions designed to improve data accuracy by measuring accuracy before and after the intervention. Fortinsky and Gutman17 found that structured encounter forms significantly increased completeness of diagnosis recording relative to unstructured forms. Yarnall and associates31 found that prompting physicians with previously recorded diagnostic codes on an encounter form improved correctness and completeness of diagnosis coding over the previous system (where physicians rewrote diagnoses on the billing sheet at every patient visit). Dambro and Weiss20 found that periodic monitoring of data accuracy and feedback to physicians and transcriptionists improved correctness of data entry. All three of these studies used historical controls, and thus the improvements in accuracy may have been due, at least in part, to other factors.
No study compared directly the accuracy of CPR data with the accuracy of data in a paper-based record. Because many of the 26 studies in CPRs used a paper-based record as the gold standard against which CPR data were compared, we cannot even compare data accuracy in CPRs with reported rates of accuracy in paper-based records. Such a comparison is logical and free of bias only when both systems, paper based and computer based, are measured against the same gold standard. We consider this issue further in the discussion.
One study demonstrated that factors external to the CPR may have a powerful influence on data accuracy. Ricketts and associates28 compared data accuracy at two similar hospitals using the same CPR system and found large differences in accuracy. They attributed these differences to the presence of a systems coordinator at one hospital, whose role, among others, was to improve data accuracy by producing monthly reports for audit meetings and reviewing incorrect usage of coding terms.
Discussion
Quality of the Literature
The quantity and quality of the literature on data accuracy in CPRs did not match our expectations, given the importance of accurate data for its various uses. Of particular concern are those uses such as clinical research and health-system management, where decisions made based on inaccurate CPR data can potentially affect large numbers of patients. Compared with the literature on data accuracy in disease registries and clinical trials databases, the English-language literature on data accuracy in CPRs is not extensive, comprising only 26 studies conducted in 19 distinct CPRs. The quality of the 26 studies was not uniformly high. Only seven studies achieved two-thirds or more of the total methodological and description score possible; one study achieved three-fourths or more of the total possible. The variability in the quality and methods of these studies rendered formal statistical methods of meta-analysis and correlation of results unfeasible, and made even a qualitative synthesis of the literature on data accuracy in CPRs difficult.
We do not know why so few studies of high quality have been reported in the literature. The two most likely possibilities are (1) that CPRs are still relatively new (or nonexistent) in academic settings, and (2) that the problem of data accuracy in CPRs has not received much attention in the field of Medical Informatics, which typically has had a primary interest in the design and development of CPRs. Both of these situations are changing, and as CPR data are used more often for multiple purposes, especially for clinical research and as a data source for disease registries, we expect that the Medical Informatics research community will become more aware of the problem and will apply its expertise to rigorous studies of data accuracy in CPRs.
Accuracy of Data in CPRs
Two factors made analysis and interpretation of the results about rates of accuracy across studies nearly impossible. First, the variability in methods and quality of these studies (as mentioned previously) rendered meta-analysis unfeasible. Second, although researchers reported a rate of completeness for all the types of data that they studied, they reported rates of correctness for significantly fewer than one-half of the types of data. As discussed in the introduction, it is difficult to understand the level of data accuracy in a system on the basis of only one measure of accuracy—very high correctness may be achieved at the expense of leaving many observations unrecorded (i.e., a low rate of completeness). The failure to report both measures thus results in an inadequate depiction of data accuracy. Taken together, these factors make it difficult to assess whether the accuracy of data in CPRs is poor, fair, good, or excellent.
Despite these limitations, we can form an impression of data accuracy in CPRs by analyzing the few studies of high quality. Our impression, based on examination of the rates of accuracy in Table 2 from the seven studies scoring 12 points or higher, is that data accuracy in CPRs is fair to good. With the exception of a few data types, such as specific diagnoses (e.g., anemia in children) and occupational history, the majority of rates of correctness and completeness from these studies are 80% or higher for the types of data studied.
A key limitation of the body of literature that we reviewed is that results about rates of data accuracy in CPRs may not be representative of what we would find in modern CPRs, which tend to be more extensive in scope and data content. The 19 CPRs that these studies evaluated consist largely of single-clinic or single-hospital systems that serve specialized purposes (e.g., endoscopy reports). Additionally, these studies did not include many prominent research CPRs. Of the ten CPRs considered to be today's state-of-the-art systems by the 1991 IOM report,15 only two appear in our review (COSTAR and The Medical Record).20,31 We found no published studies of data accuracy about other CPRs cited by the IOM such as the Health Evaluation through Logical Processing (HELP) system at LDS Hospital, the CPRs at Beth Israel and Brigham and Women's Hospital, the THERESA system at Grady Memorial Hospital, and the Department of Defense's Composite Health Care System. Studies conducted in mature, comprehensive implementations of the CPR such as these CPRs would be informative. Such studies may still underestimate the potential for CPRs to provide users with accurate data. The IOM report stresses that even these model systems are not complete CPRs.15 The IOM envisions future systems with seamless integration of data across hospitals, clinics, pharmacies, nursing homes, and so on, with a record of the patient's health status and functional level; problem lists; documentation of the rationale for clinical decisions; links to local and remote knowledge, literature, and administrative systems; and provision of reminders and decision analysis tools to clinicians.
Causes of Inaccurate Data
Too few causes of error in CPR data have been identified and studied. Researchers have directed most of their attention to data entry; four studies measured the amount of error resulting from the data-entry process.3,20,27,32 Contrary to conventional wisdom, three of these four studies suggest that data entry is a relatively minor cause of error and that other factors, such as the scope of the CPR, play a larger role.3,27,32 Only one study developed a classification system for categorizing inaccuracy in a CPR.32 This classification identified several categories of causes of error that could be addressed by CPR improvements which, if implemented, had the potential to improve data accuracy.
Investigating the causes of data error in a CPR is a prerequisite for the reduction of error. Because the process of data capture is a complex system (Fig. 1), techniques from the field of continuous quality improvement (CQI) seem ideal for studying causes of error in CPR data and correcting those causes by improvements in the mechanisms of data capture in the CPR. From Figure 1, it is apparent that errors may be introduced at multiple points in the process of data capture; thus, it may be insufficient to implement only a single intervention to improve accuracy. Moreover, even after successful interventions, accuracy may not be maintained over time. Medical processes are complex and ever changing, and turnover of personnel may result in unexpected changes in procedures that result in data error. These observations suggest the need for a cycle of regular monitoring, analysis of errors, and interventions designed to improve accuracy, analogous to techniques in CQI.
Although this level of monitoring, to our knowledge, is not routine practice in CPRs, it is typical in clinical research databases because of the importance of accurate data.33,34,35,36 For example, by studying the accuracy of data entry and the causes of inaccuracies, Horbar and associates were able to implement additional routines for automatic logic, range, and consistency checking that reduced inaccuracies caused by data entry threefold.33 Although some of the procedures used in research databases, such as duplicate data entry, may not be practical in CPRs, the process of continuously monitoring accuracy and taking steps to improve it seems relevant, especially if CPR data are to be used for research.
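The flavor of such entry-time checking can be conveyed with a short sketch; the field names and limits below are hypothetical, chosen only to show the pattern of logic, range, and consistency checks described above, not taken from any system in this review:

```python
def validate_entry(record: dict) -> list[str]:
    """Return a list of problems found in one entry; an empty list means clean."""
    problems = []

    # Range check: a value must fall within plausible limits (limits hypothetical).
    weight = record.get("birth_weight_g")
    if weight is not None and not (300 <= weight <= 6000):
        problems.append(f"birth_weight_g {weight} outside plausible range")

    # Logic check: a required field must be present.
    if record.get("date_of_birth") is None:
        problems.append("date_of_birth is missing")

    # Consistency check: two fields must agree with each other.
    if record.get("sex") == "male" and record.get("pregnant") is True:
        problems.append("pregnant inconsistent with sex")

    return problems

# Example entry that trips all three checks.
entry = {"birth_weight_g": 250, "sex": "male", "pregnant": True}
for problem in validate_entry(entry):
    print(problem)
```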
Influence of CPR Characteristics on Accuracy
Several studies reported findings that associate CPR characteristics with data accuracy. As mentioned previously, three studies suggested that improvement of the scope of the CPR to include other clinics, departments, and even hospitals may improve the completeness of data capture. Two studies present conflicting data about whether direct clinician entry improves data accuracy. Wagner and Hogan32 found no effect on accuracy, but their study lacked the statistical power to detect a small difference. Kuhn et al.25 found an improvement in data accuracy with direct physician entry over physician dictation of free text reports, but the results were confounded by the checklist effect and the use of a historical control.
Intuitively, direct physician entry should improve the accuracy of CPR data. The IOM, in its report on the CPR, advocates direct data entry by clinicians at the point of clinical care as a mechanism to reduce errors.15 However, the literature we review here does not provide convincing evidence that direct physician entry of data is warranted solely for the purpose of reducing inaccuracy. (There are other reasons for implementing direct clinician entry, such as providing real-time decision support during order entry.) Confirmation of the hypothesis that direct clinician entry improves data accuracy must await future studies.
Improvement of Data Accuracy
Seven of the 26 studies investigated interventions that may improve data accuracy. Besides factors already discussed (expanding the scope of the CPR and direct clinician entry), other interventions included structured data capture,17,25,31 automatic capture of data from electronic patient monitoring systems,26 monitoring of data accuracy with feedback to personnel involved in data entry,20,28 and providing clinicians with access to a CPR when outside of the clinic or hospital.32 Whether these interventions will improve data accuracy consistently in a variety of CPRs, however, requires further study.
Accuracy of CPRs versus Paper-based Patient Records
No study made a direct comparison of data accuracy in a CPR with data accuracy in a paper-based record. In theory, CPRs should attain higher levels of data accuracy than paper-based records. CPRs can employ validity checks during data entry; allow continual improvement of data by editing rather than rewriting (or redictating); and use standards for transmission of medical data to consolidate observations from disparate locations into a single logical record.
The literature on data accuracy in traditional paper-based records is summarized in the Institute of Medicine's report on the CPR.15 These studies typically investigated the accuracy of diagnoses and used, as a gold standard, a record of the actual clinician-patient encounter (e.g., a tape-recording of the encounter,9 a transcript of a tape-recording of the encounter,8 or a consensus of observers who viewed the encounter through a one-way mirror13). In contrast, the gold standards used in studies of the accuracy of diagnoses in CPRs were largely paper-based records. Because of this major methodological difference, we are unable to compare data accuracy in CPRs with that in paper-based records. Therefore, we cannot determine whether most CPRs contain data that are less accurate than, more accurate than, or as accurate as those in paper records. Further studies would be necessary to test the hypothesis that data accuracy in CPRs will surpass that of paper-based records; however, such studies are likely to be neither the driving force behind the move toward CPRs (market and other forces are already leading to widespread use of the CPR) nor completed before those forces lead to nearly universal adoption of CPRs.
Methodological Recommendations
A standard method for future studies of data accuracy in CPRs is needed for two reasons. First, by using more rigorous methods, researchers can improve the quality of the literature on data accuracy in CPRs so that the questions posed, and largely unanswered, by this review may be resolved. Second, increased uniformity of methods should greatly assist with future syntheses of the literature and might allow researchers to apply statistical methods of meta-analysis.
We base the following recommendations on our reading of the literature on data accuracy in clinical trials databases and registries and on our perception of the shortcomings of these 20 articles about data accuracy in CPRs. Researchers should (1) report numerical measures of both correctness and completeness, (2) use an unbiased sampling technique to select patient records for inclusion in the study, (3) select a gold standard with the intention of approximating the true state of the patient as closely as possible, and (4) blind the members of the research team who are responsible for the determination of the gold standard to both the purpose of the study and the CPR data when appropriate. Ideally, studies should provide a thorough description of the CPR, including its name, hardware components, and software versions (especially if the CPR is commercially or otherwise available for implementation at other sites), what types of data it contains, how long it has been in place, its scope, and a description of its methods for data capture. These characteristics of the CPR may potentially influence data accuracy and seem germane to the interpretation of results about data accuracy. Whether CPR data are the official patient record is likely to influence the care and accuracy with which clinicians record data. Accessibility of the CPR should be described—whether data can be entered remotely, from which locations, at what times, and how often the system is “down” may influence accuracy because these factors can limit opportunities to record data. The gold standard should be determined with the goal of approximating the true state of the patient (or world).
Although it is not a methodological recommendation, we propose adding a MeSH term for the concept of data accuracy to MEDLINE. Data accuracy has become an important concept with the widespread use of computer-based systems for clinical and epidemiological research.11 CPR data collected at the point of care are being used increasingly for these purposes as well as being used for making administrative and policy decisions. As a result of these developments, data accuracy will become an even more important topic. The creation of a MeSH term for data accuracy and the indexing of future studies under this MeSH term would facilitate future research and the dissemination of its results.
Conclusions
Data collected by CPRs are used throughout the health care system. The accuracy of these data is critical to the optimal outcome of many health care activities. This review shows that our understanding of data accuracy in CPRs is not commensurate with its importance. It is imperative that we both measure and characterize the accuracy of data in CPRs and investigate ways to improve it. Moreover, as data in CPRs are used increasingly for research, the methods of continuous monitoring and improvement of accuracy used in research databases should be applied to CPRs. Achievement of these objectives will be facilitated by the use of rigorous, more uniform methods to measure accuracy and by the incorporation of a MeSH term for data accuracy in MEDLINE to improve dissemination of information about data accuracy in CPRs.
Acknowledgments
Jeffrey C. Whittle, MD, Department of Medicine, Oakland Veterans Affairs Medical Center; and Charles P. Friedman, Director, Center for Biomedical Informatics, University of Pittsburgh Medical Center provided invaluable discussion.
This research was partially supported by grant LM07059-10 from the National Library of Medicine.
Footnotes
* Many authors use "accuracy" to refer only to the measure of correctness. We use the term to refer not just to the measure of correctness but more generally to encompass both measures—correctness and completeness. Other synonyms for correctness in the data accuracy literature include reliability and validity.
† Specifically, the MEDLINE search logic was ((accuracy.tw. OR accurate.tw. OR... OR quality.tw.) AND (exp INFORMATION SYSTEMS)).
References
1. Leape LL, Bates DW, Cullen DJ, et al. Systems analysis of adverse drug events. ADE Prevention Study Group. JAMA. 1995;274:35-43.
2. Johnson N, Mant D, Jones L, Randall T. Use of computerised general practice data for population surveillance: comparative study of influenza data. BMJ. 1991;302:763-5.
3. Wilton R, Pennisi AJ. Evaluating the accuracy of transcribed computer-stored immunization data. Pediatrics. 1994;94:902-6.
4. Hogan WR, Wagner MM. Using belief networks to enhance sharing of medical knowledge between sites with variations in data accuracy. Proc AMIA Annu Fall Symp. Philadelphia: Hanley & Belfus, 1995:218-22.
5. Komaroff AL. The variability and inaccuracy of medical data. Proc IEEE. 1979;67:1196-1207.
6. Koran LM. The reliability of clinical methods, data and judgments (first of two parts). N Engl J Med. 1975;293:642-6.
7. Koran LM. The reliability of clinical methods, data and judgments (second of two parts). N Engl J Med. 1975;293:695-701.
8. Romm FJ, Putnam SM. The validity of the medical record. Med Care. 1981;19:310-5.
9. Zuckerman ZE, Starfield B, Hochreiter C, Kovasznay B. Validating the content of pediatric outpatient medical records by means of tape-recording doctor-patient encounters. Pediatrics. 1975;56:407-11.
10. Parkin DM, Muir CS. Cancer incidence in five continents: comparability and quality of data. IARC Scientific Publications. 1992;120:45-173.
11. Goldberg J, Gelfand HM, Levy PS. Registry evaluation methods: a review and case study. Epidemiol Rev. 1980;2:210-20.
12. Institute of Medicine. Reliability of National Hospital Discharge Survey Data. Washington, DC: National Academy of Sciences, 1980.
13. Bentsen BG. The accuracy of recording patient problems in family practice. J Med Educ. 1976;51:311-6.
14. Wiederhold G, Perreault LE. Clinical research systems. In: Shortliffe EH, Perreault LE (eds). Medical Informatics: Computer Applications in Health Care. Reading, MA: Addison-Wesley, 1990:503-34.
15. Institute of Medicine. The Computer-based Patient Record: An Essential Technology for Health Care. Dick RS, Steen EB (eds). Washington, DC: National Academy Press, 1991.
16. Jelovsek F, Hammond W. Formal error rate in a computerized obstetric medical record. Methods Inf Med. 1978;17:151-7.
17. Fortinsky RH, Gutman JD. A two-phase study of the reliability of computerized morbidity data. J Fam Pract. 1981;13:229-35.
18. Jones R, Hedley A. A computer in the diabetic clinic: completeness of data in a clinical information system for diabetes. Practical Diabetes. 1986;3:295-6.
19. Maresh M, Dawson AM, Beard RW. Assessment of an online computerized perinatal data collection and information system. Br J Obstet Gynaecol. 1986;93:1239-45.
20. Dambro MR, Weiss BD. Assessing the quality of data entry in a computerized medical records system. J Med Syst. 1988;12:181-7.
21. Block B, Brennan JA. Reliability of morbidity data in a computerized medical record system. In: Hammond WE (ed). Proceedings of the AAMSI Congress 89. Eighth Annual National Congress, 1989:21-30.
22. Gouveia-Oliveira A, Raposo VD, Salgado NC, Almeida I, Nobre-Leitao C, de Melo FG. Longitudinal comparative study on the influence of computers on reporting of clinical data. Endoscopy. 1991;23:334-7.
23. Kuhn K, Swobodnik W, Johannes RS, et al. The quality of gastroenterological reports based on free text dictation: an evaluation in endoscopy and ultrasonography. Endoscopy. 1991;23:262-4.
24. Barrie JL, Marsh DR. Quality of data in the Manchester Orthopaedic Database. Br Med J. 1992;304:159-62.
25. Kuhn K, Gaus W, Wechsler JG, et al. Structured reporting of medical findings: evaluation of a system in gastroenterology. Methods Inf Med. 1992;31:268-74.
26. Edsall DW, Deshane P, Giles C, Dick D, Sloan B, Farrow J. Computerized patient anesthesia records: less time and better quality than manually produced anesthesia records. J Clin Anesth. 1993;5:275-83.
27. Payne T, Kanvik S, Seward R, et al. Development and validation of an immunization tracking system in a large health maintenance organization. Am J Prev Med. 1993;9:96-100.
28. Ricketts D, Newey M, Patterson M, Hitchin D, Fowler S. Markers of data quality in computer audit: the Manchester Orthopaedic Database. Ann R Coll Surg Engl. 1993;75:393-6.
29. Hohnloser JH, Fischer MR, Konig A, Emmerich B. Data quality in computerized patient records: analysis of a haematology biopsy report database. Int J Clin Monit Comput. 1994;11:233-40.
30. Pringle M, Ward P, Chilvers C. Assessment of the completeness and accuracy of computer medical records in four practices committed to recording data on computer. Br J Gen Pract. 1995;45:537-41.
31. Yarnall KS, Michener JL, Broadhead WE, Hammond WE, Tse CK. Computer-prompted diagnostic codes. J Fam Pract. 1995;40:257-62.
32. Wagner MM, Hogan WR. The accuracy of medication data in an outpatient electronic medical record. J Am Med Inform Assoc. 1996;3:234-44.
33. Horbar JD, Leahy KA. An assessment of data quality in the Vermont-Oxford Trials Network database. Controlled Clin Trials. 1995;16:51-61.
34. Neaton JD, Duchene AG, Svendsen KH, Wentworth D. An examination of the efficiency of some quality assurance methods commonly employed in clinical trials. Stat Med. 1990;9:115-23; discussion 124.
35. Pollock BH. Quality assurance for interventions in clinical trials: multicenter data monitoring, data management, and analysis. Cancer. 1994;74:2647-52.
36. Vantongelen K, Rotmensz N, van der Schueren E. Quality control of validity of data collected in clinical trials. EORTC Study Group on Data Management (SGDM). European Journal of Cancer & Clinical Oncology. 1989;25:1241-7.