From simply inaccurate to complex and inaccurate: complexity in standards-based quality measures

David A Dorr; Aaron M Cohen; Marsha Pierre-Jacques Williams; John Hurdle

. 2011 Oct 22;2011:331–338.

PMCID: PMC3243137 PMID: 22195085

Abstract

Quality measurement has been slow to make a major impact in health care. Initial measures were too simple to affect outcomes of importance. Incentive programs such as Meaningful Use encourage better measures, but in the process the measures may become more complex. We evaluated the measures selected for Meaningful Use in two ways: we counted unique concept identifiers, taxonomies, and aggregated concepts as measures of complexity; and we surveyed informatics professionals to assess difficulty. There were 20,316 unique concept identifiers, 35 taxonomies, and 317 aggregated concepts across the 45 measures. Half the respondents reported measures at least moderately difficult. The number of identifiers was associated with fewer implementations (r=−.37); rating-of-difficulty was associated with more taxonomies (r=.24). The impact on accuracy may be substantial when moving to measures intended to be more relevant to clinical outcomes but requiring the use of more taxonomies, unused structured concept identifiers, or concepts only in free text fields.

Introduction

Rising health care costs in the United States – both current and projected – and heightened concern about the quality of health care in the U.S. have led to an increased focus on the value of health care. Americans now spend roughly twice the average of other developed nations on health care, yet our health outcomes are, at best, average.¹ Despite recent reforms, U.S. health care is predominantly funded through a fee-for-service system, where each unit of care is reimbursed separately. As Elliott Fisher, head of the Dartmouth Atlas of Health Care, has publicly (and controversially) stated, “They [providers and hospitals] are rewarded for more services, not better services. They are rewarded for more care, not better care.” The combination of inverted incentives, mediocre outcomes, and increasing spending has led to a need to re-establish the value of health care services, and to the need to redesign the system to match these values.² One way to establish value is to measure the quality, rather than the quantity, of services provided. Multiple efforts have focused on changing the emphasis to performance, rather than services, but these have achieved mixed results.³–⁵

Quality measurement itself suffers from a number of problems. First, initial development of quality measures took place when little data was available; definitions were limited mostly to aspects of services billed, and the underlying data were found to be inaccurate in many cases.⁶ Second, these measures, as defined, were not supported by evidence to be directly related to the actual outcomes of care – such as health improvement, quality of life, or decreased mortality⁷,⁸ – and so provided little additional benefit. As adoption of electronic health record systems (EHRs) increases, more clinical data is now available for the evaluation of care but has been underutilized in previous measures. Third, those new measures that are more clinical and timely have been shown to result in improvements in both the measurement of quality and quality itself.³,⁹–¹¹ However, these measures often require new data collection workflows or permit variable implementation schemes, leading to variation in results, continued inaccuracy, and lack of comparability.¹² Even with advances in measurement technique, consistent, comparable, and accurate measurement is still difficult for many sites and thus has a limited impact on outcomes outside highly advanced centers that have built their own EHR systems.¹³

In 2009, the Health Information Technology for Economic and Clinical Health (HITECH) Act was enacted as part of the American Recovery and Reinvestment Act of 2009. This act brought a focus on the ability of EHRs to both measure and improve quality – with over $20 billion in incentives for their use. This came, however, with the significant caveat that the EHRs must be used in ‘meaningful ways.’ In the last year, the Office of the National Coordinator for Health Information Technology (ONC) has brought together a set of multidisciplinary panels to define ‘meaningful use’ and how to encourage it. One aspect they endorsed is the careful, standardized measurement of quality. Starting with a list from the well-respected National Quality Forum (NQF), the ONC identified 45 high-priority quality measures ranging from measurement of blood pressure at office visits to establishing follow-up plans for patients with high (or low) body mass index. A subset of the measures – three core and three alternate core – are identified as “must complete” measures, while implementers must select an additional three from the remaining list. Evaluating six quality measures while using the EHR for a set of standard tasks – Computerized Provider Order Entry, using a structured Problem List, and recording Demographics in a structured fashion, amongst others – will earn an Eligible Provider up to $44,000 in incentive payments to offset the implementation of the EHR.

Unlike older measures, the high-priority quality measures selected are all defined in terms of standardized concepts, including a reference terminology and defined relationships between concepts. These measures are relatively new and better specified (including machine-readable versions) than older measures. With the incentive program driving people to adapt how they measure quality, we wanted to answer two questions: 1) what were the relative complexity and difficulty of implementation of the new measures, and 2) how did the complexity and difficulty relate to each other? For difficulty, we surveyed a group of expert implementers; for complexity, we assessed the number of concepts, taxonomies, and relationships for each from the measure definitions themselves.

Methods

Forty-five quality measures were selected by the Office of the National Coordinator for Health Information Technology; each of these measures was validated initially by the National Quality Forum (NQF) and each had an electronic definition created in addition to a human readable form. We obtained the definitions from the CMS Web site (http://www.cms.gov/QualityMeasures/03_ElectronicSpecifications.asp). Each definition consists of human-readable and machine-readable forms. The human-readable form describes the measure, the rationale and background, and outlines the logic of the measure. The machine-readable form consists of a spreadsheet with all concepts related to the measures. Thus, if the human-readable form of the measure specifies that the denominator should be “all people seen by a provider who are over 18 with persistent or severe asthma” and the numerator specifies that “these people should have received a prescription for a long-acting medication to treat asthma,” the machine-readable version would contain all the possible concepts and codes from all the possible taxonomies that relate to the measure and how to combine these into aggregate concepts for the numerator and denominator. Under the incentive program, the quality measures need to be implemented directly through a certified tool – either an EHR or a data warehouse. Therefore, we took the approach that what the system would process, the machine-readable component, would define the measure’s objective complexity – how many components and of what type would need to be implemented – while the human experience of trying to get the right data from the right sources (including workflow redesign and measure context) would define the more subjective difficulty of implementation.
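To make the relationship between the two forms concrete, the following is a minimal sketch of how the asthma example might be evaluated from a machine-readable definition. It assumes each aggregate concept is a set of (taxonomy, code) pairs and each patient record is a set of coded entries; the concept names, codes, the "DRUG" taxonomy label, and the patients are hypothetical placeholders, not values from the CMS specifications.

```python
from dataclasses import dataclass, field

# Hypothetical aggregate concepts, each a set of (taxonomy, code) pairs.
# All codes are placeholders and are NOT taken from the CMS specification.
CONCEPTS = {
    "persistent_asthma": {("ICD-9-CM", "DX-A1"), ("SNOMED-CT", "DX-A2")},
    "office_visit": {("CPT", "VISIT-1")},
    "long_acting_asthma_med": {("DRUG", "MED-1"), ("DRUG", "MED-2")},
}


@dataclass
class Patient:
    age: int
    codes: set = field(default_factory=set)  # coded entries on the record


def has_concept(patient, concept):
    """True if any coded entry on the record belongs to the aggregate concept."""
    return bool(patient.codes & CONCEPTS[concept])


def in_denominator(patient):
    # "All people seen by a provider who are over 18 with persistent or severe asthma"
    return (
        patient.age > 18
        and has_concept(patient, "office_visit")
        and has_concept(patient, "persistent_asthma")
    )


def in_numerator(patient):
    # Denominator patients with a coded long-acting asthma prescription
    return in_denominator(patient) and has_concept(patient, "long_acting_asthma_med")


def performance(patients):
    """Return (numerator count, denominator count) for a list of patients."""
    denom = [p for p in patients if in_denominator(p)]
    numer = [p for p in denom if in_numerator(p)]
    return len(numer), len(denom)


patients = [
    Patient(45, {("CPT", "VISIT-1"), ("ICD-9-CM", "DX-A1"), ("DRUG", "MED-1")}),
    Patient(52, {("CPT", "VISIT-1"), ("SNOMED-CT", "DX-A2")}),  # no coded drug
    Patient(12, {("CPT", "VISIT-1"), ("ICD-9-CM", "DX-A1"), ("DRUG", "MED-2")}),
    Patient(60, {("CPT", "VISIT-1")}),  # seen, but no asthma code
]
numer, denom = performance(patients)
```

In this sketch the second patient counts against performance even if a prescription was documented only in a narrative note, because only coded entries are visible to the logic.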

Definition of complexity

We defined three complexity measures: 1) the unique number of taxonomies, 2) aggregated concepts, and 3) individual codes (or concept identifiers) that make up the aggregate concepts for a measure. The rationale for including these three scores was arrived at by expert opinion of the authors. For instance, an increasing number of taxonomies may require more maintenance burden or create difficulties in concept linkage, data warehouse design, and report writing, while an increasing number of individual codes may require more complex workflows or suffer from individual variation at the institution and/or EHR user level. The aggregate concepts represent the number of final concepts used in the relationships of the quality measure.
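As an illustration only, the three counts could be derived from the machine-readable spreadsheet roughly as follows. The sketch assumes each row carries an aggregate concept, a taxonomy, and a concept identifier, and that an identifier is unique per (taxonomy, code) pair; the row layout and the codes are hypothetical.

```python
# Hypothetical rows of a machine-readable measure definition:
# (aggregate concept, taxonomy, concept identifier). Codes are placeholders.
rows = [
    ("low back pain", "ICD-9-CM", "C1"),
    ("low back pain", "SNOMED-CT", "C2"),
    ("imaging study", "CPT", "C3"),
    ("imaging study", "CPT", "C4"),
    ("visit type", "CPT", "C5"),
    ("visit type", "CPT", "C5"),  # repeated row, counted once
]


def complexity(rows):
    """Count unique taxonomies, aggregate concepts, and concept identifiers."""
    taxonomies = {taxonomy for _, taxonomy, _ in rows}
    aggregates = {concept for concept, _, _ in rows}
    # Assumption: the same code string in two taxonomies is two identifiers.
    identifiers = {(taxonomy, code) for _, taxonomy, code in rows}
    return {
        "taxonomies": len(taxonomies),
        "aggregate_concepts": len(aggregates),
        "concept_identifiers": len(identifiers),
    }


scores = complexity(rows)
```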

Definition of difficulty

We also surveyed leading measure implementers about the difficulty of implementing these measures. We designed a survey that presented each measure. We tested the survey in a small group of implementers from an AMIA informatics working group, revised it, and then surveyed a broader group. The broader group of implementers was selected in a purposive sample based on experience in implementation of quality metrics. We identified these experts from several sources, including using an email list of quality measure implementation experts from regional extension centers (specific centers funded to aid clinicians in implementing meaningful use measures), from health systems who had committed to be early adopters of the meaningful use quality measures, and by recommendation from other experts. The survey asked them to assess the difficulty of implementation of each measure on a Likert scale (Very easy, Easy, Moderately Difficult, Difficult, and Very Difficult). They were also asked whether they had implemented the measure and, if difficult, where the source of the difficulty lay. For source of the difficulty, the initial working group helped define a set of common causes: 1) data is in multiple fields; 2) data is in the narrative note; 3) data is in a scanned document; 4) data is in other, inaccessible systems; and 5) data is not currently entered anywhere. Comments about source and difficulty were also elicited.

Analysis

Analysis of the complexity and difficulty consisted of 2 phases. First, simple descriptive statistics were created for each measure and question; for the difficulty rating, we used grouped scores of moderate or greater difficulty to create a % of moderate or greater difficulty per measure. We created sums and averages per measure for complexity with ranges as a measure of spread. We then calculated Pearson’s correlation measures to see how complexity varied with difficulty assessments on individual measures, and as a validation test for how individual measures of complexity and difficulty changed together.
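The two calculations described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: missing ratings are assumed to be dropped before computing the percentage, and the example values are hypothetical.

```python
from math import sqrt

# Likert labels from the survey that count as "moderate or greater" difficulty
DIFFICULT = {"Moderately Difficult", "Difficult", "Very Difficult"}


def pct_moderate_or_greater(ratings):
    """Percent of answered ratings at 'Moderately Difficult' or above."""
    answered = [r for r in ratings if r is not None]  # assumption: skip blanks
    return 100.0 * sum(r in DIFFICULT for r in answered) / len(answered)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings for one measure from five respondents (one left blank)
one_measure = ["Very easy", "Easy", "Moderately Difficult", "Difficult", None]
share = pct_moderate_or_greater(one_measure)

# Hypothetical per-measure values: number of taxonomies vs. % rated difficult
taxonomies = [4, 8, 12, 14]
pct_difficult = [30.0, 40.0, 80.0, 46.0]
r = pearson_r(taxonomies, pct_difficult)
```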

Case study

The authors selected one measure that was rated both highly difficult and highly complex to evaluate against a set of patients from a local primary care clinic. The clinic is an academic medical center clinic with approximately 20 eligible providers serving 12,000 patients, about 50% of whom have Medicare and/or Medicaid insurance. The clinic practitioners use a fully functional EHR in both their ambulatory work and the academic hospital. The purpose of the case study was to highlight the potential gaps and their resolution by exploring the results of a business-as-usual implementation for the challenging measure. Business-as-usual was derived from the certified EHR’s instructions and from the working group implementing the measures at the case study institution. The primary principle used was to assume the implementers would not be able to perform secondary chart reviews or iterative redesign, but could map current or slightly modified workflow into the measure’s data requirement. The process consisted of two steps. First, an initial implementation and quality measure performance score were generated; then the measure was divided into aggregate concepts, and each set of concept identifiers from each taxonomy that related to a concept was compared to see which individual concepts, taxonomies, or concept identifiers were most responsible for the performance score.

Results

A synopsis of the measures is shown in Table 1. The core measures are in green (medium gray) while the alternative core measures are in yellow (light gray). A representative set of high-complexity and/or high-difficulty as well as lower-difficulty and lower-complexity measures are displayed for reference. The full results of this analysis are available on request from the first author.

Table 1.

Difficulty and complexity results.

Name of Measure	Unique concept identifiers	Aggregate concepts	Number of taxonomies	% mod. or > difficulty (N=17)	% who had implemented (N=17)
ALL 45 measures	20316	317	35	52.4%	50%
Low Back Pain: Use of Imaging Studies	6467	13	7	69.2%	16.7%
Controlling High Blood Pressure	1815	10	9	38.5%	50.0%
Weight Assessment and Counseling for Children and Adolescents	1676	12	8	75.0%	16.7%
Adult Weight Screening and Follow-Up	1297	14	12	80.0%	16.7%
Childhood immunization Status	1124	38	12	56.3%	50.0%
Prenatal Care: Anti-D Immune Globulin	899	16	14	46.2%	16.7%
Heart Failure (HF): Beta-Blocker Therapy for Left Ventricular Systolic Dysfunction (LVSD)	705	21	8	69.2%	50.0%
Anti-depressant medication management	562	20	8	76.9%	33.3%
Initiation and Engagement of Alcohol and Other Drug Dependence Treatment	347	7	12	100.0%	16.7%
Preventive Care and Screening: Influenza Immunization for Patients ≥ 50 Years Old	141	18	8	26.7%	83.3%
Preventive Care and Screening Measure Pair: a. Tobacco Use Assessment	95	13	4	30.8%	50%
Preventive Care and Screening Measure Pair: b. Tobacco Cessation Intervention	97	14	6	38.5%	83.3%
Hypertension: Blood Pressure Measurement	241	8	8	26.7%	50.0%

Green : Core measures; yellow: alternate core measures

Complexity

The complexity of the 45 measures, as defined above, is also summarized in Table 1. In all, the 45 measures had 20,316 unique concept identifiers, 317 aggregate concepts, and 35 different taxonomies. A single measure, low back pain, has 6,467 concept identifiers from 7 taxonomies that aggregate to 13 core concepts (e.g., the core concepts are low back pain, imaging studies, visit types, etc., and the individual concept identifiers are the ICD-9-CM, SNOMED-RT, or CPT codes that make up each core concept). Correlations between complexity measures were small to moderate, with the correlation between number of taxonomies and number of concept identifiers significant (r=0.47, p=.001) and the rest with |r|<.15. This indicates that the three measures may capture different aspects of measure complexity.

Difficulty survey

Of the 26 experts approached, 17 (65%) from 10 different states responded to the anonymous Web survey. Forty-one percent were top-level decision makers in creating strategies and allocating resources to achieve Meaningful Use of their institutions’ EHRs, while the other 59% were health informaticians or IT personnel who worked on Meaningful Use assessments and EHR implementation. The results are also summarized in Table 1. On average, half (52.4%) of the respondents rated the measures as at least moderately difficult. All respondents rated Initiation and Engagement of Alcohol and Other Drug Dependence Treatment as difficult, while only about one quarter rated blood pressure measurement as difficult. Measure implementation was strongly, but not perfectly, negatively correlated with difficulty (r=−0.68, p<.0001), providing one source of validation for the idea that difficulty is an impediment to implementation. Measure implementers were not significantly different from non-implementers in role.

Source of difficulty

Respondents were also asked to identify one or more sources of the difficulty for the core and alternate core measures. Figure 1 highlights the responses for each category. One third of the time, the respondents felt the measures’ data was “hidden” in the clinical notes or was in multiple coded fields requiring multiple combinations of data. Respondents did not often think that data was in scanned documents (5 measures: 0%; 1 measure, influenza screening: 10%) or in other, inaccessible systems (5%). Sometimes it was thought that the source information for the measure may not be present in the EHR at all. For influenza screening, one commented: “The challenge for influenza immunization is those patients who receive them outside our clinic. Will patient report of receiving an influenza immunization be adequate for this measure?” Concerning one of the measures (CM 2, or NQF 0028, Tobacco Cessation), 30% felt the results were not entered anywhere, but only 10% on average felt that was true for the other measures.

Comparison between difficulty and complexity

Measures of correlation between difficulty and complexity were moderate, with more unique codes correlated with fewer implementations (r=−.37, p=.02) and difficulty related to number of taxonomies (r=.24, p<.10). Figure 2 displays the spread and slopes of the correlation between the survey (difficulty) and complexity measures. Neither the number of unique concept identifiers nor the number of aggregated concepts was significantly related to the perception of difficulty, although more concept identifiers showed a trend towards association with higher difficulty (r=.15).

Figure 2. Correlation between difficulty and complexity.

Case Study

A single measure, Adult Weight Screening and Follow-Up (NQF 0421), at a single site (a moderate-sized Internal Medicine clinic) was chosen as the case study since this measure was rated as both complex and difficult (12 taxonomies, 2nd most; 1297 concept identifiers, 4th most; 80% rated as difficult and only 16.7% had implemented). The measure requires every patient older than 18 to have a weight and height measured yearly, a body mass index (BMI) calculated, and a BMI follow-up plan created and documented if the BMI is abnormal. First, we evaluated whether the taxonomies required to perform the measure were in place. Of the 12 taxonomies, 9 were fully implemented within the site’s EHR (e.g., all procedures had CPT codes associated) and 3 were partially implemented (LOINC, HCPCS, SNOMED-CT; e.g., SNOMED was implemented for diagnoses in a hierarchical fashion but had limited association with other data domains).

Then, we calculated the quality measure. Tables 2a and 2b highlight the overall score and a subset of individual concept identifiers across the eligible population; in the tables, the aggregated concepts are applied step-by-step and the performance recalculated. The complete measure calculation had 10,322 people eligible, of whom 2,876 met the measure, for a score of 27.9%. In all, 10,792 patients had eligible encounters, 10,789 had eligible birthdates (3 had coding errors), and 445 had an exclusion code for pregnancy, leaving 10,322. In Table 2a, the denominator requires a birthdate (HL7 coded) and a subset of outpatient encounters (CPT or HCPCS coded); 3 patients did not have a birthdate coded and were excluded. The largest potential denominator error comes from the exclusion concepts, the source of the largest number of concept identifiers (1,215, mostly pregnancy ICD-9 codes); failing to include this complex set of codes would have decreased performance by 1.2 percentage points (27.9% to 26.7%). The numerator (concepts 13 and 14) starts with a measurement of BMI and then adds the requirement for abnormal BMIs to have a coded follow-up plan. Of the eligible patients, 5,491 (53.2%) had a BMI measured, and 2,640 of those BMIs were abnormal and would require follow-up. However, only 25 (0.95%) of the abnormal BMIs had a coded follow-up plan as defined by 31 codes from 5 taxonomies, leaving only 2,876 successes (2,851 normal BMIs plus 25 coded plans). Thus, performance goes from 53.2% to 27.9%, a roughly 50% relative decline from a single concept, “BMI follow-up plan”.
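The arithmetic in this paragraph can be rechecked with a few lines of code. The counts below are the ones reported in the text and in Tables 2a and 2b; the variable and function names are ours, chosen only for illustration.

```python
def pct(numerator, denominator, digits=1):
    """Performance as a percentage, rounded for display."""
    return round(100.0 * numerator / denominator, digits)


# Counts reported in the case study
final_numerator = 2_876
denominator_steps = {
    "outpatient encounter": 10_792,
    "and birthdate": 10_789,
    "not excluded": 10_322,
}
bmi_measured = 5_491
abnormal_bmi = 2_640
coded_followup_plans = 25

# Denominator built step by step, keeping the final numerator (Table 2a)
scores_2a = {step: pct(final_numerator, d) for step, d in denominator_steps.items()}

# Numerator built step by step, keeping the final denominator (Table 2b)
eligible = denominator_steps["not excluded"]
normal_bmi = bmi_measured - abnormal_bmi           # needs no follow-up plan
numerator = normal_bmi + coded_followup_plans      # patients meeting the measure
score_bmi_only = pct(bmi_measured, eligible)
score_final = pct(numerator, eligible)

# Share of abnormal BMIs with a coded plan, and relative drop in performance
plan_rate = pct(coded_followup_plans, abnormal_bmi, digits=2)
relative_decline = pct(bmi_measured - numerator, bmi_measured)
```

Here `relative_decline` comes out a little under 50%, consistent with describing the drop as roughly a halving of the score.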

Table 2a.

Complete measure and denominator calculation for case study, keeping final numerator

Concept	N denominator	N numerator	Performance Percent	Number of concept identifiers
*Complete measure*	*10,322*	*2876*	*27.9%*	*1297*
Concept 1–3: outpatient encounter (CPT or HCPCS) in valid time (HL7)	10,792 (CPT) + 0 (HCPCS)	2876	26.6%	41 (CPT) + 5 (HCPCS)
AND Concept 4: birthdate (HL7)	10,789		26.7%	1
NOT Exclusion concepts (5–12): Pregnant (ICD-9-CM) or manually excluded (HL7)	10,322		27.9%	1215

Table 2b.

Numerator calculation, keeping final denominator

Concept	N denominator	N numerator	Performance Percent	Number of concept identifiers
Concept 13: BMI measured (SNOMED or LOINC)	10,322	5491 (SNOMED) + 0 (LOINC)	53.2%	3 (SNOMED) + 1 (LOINC)
Concept 14a: BMI follow-up plan (HCPCS)		2851 (0 follow-up plans)	27.6%	5
14b: ICD-9-CM		2854 (3 follow-up plans)	27.6%	1
14c: CPT		2876 (22 plans)	27.9%	17
14d: SNOMED		2876 (0 plans)	27.9%	8

From the available coded data, it is not known whether this performance drop represents the true performance at the institution, or is due to lack of access to data supporting the relevant BMI follow-up plan concept. In the survey on difficulty, fifty percent of our respondents identified that measure NQF 0421 is difficult because data was in the narrative note, and two comments from the survey (“For the BMI the challenge is the ‘follow up plan.’” “The most difficult part … is the extraction of coded data from non-coded text that indicate ‘follow-up plans’ for obesity.”) both indicate this concept in the measure was the source of the challenge, and that narrative text might be helpful to improve performance. The setting may be particularly important as well; in the Internal Medicine practice in this case study, few pregnant patients are seen. Failing to include the vast number of pregnancy codes used for exclusion, however, may make the performance score much more inaccurate in an Obstetrics/Gynecology or Family Medicine practice.

Discussion

Quality measurement has increasingly become a part of health care incentives and evaluation, and informatics related to quality has advanced significantly in the last decade. Two highly regarded Institute of Medicine reports spurred much of this work. The first, To Err Is Human, focused much-needed attention on the critical quality-of-care issue of patient safety.¹⁴ It is fair to say that the report galvanized Congress and funding agencies, like the Agency for Healthcare Research and Quality (AHRQ) and the Veterans Administration Health Services Research and Development service (HSR&D), to underwrite serious health services research designed to improve patient safety and quality of care. Much of that work has been focused on informatics; Nebeker et al. is representative.¹⁵ The Leapfrog Group, a consortium of public and private partners, was created on the heels of the To Err Is Human report and has been working ever since to improve patient safety (http://www.leapfroggroup.org/about_us). The second IOM report, Crossing the Quality Chasm: A New Health System for the 21st Century, presented a broad and comprehensive plan to improve the quality of healthcare in the United States.¹⁶ In that plan, informatics is a pivotal technology, and is considered indispensable in the effort to make quality care a reality.

Previous quality assessment efforts have required manual chart review, simple concept and logic structures, or patient self-report. An important example of this approach is that taken by the University HealthSystem Consortium (UHC), a consortium of 112 major medical centers that routinely exchange quality data in an attempt to raise overall quality standards (https://www.uhc.edu/). The UHC is considered to be an influential quality consortium, but much of the data is collected manually at the member institutions. These techniques are expensive in terms of time and effort, and each technique may suffer from inaccuracies. For instance, chart documentation is often driven by billing or other non-clinical requirements, and even electronic systems suffer from redundant or ‘cut-and-paste’ text.¹⁷,¹⁸ Previous researchers¹²,¹⁹ have found greater accuracy by combining techniques. The newer quality measurement specifications studied here are intended to be more relevant to improving healthcare yet have many more components, requiring more complex implementation techniques in order to accurately measure them. Many experts also found them to be difficult to implement despite the growing experience of measuring quality in health care. Although there was some overlap, complexity and difficulty seemed to be measuring different concepts in this study. In addition to variation in implementation difficulty between measures, there was substantial variation in measure implementation difficulty between subjects. These differences may be due to differences in workflow, EHR implementation, site-specific quality measurement processes, or other factors. More study is needed to determine the sources and solutions for these site-specific challenges.

Future research should establish ways to validate each individual implementation or measurement of quality. For instance, taxonomies that are purported to be implemented but yield few or no concept identifiers from the list are likely a sign that the measurement itself is inaccurate. Our study found that a substantial amount of the information essential for accurately computing performance measures is available only in the free text fields of the EMR, and is not directly accessible by coded-field-only methods. For accurate assessment of some measures, either the clinical coding workflows will need substantial revision or automated text processing techniques will need to be incorporated into the quality measurement process. Future measures even more targeted to quality care of specific conditions may raise the level of these and additional challenges. Data hidden in clinical notes could be identified using text mining techniques like natural language processing; this could both provide validation for the measurement and add structured codes to improve performance and measurement accuracy. In addition, extracting information from clinical narratives has the advantage of not perturbing clinical workflow. However, it seems likely that not all measures would benefit from this intensive effort, so a prioritization scheme would also be beneficial. It is probable that a combination of different techniques targeted at the measures most likely to derive benefit will be increasingly necessary as more advanced quality measurement becomes part of the healthcare ecosystem.
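One simple form of the validation suggested above is to flag every concept and taxonomy pair that the specification lists but that matches few or no records at a site. The sketch below applies that idea to the match counts reported in Table 2b; the function name and the `min_hits` threshold are assumptions made for illustration, not part of any specification.

```python
# Records matched per (aggregate concept, taxonomy), as reported in Table 2b
hits = {
    ("BMI measured", "SNOMED"): 5491,
    ("BMI measured", "LOINC"): 0,
    ("BMI follow-up plan", "HCPCS"): 0,
    ("BMI follow-up plan", "ICD-9-CM"): 3,
    ("BMI follow-up plan", "CPT"): 22,
    ("BMI follow-up plan", "SNOMED"): 0,
}


def silent_taxonomies(hits, min_hits=1):
    """Return the (concept, taxonomy) pairs that matched fewer than min_hits records.

    min_hits is a hypothetical threshold; a site would choose its own.
    """
    return sorted(pair for pair, count in hits.items() if count < min_hits)


flagged = silent_taxonomies(hits)
```

A flagged pair does not prove the score is wrong, but it marks where data may be recorded elsewhere, such as in narrative notes, or not linked to the concept at all.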

One question raised by this study is whether measure specifications should include details as to the recording and storage of data. In the example given, it is unclear whether data was not recorded or stored in the proper structured format or whether data was not linked to the concept in the proper manner. The new electronic quality measure definitions stop at simple enumeration of the correct concepts that satisfy the measure, and therefore a naïve implementer could stop at simply finding some data related to that concept, rather than assuring that the major workflows that would generally produce the data to be linked to the concept were in place. Experienced implementers, such as those surveyed here, generally agreed on workflow and data storage gaps, and follow-up studies may help identify assessments for those gaps and approaches to reduce them and therefore improve measure accuracy.

Limitations

This study has several limitations. The definitions of complexity and difficulty were created by expert opinion, rather than more formal techniques, and validation to establish their reliability and accuracy still needs to be done. The survey sample size was small and its selection was purposive; however, at the time of the survey, few people nationwide had implemented even a majority of the measures, requiring us to focus on such expertise as was available. Even so, only half, on average, had implemented the new HITECH measures, and the experience of completing implementation may impact the perception of difficulty. However, it is telling that those who had implemented tended to rate the measures as more difficult.

Conclusion

In our evaluation, a great deal of variability is seen in the measured complexity and perceived difficulty of implementation. Complexity and difficulty were mildly correlated but appear to capture different aspects of quality measurement. A case study demonstrated that the difficulty may be increased by having important data present only in narrative notes rather than structured fields. This may halve the performance score. In non-electronic specifications, chart review is often included to search for components hidden in notes; in the fully electronic specification, workflow would have to be changed to ensure that structured codes are filled or, if allowed, advanced text mining techniques used to ensure these codes accurately reflect care provided. This study also revealed that substantial inaccuracy may be caused by overspecifying coded taxonomies when the taxonomies are not fully connected to the data or the codes are not regularly entered.

References

1. Schoen C, Osborn R, Doty MM, Squires D, Peugh J, Applebaum S. A survey of primary care physicians in eleven countries, 2009: perspectives on care, costs, and experiences. Health Aff (Millwood). 2009 Nov-Dec;28(6):w1171–83. doi: 10.1377/hlthaff.28.6.w1171.
2. Nicholas LH, Dimick JB, Iwashyna TJ. Do Hospitals Alter Patient Care Effort Allocations under Pay-for-Performance? Health Serv Res. 2010 Oct 28. doi: 10.1111/j.1475-6773.2010.01192.x.
3. Dorr D, Bonner LM, Cohen AN, Shoai RS, Perrin R, Chaney E, et al. Informatics systems to promote improved care for chronic illness: a literature review. J Am Med Inform Assoc. 2007 Mar-Apr;14(2):156–63. doi: 10.1197/jamia.M2255.
4. Rosenthal MB, Frank RG. What is the empirical basis for paying for quality in health care? Med Care Res Rev. 2006 Apr;63(2):135–57. doi: 10.1177/1077558705285291.
5. Rosenthal MB, Frank RG, Li Z, Epstein AM. Early experience with pay-for-performance: from concept to practice. JAMA. 2005 Oct 12;294(14):1788–93. doi: 10.1001/jama.294.14.1788.
6. Iezzoni LI. Assessing quality using administrative data. Ann Intern Med. 1997 Oct 15;127(8 Pt 2):666–74. doi: 10.7326/0003-4819-127-8_part_2-199710151-00048.
7. Werner RM, Asch DA. The unintended consequences of publicly reporting quality information. JAMA. 2005 Mar 9;293(10):1239–44. doi: 10.1001/jama.293.10.1239.
8. Werner RM, Bradlow ET. Relationship between Medicare’s hospital compare performance measures and mortality rates. JAMA. 2006 Dec 13;296(22):2694–702. doi: 10.1001/jama.296.22.2694.
9. Landon BE, Hicks LS, O’Malley AJ, Lieu TA, Keegan T, McNeil BJ, et al. Improving the management of chronic disease at community health centers. N Engl J Med. 2007 Mar 1;356(9):921–34. doi: 10.1056/NEJMsa062860.
10. Lindenauer PK, Remus D, Roman S, Rothberg MB, Benjamin EM, Ma A, et al. Public reporting and pay for performance in hospital quality improvement. N Engl J Med. 2007 Feb 1;356(5):486–96. doi: 10.1056/NEJMsa064964.
11. Garg AX, Adhikari NK, McDonald H, Rosas-Arellano MP, Devereaux PJ, Beyene J, et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA. 2005 Mar 9;293(10):1223–38. doi: 10.1001/jama.293.10.1223.
12. Persell SD, Wright JM, Thompson JA, Kmetik KS, Baker DW. Assessing the validity of national quality measures for coronary artery disease using an electronic health record. Arch Intern Med. 2006 Nov 13;166(20):2272–7. doi: 10.1001/archinte.166.20.2272.
13. Chaudhry B, Wang J, Wu S, Maglione M, Mojica W, Roth E, et al. Systematic review: impact of health information technology on quality, efficiency, and costs of medical care. Ann Intern Med. 2006 May 16;144(10):742–52. doi: 10.7326/0003-4819-144-10-200605160-00125.
14. Kohn L, Corrigan J, Donaldson M, editors. To Err Is Human: Building a Safer Health System. Washington, DC: National Academy Press; 2000.
15. Nebeker JR, Hoffman JM, Weir CR, Bennett CL, Hurdle JF. High rates of adverse drug events in a highly computerized hospital. Arch Intern Med. 2005 May 23;165(10):1111–6. doi: 10.1001/archinte.165.10.1111.
16. Berwick DM. A user’s manual for the IOM’s ‘Quality Chasm’ report. Health Aff (Millwood). 2002 May-Jun;21(3):80–90. doi: 10.1377/hlthaff.21.3.80.
17. Thielke S, Hammond K, Helbig S. Copying and pasting of examinations within the electronic medical record. Int J Med Inform. 2007 Jun;76(Suppl 1):S122–8. doi: 10.1016/j.ijmedinf.2006.06.004.
18. Weir CR, Hurdle JF, Felgar MA, Hoffman JM, Roth B, Nebeker JR. Direct text entry in electronic progress notes. An evaluation of input errors. Methods Inf Med. 2003;42(1):61–7.
19. McGlynn EA, Asch SM, Adams J, Keesey J, Hicks J, DeCristofaro A, et al. The quality of health care delivered to adults in the United States. N Engl J Med. 2003 Jun 26;348(26):2635–45. doi: 10.1056/NEJMsa022615.