Assessing Data Quality: From Concordance, through Correctness and Completeness, to Valid Manipulatable Representations

Patricia Flatley Brennan; William W Stead

doi:10.1136/jamia.2000.0070106

editorial

2000 Jan-Feb;7(1):106–107. doi: 10.1136/jamia.2000.0070106

PMCID: PMC61460 PMID: 10641968

The papers by Stein et al.¹ and Aronsky and Haug² address the quality of the data found in clinical record systems. Stein et al. approach the problem as one of internal consistency. Their paper explores concordance within a record system: the extent to which evidence found in one part of a clinical database is consistent with evidence found in another part. Specifically, they examine agreement between entries in a free-text narrative field and data found in coded fields. Aronsky and Haug examine concordance across two different clinical record systems: the HELP computerized clinical record system and a reference standard consisting of the sum of all information available in the paper chart and the computerized clinical record. Aronsky and Haug complement their appraisal of concordance with an outcome evaluation, determining the level of agreement in clinical severity indexes resulting from the information contained in the different record systems.

Stein et al. use the measure of internal concordance to alert users of a computerized record system to the fact that they may get misleading answers unless they query each field that might contain a piece of information and resolve any discrepancies. Aronsky and Haug argue the need for equivalence in recommendations based on the clinical record, regardless of which form of a clinical record is used; they are saying, in effect, that records with higher concordance should lead to similar recommendations.

The use of the term “concordance” is appropriate for framing the question posed in each of these papers. Concordance originated in the 14th-century church, where it referred to a companion text to an original document, consisting of an enumeration of all terms appearing in that document. These authors extended the target object from “terms” to “concepts” and restricted the enumeration of concepts to only those relevant to specific clinical phenomena. The authors' use of the term “concordance” implies that the question of interest is the extent to which all clinically significant concepts found in one section (Stein et al.) or form (Aronsky and Haug) are evident in another section or form. The evidence presented in the two papers clearly indicates that the two sections or forms are not in complete accord, and the discussions evaluate the consequences of the discord.

Clinical records, be they paper or electronic, are no more and no less than representations of the true state of the patient and the events occurring during the process of care for the patient. Concordance is a characteristic of representations: It is possible to estimate the degree of similarity within or between representations, such as patient records. While one could dispute the merits of different computational forms employed to characterize concordance, the basic intent of such a statistic is valid—to gauge the level of agreement between two documents. At the same time, it is critical to recognize that concordance offers no indication as to whether the representations themselves are true and accurate depictions of the real state of the patient.
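As a minimal sketch of what such a statistic might look like, the snippet below computes raw percent agreement and Cohen's kappa between two representations of the same patients, for example a coded field and a finding parsed from narrative text. The function names and the sample data are hypothetical, and kappa is only one of several possible computational forms; neither paper is claimed here to use this exact calculation.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of patients for whom the two representations agree."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    labels = freq_a.keys() | freq_b.keys()
    # Chance agreement: product of the two sources' marginal label rates.
    expected = sum(freq_a[k] * freq_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources use one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: does each source say the patient has the condition?
coded_field = [True, True, False, False, True, False, False, True]
narrative = [True, False, False, False, True, False, True, True]

agreement = percent_agreement(coded_field, narrative)
kappa = cohens_kappa(coded_field, narrative)
```

Both numbers describe only how well the two representations agree with each other; a high value would not show that either one matches the patient's true state.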

Clinical data are a scarce and expensive resource. These studies advance our understanding of the degree to which we can re-use data recorded in today's clinical information systems for practice management, decision support, and clinical or health services research. This type of work should be extended in two directions—prospective studies of data accuracy in extant clinical records and methodological studies of strategies to produce more robust language structures for representing clinical phenomena.

Hogan and Wagner³ provide a model for examining data accuracy by assessing correctness and completeness. Their approach goes beyond the concordance studies: it not only examines data in the clinical record but also prospectively constructs a gold standard, so that the patient and care provider can be used as information sources. In essence, this approach goes further than concordance to ensure that the record is a correct representation of the state of the patient.
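The following sketch shows how such measures might be computed once a gold standard exists. It assumes that correctness is read as the proportion of recorded observations that are true, and completeness as the proportion of true observations that are recorded; these readings, the function name, and the sample medication lists are illustrative assumptions rather than details taken from the text above.

```python
def correctness_and_completeness(recorded, gold):
    """Compare a record's observations with a gold-standard set.

    Returns (correctness, completeness); a value is None when its
    denominator would be zero.
    """
    recorded, gold = set(recorded), set(gold)
    true_recorded = recorded & gold  # observations that are both recorded and true
    correctness = len(true_recorded) / len(recorded) if recorded else None
    completeness = len(true_recorded) / len(gold) if gold else None
    return correctness, completeness


# Hypothetical example: medications in the record vs. those confirmed
# with the patient and care provider.
record_meds = {"aspirin", "metformin", "lisinopril", "warfarin"}
gold_meds = {"aspirin", "metformin", "lisinopril", "atorvastatin", "insulin"}

correctness, completeness = correctness_and_completeness(record_meds, gold_meds)
```

Unlike a concordance statistic, both values here are anchored to an external reference, so they are only as trustworthy as the gold standard itself.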

At best, however, data concordance and accuracy studies are only as good as the underlying representations. These studies have an inherent limitation, because they rely on the vocabulary primitive of term or phrase. Almost all studies in this realm employ some type of parsing strategy to select specific words or phrases on which to evaluate agreement. These words and phrases may themselves unduly limit the ability of clinical records to represent the true state of the patient, because they force the reduction of expressions of complex clinical phenomena into atomic words and phrases. Studies based on the vocabulary primitive of “terms” rather than on more sophisticated representations of patient phenomena remain restricted by the fundamental nature of records.

It is possible to envision language structures in the clinical record that are more robust than simple words and phrases. Emerging work in concept maps and compositional vocabularies promises to provide tools for characterizing patient phenomena in a manner that can be dynamically manipulated and provide a more meaningful image of the true state of the patient and the actual care process.

Based on thinking originating more than 70 years ago in the work of Ogden and Richards,⁴ it is logical to expect that any representation system should be appraised for its ability to provide truthful depictions of the real-life state of the patient and for equivalence across representational forms. The current state of research in formal language focuses on the development of computable language structures for creating textual representations of clinical phenomena. That work, in essence, strives to ensure that the words and phrases used to depict clinical observations in the patient record remain as true-to-life as possible, by capturing not only syntactic meaning but also semantic interpretability. Records based on these representations should be assessed for validity and manipulability. A valid representation would provide an honest, true-to-life depiction of the patient. A manipulable representation would support knowledge-based interpretation of clinical observations and automatic application of decision support tools.

Therefore, there is need, and room, in the field of medical informatics for multiple research trajectories that converge on the problem of ensuring the validity of the clinical record as a representation of the true state of the patient. Papers such as those presented by Stein et al. and Aronsky and Haug are necessary but not sufficient endeavors in the quest for the Holy Grail of medical informatics: the computer-based patient record.

Patricia Flatley Brennan, William W. Stead

References

1. Stein HD, Nadkarni P, Erdos J, Miller PL. Exploring the degree of concordance of coded and textual data in answering clinical queries from a clinical data repository. J Am Med Inform Assoc. 2000;7(1):42-52.
2. Aronsky D, Haug PJ. Assessing the quality of clinical data in a computer-based record for calculating the Pneumonia Severity Index. J Am Med Inform Assoc. 2000;7(1):53-63.
3. Hogan WR, Wagner MM. Accuracy of data in computer-based patient records. J Am Med Inform Assoc. 1997;4(5):342-55.
4. Ogden CK, Richards IA. The Meaning of Meaning. 8th ed. Orlando, Fla: Harcourt Brace Jovanovich, 1989. Cited in: Campbell KE, Oliver DE, Spackman KA, Shortliffe EH. Representing thoughts, words, and things in the UMLS. J Am Med Inform Assoc. 1998;5(5):421-31.
