Abstract
Our aim is to demonstrate a general-purpose data and knowledge validation approach that enables reproducible metrics for data and knowledge quality and safety. We researched widely accepted statistical process control methods from high-quality, high-safety industries and applied them to pharmacy prescription data being migrated between EHRs. Natural language medication instructions from prescriptions were independently categorized by two terminologists as a first step toward encoding those medication instructions using standardized terminology. Overall, the weighted average of medication instructions that were matched by reviewers was 43%, with strong agreement between reviewers for short instructions (K=0.82) and long instructions (K=0.85), and moderate agreement for medium instructions (K=0.61). Category definitions will be refined in future work to mitigate discrepancies. We recommend incorporating appropriate statistical tests, such as evaluating inter-rater and intra-rater reliability and bivariate comparison of reviewer agreement over an adequate statistical sample, when developing benchmarks for health data and knowledge quality and safety.
Introduction
Increased use of electronic health records (EHRs) has produced large-scale health data analytics and research networks, and access to large quantities of clinical data from operational EHRs holds much promise. (1,2) However, EHRs are typically configured to satisfy site-specific needs, resulting in differences in clinical documentation practices and data capture between and within EHR systems. (3) These discrepancies can increase the risk of poor data quality when aggregated and utilized for research findings and clinical decision support (CDS) processes. (2, 4-8) A substantial body of research suggests that data collected in EHRs and other operational systems may not be of sufficient quality for research, analysis, and safe patient care across systems. (2, 9-22)
The U.S. Department of Veterans Affairs (VA) is the largest health system in the country and relies on health data from its EHR system to support its mission. (23) Over the next decade, VA intends to migrate all existing health records from the existing Veterans Health Information System Technology Architecture (VistA) EHR to Cerner’s EHR and must ensure that data is transferred safely and reliably. Cerner Millennium and VistA use different clinical terminological concepts and statement models to document observations and actions performed during clinical consultations with patients, and this variation increases the risk of health care errors. In 2021, the VA Secretary commissioned an EHR Data Integration Integrated Project Team (IPT) led by VA’s Chief Data Officer to evaluate data concerns related to the EHR transition. (24) The Data Integrity and Agility (DI&A) workstream within this IPT conducted a series of analyses on sample data that considered data provenance, lineage, and integrity of the current systems. Data integrity ensures that data – from point of order to point of use – has not been altered in an unauthorized manner. (25) Dependent on data integrity is data agility, the ability to respond to new demands for data safely, reliably, and promptly. Data integrity and data agility are foundational components for achieving highly reliable data integration, robust analytics, and enterprise-wide reporting capabilities.
The DI&A workstream efforts included development of Independent Validation and Verification (IV&V) methodologies to evaluate the integrity and agility of data as it moves across EHR systems. IV&V methodologies help ensure a sound data migration strategy intended to reduce patient safety risk and improve data quality for analysis and tracking. The DI&A effort informs our ongoing development of a generalized IV&V approach that can be applied to patient health data from various domains (e.g., medications, immunizations, laboratory results) during EHR deployments. Medications were prioritized as the initial domain to develop and apply the IV&V methodology.
The Government Accountability Office (GAO) has underscored the importance of developing these IV&V methodologies and has specifically recommended that VA develop performance measures that can monitor the quality and safety of data migration. (23) The GAO report, entitled “VA Needs to Address Data Management Challenges for New System,” states that VA must develop metrics for data quality. (23)
In addition to responding to the findings of the GAO report, reducing preventable patient harm has been a key goal of the VA High Reliability Organization (HRO) journey since 2019. HROs are learning organizations that heavily invest in proactive safety management including process modeling, system safety process analysis, statistical process control methods, system appraisals, risk assessments, performance audits, and frontline reporting of system anomalies without fear of retribution or punishment. (26, 27)
In this paper, our aim is to demonstrate a general-purpose IV&V approach for information models, instance data, and knowledge that will enable reproducible metrics for quality and safety as called for by VA’s HRO efforts and by the GAO. This IV&V approach applies established statistical process control methods, specifically inter-rater and intra-rater reliability and agreement metrics based on Cohen’s Kappa, to evaluate pharmacy prescription data being migrated between EHRs. These methods are widely accepted in other industries where quality and safety are closely monitored but are under-utilized within the health IT industry. We demonstrate that these methods can be applied to electronic health data and knowledge management tasks. We recommend routine use of bivariate comparison of reviewer agreement over an appropriately sized statistical sample to develop standard benchmarks for health data and knowledge quality and safety. A detailed description of the method is provided in this paper along with summarized analysis findings from our application of these methods to pharmacy prescription data being migrated from VistA to Cerner.
Related Works: The need for a unified model of validation processes is commonly understood as a desideratum within the software testing and modeling and simulation literature. (28, 29) During data migration, it is best practice to compare and test source data against requirements of the destination system by inspecting source-to-destination mappings. (29) Despite scholarly and pragmatic efforts describing models for validation in software and modeling domains, there is a paucity of literature about quality assurance of healthcare knowledge artifacts and knowledge-based systems.
In prior work, Rehwoldt described the existing informatics literature around verification and validation processes: “Yet, while the case for knowledge-based clinical decision support has been repeatedly, and increasingly, prominently made, the practical realities of everyday knowledge quality work by healthcare enterprises, and knowledge-based systems, to a significant extent still remains a black box. In other words, we remain somewhat uncertain as to the extent to which knowledge is being validated at all and, perhaps more importantly, without any clear indication of how it is working.” (30) Existing validation techniques for clinical data lack statistical rigor to verify the quality and reliability of data, demonstrating the need for a robust, quantitative, and reproducible method that will evaluate data as it moves between systems.
A comprehensive, accurate, and evidence-based knowledge base is a prerequisite for effective application of CDS, quality measurement, research, and personalized medicine. (30-32) As VA continues to implement the Cerner Millennium EHR over the next decade, it must continue to provide a single standard of care across all its healthcare facilities, posing a significant challenge due to numerous identified data integrity and agility issues. To ensure data integrity and promote patient safety, VA has an immediate need to develop and implement a definitive internal data validation and robust site readiness process that will enable successful future go-lives across the enterprise.
Background: Evolutionary Design Criteria – Understandable, Reproducible, and Useful (URU): The IV&V methodology and findings presented in this paper are inspired by core evolutionary design principles called “Understandable, Reproducible, and Useful,” upon which SNOMED CT development is still based. (33, 34) These criteria describe an approach for improving data quality and increasing data integrity and agility:
Understandable: The content can be processed by health IT systems and understood by most healthcare providers without reference to private or inaccessible information.
Reproducible: Multiple users or systems apply the data to the same situations and source data with an equivalent result.
Useful: Data is fit-for-purpose – it has practical value for data analysis in support of health information exchange, research, and public health that requires information aggregation across health IT systems.
Methods
This paper builds on IV&V methodologies used in other industries to evaluate the integrity of data as it moves across systems. Ultimately, these methodologies aim to highlight processes that can be applied to various use cases to better ensure patient care and highly reliable semantic interoperability in an objective manner.
General-Purpose IV&V Approach: Our general IV&V approach described below is based on System Safety methods, High Reliability principles, and Statistical Process Control techniques to assess data and knowledge management processes. (35, 36) The approach is an iterative process composed of three phases: (1) Error Discovery and Assessment, (2) Statistical Process Controls, and (3) Continuous Process Improvement. Steps within the approach can be tailored for domain-specific assessment to ensure that a data element’s meaning and purpose are properly captured and preserved within the IV&V approach in a standardized, unambiguous format. The steps of the approach are shown in Figure 1.
Figure 1.
An overview of a general IV&V approach.
Representation Mapping for VA’s Ambulatory Medication Instruction Data: We conducted dual, iterative independent categorization (by humans) of medication instructions derived from ambulatory Sigs, abbreviated from the Latin term signatura, to develop an IV&V methodology that can be expanded to additional use cases to better ensure patient care and highly reliable semantic interoperability in an objective manner. The goal of conducting an independent review is to understand how to: (1) capture a medication instruction’s meaning and (2) develop a generalized IV&V approach for representing medication instructions in a standardized, unambiguous format.
Medication instructions tell the pharmacy what they should include on the prescription label for the patient/caregiver so the patient will know how and when to take the medication. For example, medication instructions, such as “take 1 tablet once a day,” are a key component of prescriptions. While ambulatory electronic prescribing has multiple benefits – including potential for reduced medication errors, saved physician and staff time, and lower medical costs – medication instructions are typically represented in natural language rather than encoded using a terminology system. (37) There are many benefits to encoding instructions with a terminology system, such as automated decision support and reduction in medication dosing errors. To realize these benefits, medication instructions must be reproducibly encoded so that algorithms can operate over the encoded instructions.
Our team initiated a discovery analysis. The discovery phase of our work included the following goals: (1) find and document textual patterns for different fields, beginning with standard fields such as action verb, dose, and dose unit and (2) begin to identify key words that indicate additional information (e.g., “for,” “avoid”) and analyze the words that follow.
We conducted literature reviews and identified a set of input fields into which medication instructions could be broken down to best capture meaning in a URU (understandable, reproducible, useful) manner. A recent study conducted by the University of Catania (38) extracted text from scanned medical prescriptions, then utilized Natural Language Processing (NLP) to aid in the classification of embedded terms and data categories. The study found that classification of data requires automatic management of NLP syntactic rules for improved pattern matching, rather than manual writing of rules, which is time consuming and error prone. The study provides a starting point on how to apply NLP rules but does not suggest potential data categories to parse medication instructions. Additionally, previous research has explored the variation found in medication instructions, (39) and the National Council for Prescription Drug Programs (NCPDP) developed a framework for structuring and codifying prescription medication instructions that has over 10 fields, each of which contains additional subfields. (37) While the NCPDP framework serves to eliminate the discrepancies found in how medication instruction data is represented, it does not provide a methodology for ensuring high data reliability.
Based on common elements typically included in the free text of medication instructions, we started with the following data elements to represent the words contained in the medication instructions: verb, dose, dose unit, route of administration, frequency, and additional information. (39) After initial testing with these categories, the category “verb” was changed to “action” and the “additional information” category was split into “purpose” and “warning” categories. Once initial data categories were identified, candidate medication instructions were selected to begin independent review to validate the input fields and categories. Medication instructions were first grouped based on length: short (<80 characters), medium (80-130 characters), and long (>130 characters). Ninety-seven (97) medication instructions with roughly equal distribution by length (i.e., roughly 32 per group) were randomly selected to begin independent human review to represent the medication instructions in the categories.
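The length grouping and random selection described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the function names, the fixed seed, and the choice to sample each length group separately are assumptions; only the character thresholds come from the text.

```python
import random

def length_group(instruction: str) -> str:
    """Assign a length group using the thresholds stated in the text."""
    n = len(instruction)
    if n < 80:
        return "short"
    if n <= 130:
        return "medium"
    return "long"

def sample_by_length(instructions, per_group, seed=0):
    """Randomly pick up to `per_group` instructions from each length group.

    Sampling each group separately and using a fixed seed are assumptions
    made for reproducibility of this sketch.
    """
    rng = random.Random(seed)
    groups = {"short": [], "medium": [], "long": []}
    for text in instructions:
        groups[length_group(text)].append(text)
    return {
        name: rng.sample(items, min(per_group, len(items)))
        for name, items in groups.items()
    }
```

Calling `sample_by_length(all_instructions, per_group=32)` would return a dictionary with one list per length group, which approximates the "roughly 32 per group" selection.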
After the medication instructions were selected, a total of 10 out of the 97 medication instructions were chosen from among the three length groups for representation by two subject matter experts (SMEs) with superior knowledge of health informatics and pharmaceutical and clinical data. This step was developed based on the methodology by Levy et al. so that reviewers would have the opportunity to align with each other on the meaning of the categories prior to performing independent review. (40) The SMEs worked together to build out reproducible, standardized representations of the 10 medication instructions by: (1) identifying key components of the medication instruction narrative, (2) applying the data fields/categories for representation, and (3) noting areas of ambiguity and agreeing on a rationale for how to break down the narratives into the data categories.
Finally, once the SMEs established a mutual understanding of the categories, each SME independently represented the remaining 87 selected medication instructions by categorizing them into the previously determined categories. Each SME independently represented a total of 27 short-length medication instructions, 27 medium-length medication instructions, and 33 long-length medication instructions into five different categories: action, method, frequency, purpose, and warning.
Statistical Analysis: Our team calculated and summarized descriptive characteristics of the medication instructions as frequencies and percentages for categorical variables. Following the independent review process explained in the previous section, the team applied inter-rater reliability statistical process controls to identify gaps between the reviewers’ mappings of the medication instruction categories. Inter-rater reliability measures the level of agreement between multiple raters or judges. Low inter-rater reliability can indicate that the items being rated, here the category definitions, are confusing, unclear, or even unnecessary. In its simplest form, inter-rater reliability is calculated as the percentage of items on which the reviewers agree. The team used a rubric (Table 1) to score matches: for each medication instruction, each category as represented by one SME was compared against the same category as represented by the other SME and scored as a complete match, partial match, or no match. A match was considered complete if the words were the same, partial if some of the words were the same, and no match if no words were the same. The team then calculated the percent match as well as Cohen’s Kappa for the mappings. Moving forward, multiple independent reviews will be conducted to ensure categories properly capture a medication instruction’s meaning.
Table 1.
Rubric for matching.
| Match | Description | Value |
|---|---|---|
| Complete Match | All words in the category are the same | 1 |
| Partial Match | Some, but not all, words in the category are the same | 0.5 |
| No Match | All words in the category are different | 0 |
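The rubric in Table 1 can be expressed as a small scoring function. The sketch below is one possible reading of it: the tokenization (lowercased alphanumeric words), the treatment of "N/A" as an empty category, and the function names are assumptions, since the paper does not specify how words were compared.

```python
import re

def _words(text: str) -> list:
    """Lowercased word tokens; 'N/A' or blank is treated as an empty category (assumption)."""
    if text.strip().upper() in {"", "N/A"}:
        return []
    return re.findall(r"[a-z0-9]+", text.lower())

def match_score(reviewer_1: str, reviewer_2: str) -> float:
    """Score one category for one instruction using the Table 1 values."""
    words_1, words_2 = _words(reviewer_1), _words(reviewer_2)
    if words_1 == words_2:
        return 1.0   # complete match: all words are the same
    if set(words_1) & set(words_2):
        return 0.5   # partial match: some words are shared
    return 0.0       # no match: no words are shared
```

For example, `match_score("By mouth every day", "By mouth")` scores a partial match, while two "N/A" entries score a complete match under the assumption above.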
Chi-square and Fisher’s exact tests were used for bivariate comparisons of reviewer matches among the five medication instruction categories across medication instruction lengths. To analyze the reproducibility of categorizing medication instructions while accounting for reviewers agreeing simply at random, we calculated Cohen’s Kappa, a chance-corrected measure of agreement between reviewers. For our primary analysis, partial matches and complete matches were combined into a single “agreement” category to calculate Cohen’s Kappa, whereas no matches between the two reviewers remained as “no agreement.” As an alternative, we also calculated the Kappa statistic by keeping complete matches and combining partial matches with no matches into a single “non-complete agreement” category. Values can range from -1 (perfect disagreement) to 1 (perfect agreement), with a value of 0 indicating agreement no better than expected by chance. We set the cutoffs at Kappa >0.6 for moderate agreement and Kappa >0.8 for strong agreement.
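For readers unfamiliar with the statistic, the textbook form of Cohen's Kappa for two raters assigning categorical labels is sketched below. It illustrates the general definition only: the paper does not describe how chance agreement was estimated in this study, and the function name and example labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b) -> float:
    """Textbook Cohen's Kappa: (p_o - p_e) / (1 - p_e) for two raters."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels: which category each rater assigned to four phrases.
rater_1 = ["warning", "warning", "method", "method"]
rater_2 = ["warning", "method", "method", "method"]
print(cohen_kappa(rater_1, rater_2))  # 0.5
```

In this toy example the raters agree on three of four phrases (p_o = 0.75) and would agree on half by chance (p_e = 0.5), giving a Kappa of 0.5.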
Results
Error Discovery and Assessment: We received a spreadsheet populated with medication instructions from the VA EHR Modernization group’s Pharmacy Benefits Management group, whose goal is to improve the health status of Veterans by encouraging the appropriate use of medications in a comprehensive medical care setting. This spreadsheet contained medication instructions data currently represented in VistA as strings/natural language and displayed how the medication instructions would be transferred to Cerner in specific data categories. There were roughly 25,000 medication instructions that came from over 210,000 records of data. The categories provided in the medication instructions spreadsheet included the following: verb, unit, form, schedule, and med route.
Of the medication instructions provided, 77% were unique, appearing only once across all records, indicating a high level of variability within instruction statements. Medication instructions also varied in length, ranging from 12 to 546 characters with a median of 78 characters. While medication instructions typically contain information for action, dose, dose unit, route of administration, and frequency, the wording of each of these categories may vary, as may the order in which the medication instruction lists this information. Our team also identified many ambiguous medication instructions where additional information was organized in different ways. Table 2 provides examples of medication instructions that start out with the same basic information but include additional information denoted with varying syntax.
Table 2.
Example medication instructions with additional information.
| Medication Instruction Examples |
|---|
| Take one tablet by mouth every day for heart/blood pressure (avoid grapefruit and grapefruit juice) |
| Take one tablet by mouth every day for blood pressure |
| Take one tablet by mouth every day for thyroid |
| Take one tablet by mouth every day **take with food** |
| Take one tablet by mouth every day (blood thinner) – notify anticoag clinic of scheduled procedures/surgery |
To better understand how to capture the meaning and purpose of a medication instruction, our team pinpointed certain components that are common within the additional information of the medication instruction. As demonstrated in Table 2, the additional information often relates to the purpose of the medication (e.g., “for heart/blood pressure,” “for blood pressure,” “for thyroid”). However, some medication instructions also include statements that may be warnings or some type of restriction for the patient (e.g., “avoid grapefruit and grapefruit juice”). Our team conducted specific analysis on key words that represent additional information in many of the medication instructions. The team analyzed the most common first words used in medication instructions and the statement types that result from certain keywords, such as “avoid.” The results of this analysis were useful in better understanding the medication instructions but focused more on the idea of eventually breaking down the medication instructions into a repository.
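A first-word and keyword analysis of this kind can be sketched in a few lines. The code below is illustrative only: it uses the Table 2 examples as input, and the function names and the regular expression are assumptions rather than the team's actual tooling.

```python
import re
from collections import Counter

# Example instructions copied from Table 2.
TABLE_2_EXAMPLES = [
    "Take one tablet by mouth every day for heart/blood pressure (avoid grapefruit and grapefruit juice)",
    "Take one tablet by mouth every day for blood pressure",
    "Take one tablet by mouth every day for thyroid",
    "Take one tablet by mouth every day **take with food**",
    "Take one tablet by mouth every day (blood thinner) – notify anticoag clinic of scheduled procedures/surgery",
]

def first_word_counts(instructions):
    """Count the lowercased first word of each non-empty instruction."""
    return Counter(text.split()[0].lower() for text in instructions if text.strip())

def text_after_keyword(text, keyword):
    """Return the text following the first whole-word occurrence of `keyword`, or None."""
    found = re.search(rf"\b{re.escape(keyword)}\b\s+(.*)", text, flags=re.IGNORECASE)
    # Trim surrounding spaces, parentheses, asterisks, and periods.
    return found.group(1).strip(" ()*.") if found else None
```

Here `text_after_keyword(TABLE_2_EXAMPLES[1], "for")` returns the purpose phrase "blood pressure", and the same call with "avoid" on the first example returns the restriction text.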
Representation Mapping: After the initial data discovery process, the team sought to develop categories for representing medication instructions that could be easily understandable, reliable, and of low complexity. While those categories derived from the NCPDP framework serve to eliminate the ambiguity within medication instructions, they are complex and not easily generalizable to other data types. (37) To advance a methodology with less complexity, the team developed four original categories to represent medication instructions: action, guidance, purpose, and warning. These categories are simpler than the original six described as common components in medication instructions and are designed to be more representative and inclusive. Table 3 represents an example of a medication instruction broken down into these categories.
Table 3.
Initial proposed categories for breakdown and representation mapping.
| | Action | Guidance | Purpose | Warning |
|---|---|---|---|---|
| Example | Take one tablet | By mouth every day | For blood pressure | (Avoid grapefruit and grapefruit juice) |
After initial review, it was decided that “guidance” should be broken down into “method” and “frequency” to reduce ambiguity related to the “guidance” category. This resulted in a total of five categories. Table 4 shows the final categories defined in this phase.
Table 4.
Next iteration of proposed categories for breakdown and representation mapping.
| | Action | Method | Frequency | Purpose | Warning |
|---|---|---|---|---|---|
| Example | Take one tablet | By mouth | Every day | For blood pressure | (Avoid grapefruit and grapefruit juice) |
Once the second-round categories were established, the SMEs categorized 10 medication instructions together before evaluating the remaining medication instructions independently. Table 5 represents a subset of medication instructions that the SMEs represented together.
Table 5.
Subset of initial medication instructions defined by clinical domain SMEs.
| Medication Instruction Candidates | Action | Method | Frequency | Purpose | Warning |
|---|---|---|---|---|---|
| Use 1 needle as directed as needed | Use 1 needle | As directed | As needed | N/A | N/A |
| Apply small amount to affected area as directed by provider as needed for dry skin use 20 minutes after desonide cream | Apply small amount | To affected area as directed by provider/ Use 20 minutes after desonide cream | As needed | For dry skin | N/A |
Table 6 shows the breakdown, by each of the two SME reviewers, of a long medication instruction that explains the use of a sliding scale for medication administration. Upon initial review, the SMEs had a difficult time breaking down these types of medication instructions, but both SMEs concluded that these types of medication instructions should be represented utilizing multiple independent statements. Through the separation of these statements, the reviewers were able to reduce ambiguity in the statement meaning.
Table 6.
Example of the breakdown of long medication instruction that used a sliding scale.
Example medication instruction: Inject 30 units under the skin every morning and inject 24 units noon and inject 24 units every night plus sliding scale blood sugar 150-200=2 units, 201-250=4 units, 251-300=6 units, 301-350=8 units, 351-400=10 units, over 400 (call physicians for medication instructions) **take immediately (0-15 minutes) before the meal.

| Reviewer | Action | Method | Frequency | Purpose | Warning |
|---|---|---|---|---|---|
| Reviewer 1 | Inject 30 units plus sliding scale | Under the skin | Every morning | N/A | **Take immediately (0-15 minutes) before the meal. |
| | Inject 24 units plus sliding scale | (Implied) | At noon | N/A | **Take immediately (0-15 minutes) before the meal. |
| | Inject 24 units plus sliding scale | (Implied) | Every night | N/A | **Take immediately (0-15 minutes) before the meal. |
| | Insulin sliding scale blood sugar 150-200=2 units, 201-250=4 units, 251-300=6 units, 301-350=8 units, 351-400=10 units, over 400 (call physicians for medication instructions) | N/A | N/A | | |
| Reviewer 2 | Inject 30 units | Under the skin | Every morning | **Take immediately (0-15 minutes) before the meal. | |
| | Inject 24 units | Noon | | | |
| | Inject 24 units | Every night | | | |
| | Plus sliding scale: blood sugar 150-200=2 units, 201-250=4 units, 251-300=6 units, 301-350=8 units, 351-400=10 units, over 400 (call physicians for medication instructions) | | | | |
Statistical Process Controls: After both SMEs categorized the remaining 87 medication instructions, we conducted a baseline analysis on the results to understand how effective the existing five categories are at representing the current information. The independently reviewed set comprised 27 short-length medication instructions, 27 medium-length medication instructions, and 33 long-length medication instructions. Table 7 shows the distribution of complete matches, partial matches, and no matches by category and length of the medication instruction. Tests of independence suggest an association between medication instruction category and the level of agreement between the two reviewers for short-length (p=0.006), medium-length (p<0.001), and long-length (p<0.001) medication instructions.
Table 7.
Breakdown of matches by category and medication instruction length.
| Length | Match | Action | Method | Frequency | Purpose | Warning | Total |
|---|---|---|---|---|---|---|---|
| Short-length (p=0.006) | No Match | 2 | 1 | 1 | 2 | 6 | 12 |
| | Partial Match | 1 | 7 | 3 | 0 | 1 | 12 |
| | Complete Match | 24 | 19 | 23 | 25 | 20 | 111 |
| | Total Possible | 27 | 27 | 27 | 27 | 27 | 135 |
| | Kappa (Any match)¹ | 0.85 | 0.93 | 0.93 | 0.85 | 0.56 | 0.82 |
| | Kappa (Complete match)² | 0.78 | 0.41 | 0.70 | 0.85 | 0.48 | 0.64 |
| Medium-length (p<0.001) | No Match | 0 | 1 | 1 | 7 | 17 | 26 |
| | Partial Match | 2 | 13 | 5 | 2 | 1 | 23 |
| | Complete Match | 25 | 13 | 21 | 18 | 9 | 86 |
| | Total Possible | 27 | 27 | 27 | 27 | 27 | 135 |
| | Kappa (Any match)¹ | 1.00 | 0.93 | 0.93 | 0.48 | -0.26 | 0.61 |
| | Kappa (Complete match)² | 0.85 | -0.04 | 0.56 | 0.33 | -0.33 | 0.27 |
| Long-length (p<0.001) | No Match | 0 | 2 | 0 | 3 | 7 | 12 |
| | Partial Match | 9 | 6 | 10 | 0 | 10 | 35 |
| | Complete Match | 24 | 25 | 23 | 30 | 16 | 118 |
| | Total Possible | 33 | 33 | 33 | 33 | 33 | 165 |
| | Kappa (Any match)¹ | 1.00 | 0.88 | 1.00 | 0.82 | 0.58 | 0.85 |
| | Kappa (Complete match)² | 0.45 | 0.52 | 0.39 | 0.82 | -0.03 | 0.43 |
| Total | No Match | 2 | 4 | 2 | 12 | 30 | 50 |
| | Partial Match | 12 | 26 | 18 | 2 | 12 | 70 |
| | Complete Match | 73 | 57 | 67 | 73 | 45 | 315 |
| | Total Possible | 87 | 87 | 87 | 87 | 87 | 435 |
| | Kappa (Any match)¹ | 0.95 | 0.91 | 0.95 | 0.72 | 0.31 | 0.77 |
| | Kappa (Complete match)² | 0.68 | 0.31 | 0.54 | 0.68 | 0.03 | 0.44 |

¹ Counts of partial matches are combined with complete matches. Kappa calculation compares “any match” to “no match.”
² Counts of partial matches are combined with no matches. Kappa calculation compares “complete match” to “non-complete match.”
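A test of independence on the Table 7 counts can be sketched as below, using the short-length rows as input. This is an illustration under stated assumptions: it implements only Pearson's chi-square (not Fisher's exact test), the paper does not say which test produced each reported p-value, and the function names are hypothetical. Several expected counts here are below 5, which is the usual reason to prefer an exact test.

```python
import math

# Short-length counts from Table 7: rows are no/partial/complete match,
# columns are action, method, frequency, purpose, warning.
SHORT_LENGTH_COUNTS = [
    [2, 1, 1, 2, 6],
    [1, 7, 3, 0, 1],
    [24, 19, 23, 25, 20],
]

def chi_square_independence(table):
    """Return Pearson's chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi_square_p_value_even_dof(stat, dof):
    """Upper-tail probability using the closed form valid for even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive even dof")
    half = stat / 2
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k          # builds (stat/2)^k / k!
        total += term
    return math.exp(-half) * total

stat, dof = chi_square_independence(SHORT_LENGTH_COUNTS)
p_value = chi_square_p_value_even_dof(stat, dof)
print(round(stat, 2), dof, round(p_value, 3))  # roughly 21.37, 8, 0.006
```

The closed-form tail probability avoids an external statistics library but only works because a 3-by-5 table has 8 (even) degrees of freedom.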
Overall, 43% of medication instructions were matched completely across all five categories, with 63%, 26%, and 39% of short, medium, and long-length medication instructions represented the same, respectively. Figure 2 provides a visual overview of these results, where certain categories were matched at a higher rate than others. As seen in the figure, the “action” category received the highest percent match between reviewers. However, “warning” and “method” tended to have more partial or no matches, particularly for medium-length medication instructions (p<0.05 leads us to reject independence, indicating that the match level depends on the medication instruction category). Upon review, this is explained by one reviewer often defining one group of words as “warning” while the other would define it as “method,” indicating that there is ambiguity in the existing category names. Partial matches often resulted from category confusion and parsing errors, as reviewers did not always agree on what words to put into certain categories and sometimes missed words when manually parsing medication instructions into categories. The results of this review will provide feedback into the category names, leading to a future secondary study. For short, medium, and long-length medication instructions, Table 7 shows the agreement (Kappa) between the two reviewers overall and for each of the five medication instruction categories. When combining partial and complete matches across all five categories, there is strong overall agreement between reviewers among short-length medication instructions (K=0.82) and long-length medication instructions (K=0.85), as well as moderate agreement among medium-length medication instructions (K=0.61).
Looking at each medication instruction category separately, we see that there is agreement among the reviewers in all categories except for the “warning” category for medium-length medication instructions, which shows disagreement greater than expected by chance (K=-0.26).
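The overall Kappa values can be checked against the match counts in Table 7. The sketch below assumes an expected chance agreement of 0.5; that figure is an inference from the reported numbers, which it reproduces, and is not stated in the paper. The function names are hypothetical.

```python
# Overall match counts per length group, taken from the Total column of Table 7.
TABLE_7_TOTALS = {
    "short":  {"no": 12, "partial": 12, "complete": 111},
    "medium": {"no": 26, "partial": 23, "complete": 86},
    "long":   {"no": 12, "partial": 35, "complete": 118},
}

def kappa(p_observed, p_expected=0.5):
    """Chance-corrected agreement; p_expected = 0.5 is an assumption, not from the paper."""
    return (p_observed - p_expected) / (1 - p_expected)

def kappa_any_match(counts):
    """Primary analysis: partial and complete matches both count as agreement."""
    total = sum(counts.values())
    return kappa((counts["partial"] + counts["complete"]) / total)

def kappa_complete_match(counts):
    """Alternative analysis: only complete matches count as agreement."""
    total = sum(counts.values())
    return kappa(counts["complete"] / total)

for group, counts in TABLE_7_TOTALS.items():
    print(group, round(kappa_any_match(counts), 2), round(kappa_complete_match(counts), 2))
# short 0.82 0.64
# medium 0.61 0.27
# long 0.85 0.43
```

Under this assumption a Kappa below zero, such as the -0.26 for medium-length “warning,” simply means the reviewers agreed on fewer than half of the items.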
Figure 2.
Breakdown of medication instruction matches by category and length.
Discussion
Through this initial analysis, there were multiple findings relating to the workflow process. The independent reviews demonstrated varied results in how SMEs categorized medication instructions. Variation in how medication instructions may be categorized impacts data quality, patient safety, and interoperability. A summary of key findings includes the following:
For the primary analysis of estimating Cohen’s Kappa, partial matches were treated as part of the “agreement” category, shown in Table 7 as Kappa (any match). This choice reflects that while reviewers may not have represented the medication instruction in exactly the same manner, partial matches may be caused by small errors where the partial match could otherwise be considered complete. An alternative assumption would be to consider partial matches as part of the “no agreement” category. Table 7 shows the results with this alternative assumption (complete match); there is moderate agreement for short-length medication instructions (K=0.64), while there is weak agreement for medium-length medication instructions (K=0.27) and long-length medication instructions (K=0.43).
Some of the proposed input fields/categories for medication instructions were less ambiguous than others for reproducibly representing the medication instruction content. As seen in Figure 2, the “action” category received the highest percent match between reviewers; however, “warning” and “method” tended to have more partial or no matches, particularly for medium-length medication instructions. Upon review, this is explained by one reviewer often labeling a group of words as “warning” while the other labeled it as “method.” The variation in these results indicates that future iterations are required to refine the categories of medication instructions so that they are more easily and uniformly understood.
Although medication instruction length and percent match between reviewers were not negatively related, medication instructions carrying more additional information tended to be more ambiguous in meaning, which contributed to the lower percent match rate of medium-length medication instructions. Our team initially expected that long medication instructions would have a lower match rate due to their increased complexity, but many of these medication instructions were titration doses, which are complex yet have a predictable structure.
This initial analysis also yielded two findings relating to the workflow process: first, medication instructions with the largest volume of additional information tend to have greater ambiguity in meaning, and second, the workflow process behind medication instructions is often unclear, as some medication instructions contain duplicate information or confusing language.
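The two Kappa assumptions discussed in these findings differ only in how a partial match is counted. As a minimal sketch with made-up match levels (not the study’s data; the function name `observed_agreement` and the label strings are assumptions), the snippet below shows how the observed agreement, the quantity that Kappa then corrects for chance, changes when partial matches are moved from the “agreement” category to the “no agreement” category.

```python
def observed_agreement(match_levels, partial_counts_as_agreement):
    """Share of items counted as agreement under the chosen partial-match rule."""
    if not match_levels:
        raise ValueError("At least one match level is required.")
    if partial_counts_as_agreement:
        agreeing = {"complete", "partial"}  # "any match" assumption
    else:
        agreeing = {"complete"}             # "complete match" assumption
    return sum(level in agreeing for level in match_levels) / len(match_levels)


# Hypothetical match levels (NOT study data) for ten medication instructions.
example_levels = ["complete"] * 5 + ["partial"] * 3 + ["none"] * 2

any_match = observed_agreement(example_levels, partial_counts_as_agreement=True)
complete_only = observed_agreement(example_levels, partial_counts_as_agreement=False)

print(f"any match: {any_match:.0%}, complete match only: {complete_only:.0%}")
```

Because every partial match moves out of the agreement count under the stricter rule, observed agreement can only stay the same or fall, which is the same direction as the lower complete-match Kappa values in Table 7.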
Overall, this paper addresses the needs outlined by VA’s HRO efforts and GAO by demonstrating a site readiness assessment and a general IV&V approach that can enable reproducible metrics for data and knowledge quality and safety. The general IV&V approach includes feedforward pathways that pass control information from one phase to the next, allowing for continuous processing of information for improving the IV&V methodology and site readiness. The described IV&V methodology and evaluation approach should be developed and performed in conjunction with VA’s clinical experts – in both existing and Cerner EHR systems – to assess any potential inconsistencies that would diminish the reliability of future patient care.
Data Collection Limitations: The paper only includes data from one stakeholder domain. We did not collect data from other stakeholder domains that could assist in validating our IV&V methodology.
Small Participant Size: In the selection of SMEs, we did not recruit additional reviewers with other expertise to test our IV&V methodology and further reduce bias.
Manual Sample Selection: After the initial data discovery process, the team selected length categories arbitrarily to ensure samples from a variety of lengths. Manual review ensured there was varying representation in the types of instructions within each category, but future iterations could use random selection to reduce the potential for selection bias.
Prescriber Workflow: There are limitations in the way a prescriber fills out a medication instruction. This leads to variations in composition of medication instructions due to prescribers interpreting a category in different ways.
Suggestions for Future Work: To continue building out the IV&V approach and ensure it captures necessary steps to be applied to electronic health data and knowledge management tasks, the current IV&V methodology will be refined to demonstrate insightful results in a future model evolution paper. Suggested next steps include: (1) updating and re-configuring categories and (2) continuing additional rounds of mapping to increase URU and improve percent match and inter-rater and intra-rater reliability. Furthermore, there is a need to apply this methodology to additional VA data migration artifacts and data sources to assist in the VistA to Cerner migration and evaluate the integrity and agility of data as it moves across EHR systems.
Conclusion
The findings highlight the need for increased oversight of data quality, provenance, and lineage. To attain a highly reliable data management system, data governance efforts must positively impact downstream patient care. As patient data is migrated from VistA to Cerner, semantic loss caused by errors of commission, omission, syndication, and collection leads to inaccurate representations of patient data that inform clinician decisions. This is not exclusive to VA and is an issue in the sharing of any patient health data across systems. As a result, clinicians are skeptical of the clinical information in their health IT systems, an issue that can lead to suboptimal care decisions. To eliminate preventable patient harm, there is an urgent need to create systems and processes that infuse HRO principles into health IT systems and their associated data sources.
Acknowledgments:
This work was primarily funded by the Veterans Health Administration Clinical Informatics and Data Management Office. This work would not have been possible without the support of Dr. Jonathan Nebeker and Dr. Robert Silverman.
References
- 1. Liaw ST, Guo JGN, Ansari S, Jonnagaddala J, Godinho MA, Borelli AJ, de Lusignan S, Capurro D, Liyanage H, Bhattal N, Bennett V, Chan J, Kahn MG. Quality assessment of real-world data repositories across the data life cycle: A literature review. J Am Med Inform Assoc. 2021 Jul 14;28(7):1591–1599. doi: 10.1093/jamia/ocaa340. PMID: 33496785; PMCID: PMC8475229.
- 2. Kahn MG, Callahan TJ, Barnard J, Bauck AE, Brown J, Davidson BN, Estiri H, Goerg C, Holve E, Johnson SG, Liaw ST, Hamilton-Lopez M, Meeker D, Ong TC, Ryan P, Shang N, Weiskopf NG, Weng C, Zozus MN, Schilling L. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS (Wash DC) 2016 Sep 11;4(1):1244. doi: 10.13063/2327-9214.1244. PMID: 27713905; PMCID: PMC5051581.
- 3. van der Lei J. Use and abuse of computer-stored medical records. Methods Inf Med. 1991 Apr;30(2):79–80. PMID: 1857252.
- 4. Brennan PF, Stead WW. Assessing data quality: from concordance, through correctness and completeness, to valid manipulatable representations. J Am Med Inform Assoc. 2000 Jan-Feb;7(1):106–7. doi: 10.1136/jamia.2000.0070106. PMID: 10641968; PMCID: PMC61460.
- 5. Kahn MG, Eliason BB, Bathurst J. Quantifying clinical data quality using relative gold standards. AMIA Annu Symp Proc. 2010 Nov 13;2010:356–60. PMID: 21347000; PMCID: PMC3041459.
- 6. Hersh WR, Weiner MG, Embi PJ, Logan JR, Payne PR, Bernstam EV, Lehmann HP, Hripcsak G, Hartzog TH, Cimino JJ, Saltz JH. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care. 2013 Aug;51(8 Suppl 3):S30–7. doi: 10.1097/MLR.0b013e31829b1dbd. PMID: 23774517; PMCID: PMC3748381.
- 7. Bernstam EV, Warner JL, Krauss JC, Ambinder E, Rubinstein WS, Komatsoulis G, Miller RS, Chen JL. Quantitating and assessing interoperability between electronic health records. J Am Med Inform Assoc. 2022 Jan 7. p. ocab289. doi: 10.1093/jamia/ocab289. Epub ahead of print. PMID: 35015861.
- 8. Cholan RA, Pappas G, Rehwoldt G, Sills AK, Korte ED, Appleton IK, Scott NM, Rubinstein WS, Brenner SA, Merrick R, Hadden WC, Campbell KE, Waters MS. Encoding laboratory testing data: case studies of the national implementation of HHS requirements and related standards in five laboratories. J Am Med Inform Assoc. 2022 Jul 12;29(8):1372–1380. doi: 10.1093/jamia/ocac072. PMID: 35639494; PMCID: PMC9277627.
- 9. Cholan RA, Weiskopf NG, Rhoton DL, Colin NV, Ross RL, Marzullo MN, Sachdeva B, Dorr DA. Specifications of Clinical Quality Measures and Value Set Vocabularies Shift Over Time: A Study of Change through Implementation Differences. AMIA Annu Symp Proc. 2018 Apr 16;2017:575–584. PMID: 29854122; PMCID: PMC5977609.
- 10. Cholan RA, Weiskopf NG, Rhoton D, Sachdeva B, Colin NV, Martin SJ, Dorr DA. From Concepts and Codes to Healthcare Quality Measurement: Understanding Variations in Value Set Vocabularies for a Statin Therapy Clinical Quality Measure. EGEMS (Wash DC) 2017 Sep 4;5(1):19. doi: 10.5334/egems.212. PMID: 29881739; PMCID: PMC5983064.
- 11. Hemler JR, Hall JD, Cholan RA, Crabtree BF, Damschroder LJ, Solberg LI, Ono SS, Cohen DJ. Practice Facilitator Strategies for Addressing Electronic Health Record Data Challenges for Quality Improvement: EvidenceNOW. J Am Board Fam Med. 2018 May-Jun;31(3):398–409. doi: 10.3122/jabfm.2018.03.170274. PMID: 29743223; PMCID: PMC5972525.
- 12. Colin NV, Cholan RA, Sachdeva B, Nealy BE, Parchman ML, Dorr DA. Understanding the Impact of Variations in Measurement Period Reporting for Electronic Clinical Quality Measures. EGEMS (Wash DC) 2018 Jul 19;6(1):17. doi: 10.5334/egems.235. PMID: 30094289; PMCID: PMC6078150.
- 13. Hogan WR, Wagner MM. Accuracy of data in computer-based patient records. J Am Med Inform Assoc. 1997 Oct;4(5):342–55. doi: 10.1136/jamia.1997.0040342.
- 14. Aronsky D, Haug PJ. Assessing the quality of clinical data in a computer-based record for calculating the pneumonia severity index. J Am Med Inform Assoc. 2000 Feb;7(1):55–65. doi: 10.1136/jamia.2000.0070055.
- 15. Arts D, de Keizer N, Scheffer G-J, de Jonge E. Quality of data collected for severity of illness scores in the Dutch National Intensive Care Evaluation (NICE) registry. Intensive Care Med. 2002 May;28(5):656–9. doi: 10.1007/s00134-002-1272-z.
- 16. Thiru K, Hassey A, Sullivan F. Systematic review of scope and quality of electronic patient record data in primary care. BMJ. 2003 May 15;326(7398):1070. doi: 10.1136/bmj.326.7398.1070.
- 17. Hasan S, Padman R. Analyzing the effect of data quality on the accuracy of clinical decision support systems: a computer simulation approach. AMIA Annu Symp Proc. 2006. pp. 324–8.
- 18. Cruz-Correia RJ, Rodrigues P, Freitas A, Almeida FC, Chen R, Costa-Pereira A. Data quality and integration issues in electronic health records. In: Hristidis V, editor. Information discovery on electronic health records. Chapman and Hall/CRC; 2009. pp. 55–95.
- 19. Botsis T, Hartvigsen G, Chen F, Weng C. Secondary use of EHR: Data quality issues and informatics opportunities. AMIA Summits on Translational Science proceedings AMIA Summit on Translational Science. 2010;2010:1–5.
- 20. Hripcsak G, Knirsch C, Zhou L, Wilcox A, Melton G. Bias associated with mining electronic health records. J Biomed Discov Collab. 2011;6:48–52. doi: 10.5210/disco.v6i0.3581.
- 21. Brown JS, Kahn M, Toh S. Data quality assessment for comparative effectiveness research in distributed data networks. Medical care. 2013;51:S22–S29. doi: 10.1097/MLR.0b013e31829b1e2c.
- 22. Rusanov A, Weiskopf NG, Wang S, Weng C. Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Med Inform Decis Mak. 2014;14:51. doi: 10.1186/1472-6947-14-51.
- 23. Government Accountability Office. Electronic Health Records: VA Needs to Address Data Management Challenges for New System. Government Accountability Office. February 2022. https://www.gao.gov/assets/gao-22-103718.pdf (accessed 8 Mar 2022)
- 24. Electronic Health Record Comprehensive Lessons Learned. U.S. Department of Veterans Affairs. 2021. https://federalnewsnetwork.com/wp-content/uploads/2021/07/071421_va_ehr_lessonslearned_FNN.pdf (accessed 8 Mar 2022)
- 25. Chassin MR, Loeb JM. High‐reliability health care: getting there from here. The Milbank Quarterly. 2013;91(3):459–490. doi: 10.1111/1468-0009.12023.
- 26. Leveson N. Engineering a safer and more secure world. MIT; 2011.
- 27. Computer Security Resource Center. National Institute of Standards and Technology U.S. Department of Commerce. 2022. https://csrc.nist.gov/ (accessed 8 Mar 2022)
- 28. Bair LJ, Tolk A. Towards a unified theory of validation. 2013 Winter Simulations Conference (WSC) IEEE. 2013. pp. 1245–56. doi: 10.1109/WSC.2013.6721512.
- 29. Data migration phases - 5 steps for a successful migration. Miktysh Blog. 2019. https://miktysh.com.au/5-data-migration-phases/ (accessed 5 Mar 2022)
- 30. Rehwoldt G. A Risk-Based Methodology for the Quality Assurance of Healthcare Knowledge Artifacts (dissertation) 2018.
- 31. Fung KW, Kapusnik-Uner J, Cunningham J, Higby-Baker S, Bodenreider O. Comparison of three commercial knowledge bases for detection of drug-drug interactions in clinical decision support. J Am Med Inform Assoc. 2017 Jul 1;24(4):806–812. doi: 10.1093/jamia/ocx010. PMID: 28339701; PMCID: PMC6080681.
- 32. Middleton B, Sittig DF, Wright A. Clinical Decision Support: a 25 Year Retrospective and a 25 Year Vision. Yearb Med Inform. 2016;Suppl 1:S103–16. doi: 10.15265/IYS-2016-s034.
- 33. Spackman K, Guillermo R. Examining SNOMED from the perspective of formal ontological principles: Some preliminary analysis and observations. KR-MED 2004 Proceedings. 2004. pp. 72–80.
- 34. Campbell KE. Distributed development of a logic based controlled medical Terminology. PhD Dissertation, Stanford University. June 1997.
- 35. Dai W, Yoshigoe K, Parsley W. Information Technology-New Generations. Cham: Springer; 2018. Improving data quality through deep learning and statistical models; pp. 515–522.
- 36. Woods DD, Dekker S, Cook R, Johannesen L, Sarter N. Behind human error. CRC Press; 2017.
- 37. Liu H, Burkhart Q, Bell DS. Evaluation of the NCPDP Structured and Codified Sig Format for e-prescriptions. J Am Med Inform Assoc. 2011 Sep-Oct;18(5):645–51. doi: 10.1136/amiajnl-2010-000034. Epub 2011 May 25. PMID: 21613642; PMCID: PMC3168301.
- 38. Carchiolo V, Longheu A, Reitano G, Zagarella L. Proceedings of the 2019 Federated Conference on Computer Science and Information Systems. IEEE; 2019. Medical prescription classification: a NLP-based approach.
- 39. Yang Y, Ward-Charlerie S, Dhavle AA, Rupp MT, Green J. Quality and Variability of Patient Directions in Electronic Prescriptions in the Ambulatory Care Setting. J Manag Care Spec Pharm. 2018 Jul;24(7):691–699. doi: 10.18553/jmcp.2018.17404. Epub 2018 Jan 18. PMID: 29345553.
- 40. Levy DH, Dolin RH, Mattison JE, Spackman KA, Campbell KE. Computer-facilitated collaboration: experiences building SNOMED-RT. Proc AMIA Symp. 1998. pp. 870–874.


