AMIA Annual Symposium Proceedings. 2011 Oct 22;2011:785–794.

Data quality and fitness for purpose of routinely collected data – a general practice case study from an electronic Practice-Based Research Network (ePBRN)

Siaw-Teng Liaw 1,2, Jane Taggart 1, Sarah Dennis 1, Anthony Yeo 3
PMCID: PMC3243124  PMID: 22195136

Abstract

The practice-based research network (PBRN) is a resource to recruit research participants; conduct developmental and pilot studies; and coordinate multicentre research, teaching, clinical care and quality assurance programs. It is a community-based laboratory for translational, clinical and health services research. The mining of clinical information systems of PBRNs can be used to monitor performance at the service unit level. However, are the routinely collected data of ePBRNs fit for the abovementioned purposes? We describe the establishment and governance of an ePBRN which included general practice and community health and hospital units. The general practice data quality was examined, using diabetes as the context, for completeness, correctness and consistency, and assessed for its fitness for research, audit and quality assurance purposes. The quality of social determinants data was generally good, while risk factor data were variable. Issues and strategies for improving data quality are discussed.

Introduction

The over-arching context for this paper is integrated care, which many countries, including Australia (1), New Zealand (2), and others (3, 4) have invested in to manage the increasing costs of chronic disease management due to the ageing population, scarcity of resources and costs of health care delivery. A definition of integrated care is “a coherent set of methods and models on the funding, administrative, organisational, service delivery and clinical levels designed to create connectivity, alignment and collaboration within and between the cure and care sectors” (5). Informatics-enhanced integrated care can benefit health care providers and consumers through more accurate and timely information exchange, improved work efficiency by avoiding repetitive work, and better decision-making (6, 7).

Many countries recognise that up-to-date information and information technology are essential to support integrated care and to promote and monitor safety, quality and cost-effectiveness. Routinely collected health care data in electronic health records (EHR) are increasingly being mined, aggregated into large data repositories, linked and used for audit, continuous quality improvement in clinical care, health service planning, epidemiological study and evaluation research. This increases both the likelihood of data errors and the scope of their impact on the safety and quality of clinical practice and on the accuracy of research in primary and integrated care settings (8). There is also a semantic interoperability problem, where information aggregated from different EHRs may be misinterpreted because of differences in meaning and context. The adoption of SNOMED-CT (9) as the national reference terminology is a first step towards addressing the semantic interoperability problem in Australia. However, the difficulties of translating SNOMED-CT into working implementations, and of clinicians using the terminology, are significant barriers. There is also the fundamental issue of variation in the quality of clinical practice and routine documentation.

The Inquiry into Acute Care in New South Wales Hospitals in Australia emphasised that inter-professional integrated care can only be achieved with a new approach to information technology and health information (10) and recommended an electronic medical record (EMR) that will generate alerts when there is an error in the dosage of drugs, or risk to the patient deduced from the observations, and can “follow” the patient in a way which paper records often fail to do. The Australian National Health & Hospitals Reform Commission (11) recommended that the EMR should emphasise patient control of their own health information, which is the core premise of the proposed Australian Personally Controlled EHR (PCEHR) program (12). The PCEHR depends on data from a number of stakeholders, raising issues of data quality and semantic interoperability.

Practice-based research networks (PBRN) have existed in Europe (8, 13) and North America (14–16) for over 25 years, having developed in parallel with integrated care and chronic disease management. Safety and quality of clinical practice, with accurate and comprehensive documentation, are at the centre of the electronic PBRN (ePBRN) we are developing as part of an information-enhanced integrated care program to examine the what, how and where of integrated services. Our ePBRN links hospital and community-based health services and their information systems to support health services and translational research, and longitudinal studies of high-prevalence chronic diseases such as type 2 diabetes (T2DM), cardiovascular disease (CVD) and chronic obstructive pulmonary disease (COPD).

Data quality (DQ) is central to the representativeness and validity of the ePBRN data for clinical, teaching, audit, evaluation and research purposes. The generally accepted definition of DQ is encapsulated in the International Organization for Standardization definition of quality: “the totality of features and characteristics of an entity that bears on its ability to satisfy stated and implied needs” (ISO 8402-1986, Quality Vocabulary). DQ is thus defined in terms of its “fitness for purpose” (17). Specific dimensions of DQ have been proposed, including “accuracy, perfection, freshness and uniformity” (18) and “completeness, unambiguity, meaningfulness and correctness” (19). The Canadian Institute for Health Information recommendations were the basis for an information quality framework comprising 69 quality criteria grouped into 24 quality characteristics, which were further grouped into 6 quality dimensions: accuracy, timeliness, comparability, usability, relevance, and privacy and security (20). Reference or domain terminologies and ontologies have been shown to affect DQ by influencing data collection and analysis (21); they can also act as benchmarks for assessing DQ.

This paper describes the development of a framework to guide the establishment of an ePBRN, and an approach to assessing and managing the quality of data in an ePBRN.

Overarching conceptual framework (Figure 1)

Figure 1. High level conceptual framework for information-enhanced integrated care

(Note: QACPD = Quality Assurance and Continuing Professional Development)

The overarching conceptual framework highlights the socio-ecological complexity that influences the sharing and use of skills, information and resources to maximise the benefit to the patient, community and health system over time. It includes elements identified in the review of integrated care programs: self-management support, patient education, structured clinical follow-up, case management, a multidisciplinary patient care team, multidisciplinary clinical pathways and feedback, reminders, and education for professionals (22). Other factors that influence integrated care include the collegiality and extent of sharing of information, protocols, evaluation and monitoring and professional development training among the stakeholders of the integrated care program. Effective integrated care requires a transformational change towards teamwork, information sharing and work practices; a systems approach to managing chronic disease; eHealth; and continuous quality improvement with ongoing monitoring and evaluation. Higher level policy drivers include reforming health care financing to promote and sustain multidisciplinary integrated care.

Approval for this study was obtained from the University of New South Wales and South West Sydney Local Health Network Human Research Ethics Committees.

Methods

The pilot study involved 3 general practices of different sizes and an integrated diabetes centre, each using a different clinical information system (CIS). The practices were: (1) a small academic practice (n=2565), (2) a mid-size practice (n=14,455) and (3) a large practice (n=25,370). Note: as with the Chronic Care Model, we use the term CIS to include an electronic health record (EHR), automated reminders and prompts, patient registries, and audit and feedback tools. The hospital-based CIS used by the diabetes centre was also examined (23) but is not reported in this paper, as the determinants and requirements of DQ in community-based general practice differ from those for hospital-based and specialist practice. There were 3 phases to the project: (1) requirements specification based on the conceptual framework, (2) design and establishment of the ePBRN, and (3) evaluation of the data/information quality.

Requirements specification:

The local, national and international literature, both formal and grey, was reviewed and appraised for its relevance to practice-based research networks and information-enhanced integrated care on an ongoing basis. The conceptual framework (Figure 1) provided the context and guided the requirements specification for the ePBRN. This included the organizational structure, technical architecture, informed consent and privacy, connectivity and interoperability standards, provenance and audit protocols, instruments and tools, governance structures and processes, quality monitoring and evaluation. The evaluation plan included the monitoring of the effectiveness of the protocols in place, security of the data management processes, access controls and data quality.

Design and establishment:

A hub and spoke model, building on an existing general practice PBRN, was adopted with a focus on the “secure sharing” objectives (Figure 1). The data extraction and linkage software (GRHANITE™) was selected because it used hash technology to mask a set of personal identifiers, generating a pseudonym for each patient and provider, before extracting the information for transmission to a pseudonymised clinical data warehouse (CDW) located on a secure server at the University of New South Wales for storage and manipulation (24). Re-identification of a pseudonym can only occur in the source system in the practice, the “medical home”, where patient privacy is sacrosanct within the patient-doctor relationship. GRHANITE™ was installed at each of the practices and scheduled to extract information weekly.
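The paper does not specify GRHANITE™'s hashing scheme. As a minimal sketch of the general idea only, the Python below assumes a keyed hash (HMAC-SHA256) over a hypothetical set of normalised identifiers (given name, family name, date of birth, sex) and a hypothetical shared key; it is not GRHANITE™'s actual algorithm. Because a hash cannot be reversed, the sketch models re-identification as a lookup table that never leaves the practice.

```python
import hashlib
import hmac
import unicodedata
from typing import Optional

# Hypothetical key: for cross-practice linkage it would have to be identical
# at every practice and kept secret (an assumption, not GRHANITE's design).
LINKAGE_KEY = b"example-shared-secret"


def normalise(value: str) -> str:
    """Remove accents, case and extra whitespace so trivial variants hash alike."""
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.upper().split())


def pseudonym(given: str, family: str, dob_iso: str, sex: str,
              key: bytes = LINKAGE_KEY) -> str:
    """Keyed one-way hash of the selected identifiers, as a hex string."""
    message = "|".join([normalise(given), normalise(family),
                        dob_iso.strip(), normalise(sex)]).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Held only in the source system ("medical home"): pseudonym -> local record ID.
# Only the pseudonym and the clinical data would be sent to the warehouse.
local_index: dict[str, str] = {}


def register(local_id: str, given: str, family: str, dob_iso: str, sex: str) -> str:
    """Compute a patient's pseudonym and remember the local ID it stands for."""
    p = pseudonym(given, family, dob_iso, sex)
    local_index[p] = local_id
    return p


def reidentify(p: str) -> Optional[str]:
    """Local lookup only; returns None for pseudonyms not registered here."""
    return local_index.get(p)


p = register("P0001", "Mary", "Smith", "1950-03-14", "F")
print(pseudonym(" mary ", "SMITH", "1950-03-14", "f") == p)  # True
```

A keyed hash is used in this sketch because plain hashes of low-entropy identifiers such as names and birth dates can be reversed by brute-force guessing; whether GRHANITE™ takes this approach is not stated in the source.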

Evaluation:

A participatory action research approach was adopted to examine the determinants of success in establishing and maintaining an ePBRN. Information was collected from observations of, and interactions with, community and hospital-based stakeholders, including clinicians and information managers. These comprised both individual and group interactions, among them a feedback session at each practice. This information was analysed qualitatively.

Microsoft SQL Server was used to manage the extracted data, and SAS (25) was used for data cleansing and analysis. Because the ePBRN is planned as a geographical as well as a virtual network, the hashed identifiers (24) were used to identify “identical” patients in the linked data set. Patients may attend more than one accessible general practice for a range of reasons, including convenience, with implications for continuity of care and for prevalence studies.
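Assuming every practice derives pseudonyms with the same scheme and key, finding “identical” patients reduces to matching pseudonyms across the practice extracts. A minimal Python sketch, with hypothetical practice names and pseudonym strings (real pseudonyms would be full-length hashes):

```python
from collections import defaultdict


def practices_per_patient(extracts: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each pseudonym to the set of practices whose extract contains it."""
    seen: defaultdict[str, set[str]] = defaultdict(set)
    for practice, pseudonyms in extracts.items():
        for p in pseudonyms:
            seen[p].add(practice)
    return dict(seen)


def shared_patients(extracts: dict[str, list[str]]) -> dict[str, set[str]]:
    """Pseudonyms that appear in more than one practice's extract."""
    return {p: where for p, where in practices_per_patient(extracts).items()
            if len(where) > 1}


# Hypothetical toy extracts.
extracts = {
    "practice_1": ["a1f3", "9c2e", "77b0"],
    "practice_3": ["77b0", "e41d"],
}
print({p: sorted(where) for p, where in shared_patients(extracts).items()})
# {'77b0': ['practice_1', 'practice_3']}
```

Matching is exact, so any variation in how identifiers are entered at different practices (spelling, date formats) would produce different pseudonyms and missed links unless it is normalised before hashing.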

A data quality (DQ) matrix was developed to assess the ePBRN data, using a conceptual framework based on the published literature (18–21). The long list of DQ dimensions described in that literature was assessed within the context of data quality, provenance and curation in the data life-cycle (26). After careful consideration, we decided to use a core DQ matrix as a starting point. This matrix incorporated completeness, correctness and consistency, each expressed as a percentage. Completeness was defined as the availability of at least 1 record per patient. Correctness was defined as a valid and appropriate record; for example, height is measured in metres and is within the plausible range for age. Consistency was defined as using a uniform data type and format (e.g. integer, string, date) with a uniform data label (internal consistency), and using codes/terms that can be mapped to the Australian National Data Dictionary (ANDD) (external consistency). Internal consistency was measured as the number of records carrying the most prevalent label divided by the total number of records. External consistency was measured as the proportion of terms that are mappable to the ANDD. Provenance information is not presented because all the data were date, time and author stamped.
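A minimal Python sketch of the core DQ matrix as defined above. The record layout, the adult height range (1.0–2.3 m) and the set of mappable terms standing in for the ANDD are hypothetical; the study used SQL and SAS, and its exact rules are not given in the paper. Correctness is computed here as a percentage of non-empty records, which appears to match how Table 1 reports it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Record:
    patient_id: str
    label: str               # field label as stored in the CIS
    value: Optional[float]   # None if the field is empty
    unit: str


def pct(numerator: int, denominator: int) -> Optional[float]:
    """Percentage, or None (reported as NA) when there is nothing to assess."""
    return 100.0 * numerator / denominator if denominator else None


def completeness(patients: set[str], records: Iterable[Record]) -> Optional[float]:
    """% of patients with at least one non-empty record."""
    with_record = {r.patient_id for r in records if r.value is not None}
    return pct(len(with_record & patients), len(patients))


def correctness(records: Iterable[Record],
                is_valid: Callable[[Record], bool]) -> Optional[float]:
    """% of non-empty records that pass a validity/plausibility rule."""
    available = [r for r in records if r.value is not None]
    return pct(sum(is_valid(r) for r in available), len(available))


def internal_consistency(records: list[Record]) -> Optional[float]:
    """Records carrying the most prevalent label, as a % of all records."""
    if not records:
        return None
    _, top = Counter(r.label for r in records).most_common(1)[0]
    return pct(top, len(records))


def external_consistency(terms: list[str], mappable: set[str]) -> Optional[float]:
    """% of terms that map to the reference dictionary (stand-in for the ANDD)."""
    return pct(sum(t in mappable for t in terms), len(terms))


def adult_height_ok(r: Record) -> bool:
    """Hypothetical rule: unit is metres and value lies in an assumed adult range."""
    return r.unit == "m" and r.value is not None and 1.0 <= r.value <= 2.3


patients = {"p1", "p2", "p3", "p4"}
heights = [Record("p1", "Height", 1.72, "m"),
           Record("p2", "Height", 172.0, "m"),   # probably centimetres keyed as metres
           Record("p3", "height", 1.60, "m")]
print(completeness(patients, heights))                       # 75.0
print(round(correctness(heights, adult_height_ok), 1))       # 66.7
print(round(internal_consistency(heights), 1))               # 66.7
print(round(external_consistency([r.label for r in heights], {"Height"}), 1))  # 66.7
```

Returning `None` rather than 0 when the denominator is empty keeps “nothing to assess” (NA in the tables) distinct from “everything failed”.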

Patients with diabetes were identified through relevant information populating the diagnosis, pathology, medication and reason for visit fields. When a patient had more than one record available for analysis, the last (most recent) record was used to calculate the DQ indicators. The calculated DQ indicators are presented in the Findings section. Selected qualitative data are then used in the Discussion to further describe and/or explain the observed patterns of DQ.
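The two steps above (flagging patients with diabetes from several fields, then keeping each patient's most recent record) can be sketched in Python as follows. The keyword lists are hypothetical placeholders, not the study's case definition; a real definition would need validated code sets for each CIS and would have to handle false positives, such as an HbA1c ordered for screening or metformin prescribed for another indication.

```python
from datetime import date
from typing import Iterable

# Hypothetical indicator keywords for each field type.
DIABETES_TERMS = ("diabetes", "t2dm", "niddm", "iddm")
DIABETES_DRUGS = ("metformin", "gliclazide", "insulin")
DIABETES_TESTS = ("hba1c",)


def mentions(values: Iterable[str], keywords: tuple[str, ...]) -> bool:
    """True if any keyword occurs (case-insensitively) in any value."""
    return any(k in v.lower() for v in values for k in keywords)


def has_diabetes(diagnoses: Iterable[str], pathology: Iterable[str],
                 medications: Iterable[str], reasons_for_visit: Iterable[str]) -> bool:
    """Flag a patient if any of the four fields carries diabetes-related information."""
    return (mentions(diagnoses, DIABETES_TERMS)
            or mentions(reasons_for_visit, DIABETES_TERMS)
            or mentions(pathology, DIABETES_TESTS)
            or mentions(medications, DIABETES_DRUGS))


def last_records(records: Iterable[tuple[str, date, float]]) -> dict[str, tuple[date, float]]:
    """Keep only the most recent (date, value) pair per patient."""
    latest: dict[str, tuple[date, float]] = {}
    for patient_id, when, value in records:
        if patient_id not in latest or when > latest[patient_id][0]:
            latest[patient_id] = (when, value)
    return latest


hba1c = [("p1", date(2010, 1, 1), 7.1),
         ("p1", date(2011, 1, 1), 6.8),
         ("p2", date(2010, 6, 1), 8.0)]
print(last_records(hba1c)["p1"])  # (datetime.date(2011, 1, 1), 6.8)
```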

Findings

Two of the general practices, located within 5 kilometres of each other, had approximately 200 patients in common. Beyond its implications for health service delivery, this finding raises questions about accuracy and multiple counting if routinely collected PBRN data are used for prevalence studies (27).
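To show why shared patients matter for prevalence, the Python sketch below compares a naive pooled prevalence (summing each practice's counts) with one deduplicated on pseudonyms. The data are hypothetical toy values, and the rule that a patient counts as a case if any practice records diabetes is an assumption; other reconciliation rules are possible.

```python
def pooled_prevalence(practices: dict[str, dict[str, bool]]) -> tuple[float, float]:
    """Return (naive, deduplicated) prevalence across practices.

    `practices` maps a practice name to {pseudonym: has_diabetes}.
    Naive: sums per-practice counts, so shared patients are counted twice.
    Deduplicated: one entry per pseudonym; a case if any practice says so.
    """
    naive_cases = sum(sum(flags.values()) for flags in practices.values())
    naive_population = sum(len(flags) for flags in practices.values())
    merged: dict[str, bool] = {}
    for flags in practices.values():
        for p, has_dm in flags.items():
            merged[p] = merged.get(p, False) or has_dm
    return naive_cases / naive_population, sum(merged.values()) / len(merged)


# Hypothetical toy data: patient "x" attends both practices.
practices = {
    "practice_A": {"x": True, "y": False},
    "practice_B": {"x": True, "z": False},
}
naive, dedup = pooled_prevalence(practices)
print(f"naive {naive:.2f}, deduplicated {dedup:.2f}")  # naive 0.50, deduplicated 0.33
```

The shared patient inflates both numerator and denominator in the naive estimate, so the size and direction of the bias depend on whether shared patients are more or less likely than others to have the condition.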

Table 1 provides a high-level summary of the quality of the pilot data set. Recording of Indigenous status was poor in terms of both completeness and correctness. The risk factor data, exemplified by BMI and smoking status, were also poorly completed. However, what was recorded appeared to be correct and mostly internally consistent.

Table 1.

High level summary of the quality of data extracted from the 3 pilot general practices

Practice 1 [n=14,461] Practice 2 [n=2,621] Practice 3 [n=25,370]
Completeness No. (%) Correctness No. (%) Completeness No. (%) Correctness No. (%) Completeness No. (%) Correctness No. (%)
DOB 14461 (100) 14456 (99.9) 2602 (99.3) 2615 (99.7) 25311 (99.7) 25311 (99.7)
Sex 14461 (100) (99.9) 2408 (99.5) 2408 (99.5) 25370 (100) 25159 (99.2)
Indigenous status 1 (0.01) NA 1857 (70.9) 1620 (61.8) 18 (0.1) 18 (100)
BMI Not extracted Not extracted 72 (2.7) 106 (100) 2089 (8.2) 2075 (99.3)
Smoking status 4115 (28.4) 4115 (100) 715 (27.2) 715 (100) 3 (0.1) 3 (100)

Table 2 (one record per patient) and Table 3 (all records) summarise the DQ of selected social determinants (age, sex, ethnicity) and risk factor data for patients with diabetes in the data set. Table 2 shows the DQ of the last record for each patient with diabetes. The pattern of DQ for patients with diabetes was similar to that for the overall general practice population, and the DQ of the last record for these patients was also similar to that of all their records. When available, the data were mostly correct and consistent.

Table 2.

The quality of data pertaining to patients with diabetes in the data set

Practice 1 [n=627 (4.3%)] Practice 2 [n=41 (1.6%)] Practice 3 [n=453 (1.8%)]
Complete No. (%) Correct No. (%) Consistent No. (%) Complete No. (%) Correct No. (%) Consistent No. (%) Complete No. (%) Correct No. (%) Consistent No. (%)
DOB 627 (100) 627 (100) 627 (100) 41 (100) 41 (100) 41 (100) 453 (100) 453 (100) 453 (100)
Sex 627 (100) 627 (100) 627 (100) 41 (100) 41 (100) 41 (100) 453 (100) 453 (100) 453 (100)
Indigenous status 1 (0.1) 1 (100) 0 (0) 28 (68.3) 28 (100) 28 (100) 2 (0.4) 2 (100) 2 (100)
Height result X X X 11 (26.8) 11 (100) 11 (100) 180 (39.7) 180 (100) 180 (100)
Weight result X X X 16 (39.0) 16 (100) 16 (100) 194 (42.8) 194 (100) 194 (100)
BMI result X X X 8 (19.5) 8 (100) 8 (100) 178 (39.3) 178 (100) 178 (100)
Blood pressure result X X X 27 (65.8) 27 (100) 27 (100) 236 (52.1) 235 (99.6) 234 (99.1)
HbA1c record 509 (81.2) 509 (100) 509 (100) 4 (9.7) 4 (100) 4 (100) 392 (86.5) 392 (100) 389 (99.2)
HbA1c result $ $ $ 3 (75) 3 (100) 3 (100) 6 (1.3)& 6 (100) 6 (100)
Total cholesterol record 276 (44.0) 276 (100) NA 0 (0) NA NA 294 (64.9) 294 (100) 294 (100)
Insulin dependent 48 (7.6)* 48 (100) 48 (100) 18 (43.9) * 18 (100) 18 (100) 93 (20.5)* 93 (100) 93 (100)

Notes for Table 2: X, not extracted; $, unable to analyse as inaccessible text format; &, HbA1c result only extracted for patients registered as having diabetes in the clinical system; *, percentage of patients with diabetes in the practice.

Table 3.

The quality of data pertaining to all records for patients with diabetes in the data set

Practice 1 [627 patients] Practice 2 [41 patients] Practice 3 [453 patients]
No. records Correct No. (%) Consistent No. (%) No. records Correct No. (%) Consistent No. (%) No. records Correct No. (%) Consistent No. (%)
Height results X X X 23 22 (95.6) 22 (95.6) 575 571 (99.3) 575 (100)
Weight results X X X 32 31 (96.9) 31 (96.8) 699 698 (99.8) 699 (100)
BMI results X X X 12 12 (100) 12 (100) 566 564 (99.6) 566 (100)
Blood pressure results X X X 102 98 (96.1) 97 (95.1) 1135 1133 (99.8) 1133 (99.8)
HbA1c records 2839 2839 (100) 2839 (100) 12 12 (100) 12 (100) 3156 3156 (100) 3153 (99.9)
HbA1c results $ $ $ 12 9 (75) 9 (75) 10 10 (100) 10 (100)
Total cholesterol records 308 308 (100) 308 (100) 0 NA NA 2697 2697 (100) 2695 (99.9)
Insulin dependent 219 219 (100) 219 (100) 34 34 (100) 34 (100) 587 587 (100) 587 (100)

Notes for Table 3: X, not extracted; $, unable to analyse as inaccessible text format; &, HbA1c result only extracted for patients registered as having diabetes in the clinical system; *, percentage of patients with diabetes in the practice.

Discussion

Our findings suggest that routinely collected data in general practice CIS have potential for secondary use. Although our findings are consistent with studies that regularly report a range of deficiencies in using routinely collected electronic information for clinical (28–31), health promotion (32) or research purposes in hospitals (23), the CIS in our pilot study held some good quality data, depending on the purpose or research question. The completeness of the data varied between the practices. The quality of prescribing and investigations (or Physician Order Entry) data was very good because the CIS had electronic prescribing, test ordering and test results acceptance functionalities. Prescribing data were generally more complete than diagnostic or lifestyle data (33).

A significant finding, especially relevant to spatial (geographically defined) PBRNs, is the overlap of patients among practices in the same region. Both clinical information exchange and epidemiological prevalence studies need to take this overlap into account, and it should form part of quality assurance and risk management.

A key issue is the lack of coding rules or requirements in general practice; this is also true for hospital-based clinicians in the other arm of our overall project. As a result, much of the data were incomplete or recorded in inaccessible text format. However, when data were available, their correctness and internal consistency were good. The poor external consistency reflected the lack of implemented terminology and coding standards. We found that, while both prescribing and diagnoses use standard terminologies selected through drop-down lists, selecting a drug from a list fits more intuitively into the routine clinical workflow than selecting a diagnosis or problem in a designated field. In addition, the lifestyle elements with low completion rates are usually not categorized or coded.

However, our qualitative findings suggest that GPs find that using drop-down lists and coding detracts from the consultation.

“I mean we can code for the purposes of extracting data and that sort of thing … but it looks it is an added 2 minutes on to a consultation … (it) is difficult to actually code because it takes a different style of thinking instead of actually doing the consultation…” GP Practice 1

“If we just write it in the structured one, it is a bit restricting and … at least I find that it does not help with the practice of knowing the patient well. It is just a bit too restricting.”

“… (coding) … was just another burden (that) wasn’t helping anyone.”

Electronic tools and people strategies must be in place to support staff to enter data accurately and completely and comply with organisational data quality protocols (27). Feedback to clinicians is essential alongside workplace training and support in the consistent use of the CIS and associated decision support and research tools.

Other causes of DQ deficiencies include poorly designed or corrupted database architectures or management systems, and errors in data extraction (34). User-interface design, drop-down menus and decision support tools are important in promoting structured data entry and other determinants of DQ. The CIS in the ePBRN had design limitations. One CIS did not allow the recording of BP in different positions during the same consultation, or of changes in smoking status over time, contributing to poor data collection and therefore poor DQ for these data elements. Even the differential access function had a negative impact on DQ. For example, marital status was packaged with the clinical access data set, so the receptionist had no access to enter it; its completeness was therefore poor, because collection depended on the time-poor GP.
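To illustrate the data-model limitation described above, here is a minimal hypothetical sketch in Python (not based on the data model of any of the three CIS) in which each observation is stored as its own timestamped entry. Several BP readings in different positions can then coexist within one consultation, and smoking status changes are appended as history instead of overwriting a single field.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class BloodPressure:
    taken_at: datetime
    systolic: int
    diastolic: int
    position: str  # e.g. "sitting", "standing", "lying"


@dataclass(frozen=True)
class SmokingStatus:
    recorded_at: datetime
    status: str  # e.g. "current", "ex-smoker", "never"


@dataclass
class PatientObservations:
    # Each reading is a separate entry, so several per consultation are allowed.
    blood_pressures: list[BloodPressure] = field(default_factory=list)
    # Status changes are appended, never overwritten, preserving the history.
    smoking_history: list[SmokingStatus] = field(default_factory=list)

    def add_bp(self, reading: BloodPressure) -> None:
        self.blood_pressures.append(reading)

    def record_smoking(self, entry: SmokingStatus) -> None:
        self.smoking_history.append(entry)

    def smoking_status_at(self, when: datetime) -> Optional[str]:
        """Most recent status recorded on or before `when`, if any."""
        earlier = [s for s in self.smoking_history if s.recorded_at <= when]
        return max(earlier, key=lambda s: s.recorded_at).status if earlier else None


obs = PatientObservations()
visit = datetime(2011, 5, 3, 10, 0)
obs.add_bp(BloodPressure(visit, 150, 90, "sitting"))
obs.add_bp(BloodPressure(visit, 135, 85, "standing"))
obs.record_smoking(SmokingStatus(datetime(2009, 1, 1), "current"))
obs.record_smoking(SmokingStatus(datetime(2011, 1, 1), "ex-smoker"))
print(obs.smoking_status_at(datetime(2010, 6, 1)))  # current
```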

Compared with the data themselves, data models are inherently unstable and are influenced by the database management system, security and access management software, organisational processes for data collection and management, and the people in the organisation who enter and use data. These points of instability are likely sources of DQ deficiencies. This is consistent with a comparison of “persons consulting prevalence rates” among four databases in the UK (27), which found considerable variation and suggested that prevalence rates are determined by the database used to generate them and the methods used to calculate the rates. A well-designed data model can improve the accuracy and “fitness for use” of datasets. However, the data models of the 3 CIS in the ePBRN are not explicit or transparent, mainly for commercial reasons; this is another barrier to open and rigorous processes to assess, measure and manage DQ comprehensively. Poorly designed database structures and user interfaces were also problems:

“We used to (use reason for visit to identify diabetes patients). It is just that we find that if we do that it just clogs up the whole page. Then we can’t search for it.” – GP, Practice 3

We are currently exploring the utility of a proposed conceptual framework to assess the quality of data models using a combination of metrics and subjective assessments, which included correctness, implementability, completeness, understandability, integration, flexibility and simplicity (35).

The lack of transparent and explicit data models is compounded by a lack of metadata to assist researchers in managing and analysing the data in the CIS; analyzing a data set without a map is very labour intensive and inefficient. Explicit statements are needed to explain the source, context of recording, validity check and processing method of any routinely collected data used in research (8) and to guide users about how to find relevant data, select appropriate research methods and ensure that the correct inferences are drawn (36). Metadata should be an integral part of the data model of a CIS and should be designed to assist information managers, especially those who manage large data sets, to understand the complexity of the context within which data collection and management takes place (13).

Even the smallest ePBRN will have to deal with large datasets. This small pilot study assessed and managed tens of thousands of patients and hundreds of thousands of transactions (e.g. prescriptions) or interactions with the GPs. It was more practical to assess and measure DQ in smaller, more specific clinical or functional “chunks” that can be safely automated and managed. This requires the deconstruction of the overall information model down to its most atomic functional level.

Data quality models and ontologies are being developed to enable ontology-based tools for the automated specification, assessment and management of data quality (37, 38). An ontology is a formal, explicit specification of a shared conceptualization that provides a vocabulary of terms, their meanings and relationships, to be used in various application contexts so that intelligent agents can act in spite of differences in terminology and meaning. Ontologies enable the modeling of the domain and the representation of information requirements to specify the context in collaborative environments (37). Ontologies can accurately specify metadata for DQ specification and assessment in particular clinical domains, which can then be assessed against a DQ ontology constructed from dimensions such as completeness, correctness, consistency, system currency, storage time and volatility (19). All these assessments can be measured on a ratio scale.

An Information Quality Triangle is being developed in the European Union to measure the quality of the 3 proposed dimensions - objects, concepts and terms - of clinical information on the monitoring and control of infectious diseases collected in CIS/CDW (39). This benchmarking exercise against some selected standards for information quality (17), the Health Level 7 (HL7) Reference Information Model, and the WHO-ATC terminology for drugs, is a useful guide as to the technical dimensions to consider in DQ assurance research.

Robust and valid methods for quality assurance, and for analysing and interpreting large primary care datasets, are required (26). The technical infrastructure (hardware, and data management and statistical software) is relatively straightforward. However, the organizational, data management and end-user issues that affect data quality and semantic interoperability are more complex. Some big picture and legislative strategies are underway, addressing: informed consent; appropriate levels of information security and privacy; a reliable unique personal identifier across health (person-based records) and social care (care-based records, e.g. child protection); and legal and social issues related to health care policy, financing and professional practice. The Australian health identifier legislation was passed by Parliament in 2010; however, effective implementation across the health and social care sectors will take time.

A systematic data quality R&D program focused on routinely collected clinical data, models and information systems in both primary and secondary care settings, from data collection to linkage to presentation, will increase our understanding of the strategies required to maintain a comprehensive and accurate picture of the health of individuals and communities. Such a program could improve policy development, planning and implementation, and quality monitoring (40); help control the costs of external data failure and the complementary costs of data-quality assurance (41); and improve the accuracy, validity and reliability of data collection, storage, extraction and linkage algorithms and tools.

Conclusion

Using an ecological conceptual framework of patient-centred and evidence-based integrated care, we have successfully established a pilot ePBRN, assessed its data quality and examined relevant conceptual, developmental and sociotechnical issues. The conceptual framework and the DQ matrix (completeness, correctness and consistency) are the starting point for ongoing work to validate and automate reliable methods to better define and assess DQ and semantic interoperability issues. This will improve the quality and safety of information systems, so that they deliver data that are fit for clinical and research purposes. Research is required to examine DQ in greater depth, extending existing work on a range of DQ dimensions across a number of clinical domains. Ontologies, data models, metadata and reference terminologies are important strategies to consider. The information required to support best practice and to develop data quality matrices, models and ontologies must be identified to guide the development of automated agents that specify and assess DQ, enabling automated data quality assurance and the integration of data from different clinical information systems. Research is also needed to examine whether DQ is positively correlated with quality of care. This will further our understanding of DQ and of the fitness for purpose of routinely collected information for clinical decision making and collaborative multicentre research in a range of clinical contexts. The key desired outcome is more accurate, context-specific data quality specification and assessment, leading to an overall improvement in the safety and quality of care and the accuracy of research.

Acknowledgments

Our fellow ePBRN investigators (Chen HY, Harris M, Zwar N, Powell-Davies G, Comino E, Jalaludin B, Bunker J, Vagholkar S, Maneze D) for their previous and ongoing contributions.

References

  • 1.Zwar N, Harris M, Griffiths R, Roland M, Dennis S, Powell-Davies G. APHRI stream four: a systematic review of chronic disease management. Canberra: Australian Primary Health Care Research Institute; 2006. [Google Scholar]
  • 2.Rea H, Kenealy T, Wellingham J, Moffitt A, Sinclair G, McAuley S, et al. Chronic Care Management evolves towards Integrated Care in Counties Manukau, New Zealand. N Z Med J. 2007;120(1252):U2489. [PubMed] [Google Scholar]
  • 3.Esselens G, Westhovens R, Verschueren P. Effectiveness of an integrated outpatient care programme compared with present-day standard care in early rheumatoid arthritis. Musculoskeletal Care. 2009 Mar;7(1):1–16. doi: 10.1002/msc.136. [DOI] [PubMed] [Google Scholar]
  • 4.Weaver MR, Conover CJ, Proescholdbell RJ, Arno PS, Ang A, Uldall KK, et al. Cost-effectiveness analysis of integrated care for people with HIV, chronic mental illness and substance abuse disorders. J Ment Health Policy Econ. 2009 Mar;12(1):33–46. [PubMed] [Google Scholar]
  • 5.Kodner DL, Spreeuwenberg C. Integrated care: meaning, logic, applications, and implications--a discussion paper. Int J Integr Care. 2002;2:e12. doi: 10.5334/ijic.67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Adaji A, Schattner P, Jones K. The use of information technology to enhance diabetes management in primary care: a literature review. Inform Prim Care. 2008;16(3):229–237. doi: 10.14236/jhi.v16i3.698. [DOI] [PubMed] [Google Scholar]
  • 7.Liaw S, Boyle D. Primary Care Informatics and Integrated Care. In: Hovenga EKM, Garde S, Hullin C, editors. Health Informatics An overview. Amsterdam: IOS Press; 2009. pp. 255–268. [PubMed] [Google Scholar]
  • 8.de Lusignan S, Metsemakers J, Houwink P, Gunnarsdottir V, van der Lei J. Routinely collected general practice data: goldmines for research? A report of the European Federation for Medical Informatics Primary Care Informatics Working Group (EFMI PCIWG) from MIE2006, Maastricht, The Netherlands. Inform Prim Care. 2006;14(3):203–9. doi: 10.14236/jhi.v14i3.632. 2006. [DOI] [PubMed] [Google Scholar]
  • 9.International Health Terminology Standards Development Organisation . SNOMED Clinical Terms (SNOMED-CT) International Release January 2009. Copenhagen, Denmark: 2009. [Google Scholar]
  • 10.Garling P. Final Report of the Special Commission of Inquiry: Acute Care in NSW Public Hospitals, 2008 - Overview. 27 November 2008 ed. Sydney: NSW Government; 2008. [Google Scholar]
  • 11.National Health & Hospital Reform Commission . A Healthier Future For All Australians – Final Report of the National Health and Hospitals Reform Commission – June 2009. In: Commonwealth Department of Halth & Ageing, editor. Canberra: Commonwealth of Australia; 2009. [Google Scholar]
  • 12. National eHealth Transition Authority. Draft Concept of Operations: Relating to the introduction of a personally controlled electronic health record (PCEHR) system. Canberra: National eHealth Transition Authority; 2011 Apr.
  • 13. de Lusignan S, Hague N, van Vlymen J, Kumarapeli P. Routinely-collected general practice data are complex, but with systematic processing can be used for quality improvement and research. Inform Prim Care. 2006;14(1):59–66. doi: 10.14236/jhi.v14i1.615.
  • 14. Mold JW, Peterson KA. Primary Care Practice-Based Research Networks: Working at the Interface Between Research and Quality Improvement. Ann Fam Med. 2005 May;3(Suppl 1):S12–20. doi: 10.1370/afm.303.
  • 15. Pace WD, Staton EW, Holcomb S. Practice-Based Research Network Studies in the Age of HIPAA. Ann Fam Med. 2005 May;3(Suppl 1):S38–45. doi: 10.1370/afm.301.
  • 16. Tierney WM, Oppenheimer CC, Hudson BL, Benz J, Finn A, Hickner JM, et al. A National Survey of Primary Care Practice-Based Research Networks. Ann Fam Med. 2007 May;5(3):242–250. doi: 10.1370/afm.699.
  • 17. Wang RY. A product perspective on total data quality management. Communications of the ACM. 1998 Feb;41(2):58–65.
  • 18. Redman T. Measuring data accuracy. In: Wang RY, Pierce EM, Madnick SE, Fisher CW, editors. Information Quality. Armonk, NY: ME Sharpe Inc; 2005. p. 21.
  • 19. Wand Y, Wang RY. Anchoring data quality dimensions in ontological foundations. Communications of the ACM. 1996 Nov;39(11):86–95.
  • 20. Kerr K, Norris A, Stockdale R, editors. Data quality, information and decision making: a healthcare case study. 18th Australasian Conference on Information Systems; 2007.
  • 21. Brown PJ, Warmington V, Laurence M, Prevost AT. Randomised crossover trial comparing the performance of Clinical Terms Version 3 and Read Codes 5 byte set coding schemes in general practice. BMJ. 2003 May 24;326(7399):1127. doi: 10.1136/bmj.326.7399.1127.
  • 22. Ouwens M, Wollersheim H, Hermens R, Hulscher M, Grol R. Integrated care programmes for chronically ill patients: a review of systematic reviews. Int J Qual Health Care. 2005 Apr;17(2):141–146. doi: 10.1093/intqhc/mzi016.
  • 23. Liaw S, Chen H, Maneze D, Taggart J, Dennis S, Vagholkar S, et al. Use of the “principal diagnosis” in emergency department databases to identify patients with chronic diseases. Electronic Health Informatics Journal. 2011 (in press).
  • 24. Liaw S, Boyle D. Primary Care Informatics and integrated care of chronic disease. In: Hovenga EKM, Garde S, Cossio CHL, editors. Health Informatics: An Overview. Berlin: IOS Press; 2010.
  • 25. SAS Institute Inc. SAS: Statistical software. 9.2 ed. Cary (NC): 2004.
  • 26. de Lusignan S, Liaw S-T, Krause P, Curcin V, Vicente M, Michalakidis G, et al. Key concepts to assess the readiness of data for international research: data quality, lineage and provenance, extraction and processing errors, traceability, and curation. IMIA Yearbook 2011. 2011 (in press).
  • 27. Jordan K, Clarke A, Symmons D, Fleming D, Porcheret M, Kadam U, et al. Measuring disease prevalence: a comparison of musculoskeletal disease using four general practice consultation databases. Br J Gen Pract. 2007;57:7–14.
  • 28. Azaouagh A, Stausberg J. [Frequency of hospital-acquired pneumonia--comparison between electronic and paper-based patient records]. Pneumologie. 2008 May;62(5):273–8. doi: 10.1055/s-2008-1038099.
  • 29. Mitchell J, Westerduin F. Emergency department information system diagnosis: how accurate is it? Emerg Med J. 2008 Nov;25(11):784. doi: 10.1136/emj.2007.050104.
  • 30. Moro ML, Morsillo F. Can hospital discharge diagnoses be used for surveillance of surgical-site infections? J Hosp Infect. 2004 Mar;56(3):239–41. doi: 10.1016/j.jhin.2003.12.022.
  • 31. de Lusignan S, Khunti K, Belsey J, Hattersley A, van Vlymen J, Gallagher H, et al. A method of identifying and correcting miscoding, misclassification and misdiagnosis in diabetes: a pilot and validation study of routinely collected data. Diabet Med. 2010;27:203–209. doi: 10.1111/j.1464-5491.2009.02917.x.
  • 32. Gillies A. Assessing and improving the quality of information for health evaluation and promotion. Methods Inf Med. 2000 Aug;39(3):208–212.
  • 33. Thiru K, Hassey A, Sullivan F. Systematic review of scope and quality of electronic patient record data in primary care. BMJ. 2003 May 17;326(7398):1070. doi: 10.1136/bmj.326.7398.1070.
  • 34. Michalakidis G, Kumarapeli P, Ring A, van Vlymen J, Krause P, de Lusignan S. A system for solution-orientated reporting of errors associated with the extraction of routinely collected clinical data for research and quality improvement. Stud Health Technol Inform. 2010;160(Pt 1):724–728.
  • 35. Moody D, editor. Measuring the quality of data models: an evaluation of the use of quality metrics in practice. 11th European Conf on Information Systems; 2003.
  • 36. de Lusignan S, van Weel C. The use of routinely collected computer data for research in primary care: opportunities and challenges. Family Practice. 2006 Apr;23(2):253–263. doi: 10.1093/fampra/cmi106.
  • 37. Ganguly P, Ray P, Parameswaran N. Semantic Interoperability in Telemedicine through Ontology-Driven Services. Telemedicine & e-Health. 2005;11(3):405–412. doi: 10.1089/tmj.2005.11.405.
  • 38. Ying W, Wimalasiri J, Ray P, Chattopadhyay S, Wilson C. An Ontology Driven Multi-agent Approach to Integrated e-Health Systems. Int J E-Health & Med Communications (IJEHMC). 2010 Jan-Mar;1(1):28–40.
  • 39. Choquet R, Qouiyd S, Ouagne D, Pasche E, Daniel C, Boussaïd O, et al. The Information Quality Triangle: a methodology to assess clinical information quality. Stud Health Technol Inform. 2010;160(Pt 1):699–703.
  • 40. Wang RY, Pierce EM, Madnick SE, Fisher CW, editors. Information Quality. Armonk, NY: ME Sharpe Inc; 2005.
  • 41. Wang RY, Storey V, Firth C. A framework for analysis of data quality research. IEEE Transactions on Knowledge and Data Engineering. 1995 Aug;7(4):623–640.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association
