Abstract
Background
General practice data are increasingly used to estimate chronic disease prevalence. Concerns remain about data completeness and fragmentation, particularly when patients attend multiple practices. Previous studies have restricted analyses to ‘active’ patients (those with frequent clinical encounters), assuming that these records are more complete and representative; however, the validity of this approach has not been tested. This study examines whether the prevalence estimated from patient-level (linked) general practice records differs from that estimated using the common approach of active practice-level (unlinked) general practice records.
Methods
This retrospective cohort study used de-identified electronic health records from the MedicineInsight dataset, comprising 694,004 patients aged 18 years and older from 39 general practices in Western Australia, covering approximately 32.7% of the state’s adult population as of January 26, 2022. Patient demographics, diagnoses, and clinical encounters were analysed.
Results
Condition prevalence estimates vary depending on cohort definition and the inclusion of patients with low general practice engagement. Active patients had higher median encounters (9 vs. 4) and consistently higher condition prevalence across all chronic diseases, including hypertension (18.2% vs. 11.6%), diabetes (7.0% vs. 4.6%), and asthma (11.3% vs. 8.1%), demonstrating systematic overestimation when analyses exclude patients with lower healthcare utilisation. The patient-level cohort captured more total diagnoses due to its larger denominator (257,023 total diagnosed conditions across the N = 608,000 patient-level cohort, versus 133,235 total diagnosed conditions across N = 201,817 practice-level active patients).
Conclusion
Diagnostic information in general practice records is often dispersed across practices, affecting population planning and research. Linking patient records across practices enhances diagnostic visibility and reveals a more complete picture of chronic disease burden, highlighting the risk of overestimating disease prevalence when analyses are restricted to active patient records alone. This overestimation likely results from excluding healthier patients with fewer healthcare encounters. Small differences in prevalence estimates can have substantial implications for population-level planning, potentially affecting funding allocations, clinical guidelines, and workforce decisions. These findings suggest the need for linked general practice datasets to improve the accuracy of prevalence estimates and inform effective policy and resource allocation decisions in primary care.
Keywords: Data fragmentation, General practice, Primary care, Electronic health records, Data linkage, Condition completeness, Consistency of reporting
Introduction
General practitioners (GPs) are the principal providers of primary care in Australia, delivering preventative health services, managing acute and chronic diseases, and coordinating access to specialist care. Approximately 90% of Australians visit a general practice each year [1], making general practice data a potentially powerful resource for population health monitoring. Accurate and complete patient records are essential not only for clinical decision-making [2] and risk stratification [3, 4] but also for reducing diagnostic error [5], avoiding unnecessary investigations [6], informing safe prescribing practices [7], and enabling the generation of reliable epidemiological insights. Incomplete or inconsistent documentation, however, can compromise clinical care, public health research, and policy development [8–10], as well as broader health system planning [11, 12].
One of the key challenges to data quality in general practice is the fragmentation of patient records. Unlike in the UK, where patients are required to register with one general practice [13], approximately one in four Australians attend more than one GP practice in a given year [14, 15], yet patient records are typically stored in siloed practice-level systems [8]. This fragmentation presents a methodological challenge for researchers, because analyses may be conducted using either practice-level data, where patients attending multiple practices are counted separately at each practice, or patient-level data, where individual patients are uniquely identified across the entire practice cohort. Data linkage can help resolve this challenge by identifying and consolidating patient records from participating practices into a single, unified research dataset, thereby transforming fragmented practice-level data into patient-level data in which each person has a single, unique representation. The issue of GP data fragmentation is not unique to Australia; international studies have highlighted similar limitations, including barriers to data integration [16], inconsistent coding practices [17], and incomplete health information [18, 19]. In Australia, these challenges are particularly evident in efforts to monitor primary care performance and estimate population disease burden, including the derivation of prevalence rates for chronic conditions [12, 20].
In the absence of patient-level data across practices, patients who attend multiple providers may have incomplete or duplicated health records [21, 22], leading to gaps in diagnosis, misclassification of health status, and the inflation of service counts [8–10]. These issues are further exacerbated by Australia’s federated healthcare system, where jurisdictional differences in infrastructure and governance constrain national-level integration of general practice data [23] and further restrict the use of GP data for routine public health reporting and policy development [24–27].
Efforts to derive chronic disease prevalence from general practice data are particularly affected by these limitations. Although national prevalence estimates are available from sources such as the Australian Bureau of Statistics’ National Health Survey [28] and the Person Level Integrated Data Asset (PLIDA) [29], these rely on self-reported, administrative, or medication-based indicators rather than primary care diagnoses [47, 48]. While general practice datasets such as MedicineInsight [30–32] have been used to estimate the prevalence of chronic conditions such as diabetes [33] and chronic kidney disease [34], direct comparisons with national survey data are hindered by differences in data structure, collection and underlying population coverage [28, 35, 36]. Furthermore, variability in coding practices, lack of standardisation across clinical software, and incomplete capture of diagnostic fields are well-documented issues in general practice data [35, 36].
To mitigate these concerns, many studies restrict analyses to “active” patients – those with three or more clinical encounters in two years [37] – under the assumption that these records are more complete and that the patients are more engaged with care [38]. This convention is also thought to reduce the risk of double-counting individuals who may attend multiple practices. However, this approach may inadvertently exclude large segments of the population, including patients who are generally healthy or who access care intermittently for specific issues. Whether these exclusions meaningfully distort estimates of chronic condition prevalence has not been empirically tested, despite the widespread use of this assumption in epidemiological reporting. This matters because even small differences in prevalence estimates can have substantial consequences at a population level [39], particularly for rarer conditions [40], where misestimating the burden can affect funding allocations, clinical guideline priorities, and workforce planning [41].
This study addresses the evidence gap by evaluating whether restricting analyses to practice-level active patient records provides comparable estimates of chronic disease prevalence to those obtained from patient-level general practice data. In doing so, the study informs whether the current approach of using practice-level records from frequently attending patients is sufficient or whether patient-level analysis improves the accuracy and representativeness of general practice-derived prevalence estimates.
Research design and methods
Data source and study sample
The data for this retrospective cohort study were obtained from the MedicineInsight database, a national general practice data program in Australia established by NPS MedicineWise [31, 42] and, from January 2023, under the custodianship of the Australian Commission on Safety and Quality in Health Care [32]. This database compiles de-identified electronic health records from consenting general practices across Australia [8, 31]. For this study, records up to January 26, 2022 from 694,004 individuals aged 18 years or older were extracted from 39 Western Australian general practices capable of Privacy-Preserving Record Linkage (PPRL) using Bloom filters, covering 32.7% of the Western Australian adult population. The dataset included patient demographics, diagnoses, reasons for consultations, patient assessments, and clinical measurements [30].
Eligible cohort
The study population included all individuals with recorded visits to one or more general practices during the study period. All clinical encounters, regardless of reason for visit, were included in the analysis.
Data linkage
Privacy-Preserving Record Linkage (PPRL) was used to encode MedicineInsight records and to link, in a de-identified fashion, records from different participating general practices that belonged to the same individual. Linkage used a probabilistic approach based on the statistical likelihood that two records refer to the same person, rather than on exact, unique identifiers [43]. This linkage was performed by Curtin University’s Centre for Data Linkage using advanced data linkage methodologies described elsewhere [43], which have been shown to provide privacy assurances with minimal data loss while maintaining a high level of linkage accuracy [44, 45]. This process was used to identify and consolidate records belonging to each de-identified individual within and across the 39 practices.
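The linkage itself was carried out by the Centre for Data Linkage using the methods in [43]; the sketch below only illustrates the general idea behind Bloom filter PPRL, assuming a common textbook scheme: each identifying field is split into character bigrams, each bigram is hashed with keyed hash functions into a fixed-length bit array, and two encoded records are compared with the Dice coefficient. The filter length, number of hash functions and the keys are hypothetical placeholders, not the parameters used in this study, and a full probabilistic linkage would additionally combine field-level similarities into match weights and apply a classification threshold.

```python
import hashlib
import hmac

# Illustrative parameters (assumptions): the filter length, number of hash
# functions and secret keys used in the actual linkage are not reported here.
FILTER_BITS = 1000
NUM_HASHES = 30
KEY_1 = b"hypothetical-key-1"
KEY_2 = b"hypothetical-key-2"


def bigrams(value: str) -> set[str]:
    """Split a normalised field value into padded character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def keyed_hash(key: bytes, text: str) -> int:
    """Keyed SHA-256 (HMAC) digest of a bigram, returned as an integer."""
    return int(hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest(), 16)


def bloom_encode(value: str) -> set[int]:
    """Return the bit positions set in a Bloom filter encoding of `value`.

    Each bigram sets NUM_HASHES positions via double hashing:
    position_i = (h1 + i * h2) mod FILTER_BITS.
    """
    bits: set[int] = set()
    for gram in bigrams(value):
        h1 = keyed_hash(KEY_1, gram)
        h2 = keyed_hash(KEY_2, gram)
        for i in range(NUM_HASHES):
            bits.add((h1 + i * h2) % FILTER_BITS)
    return bits


def dice_similarity(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two Bloom filters stored as sets of set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

Because only the encoded bit positions are exchanged, two practices can compare, for example, `bloom_encode("catherine")` and `bloom_encode("katherine")` and obtain a high similarity without either side revealing the underlying name.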
Analysis cohorts
Following data linkage, patients were categorised based on the number of distinct general practices attended, identified through probabilistic data linkage methods that consolidated records across sites to create individual-level longitudinal records. This approach enabled the identification of patients who visited a single practice, as well as those who attended two, three, or more practices. The primary focus of the analysis was to examine the distribution of chronic condition recording across linked patient records.
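As an illustration of how linked records can be summarised by the number of distinct practices attended, the following minimal sketch assumes each practice-level record carries a linkage identifier and a practice identifier (both hypothetical field names). It also shows why practice-level counts exceed patient-level counts when patients attend several practices.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (linkage_id, practice_id) pair per practice-level record.
records = [
    ("p1", "A"), ("p1", "A"),               # repeat record at a single practice
    ("p2", "A"), ("p2", "B"),               # attends two practices
    ("p3", "A"), ("p3", "B"), ("p3", "C"),  # attends three practices
]


def practices_per_patient(pairs):
    """Map each linkage ID to the number of distinct practices attended."""
    practices = defaultdict(set)
    for linkage_id, practice_id in pairs:
        practices[linkage_id].add(practice_id)
    return {linkage_id: len(ids) for linkage_id, ids in practices.items()}


def attendance_distribution(pairs):
    """Count patients attending 1, 2, or 3+ distinct practices."""
    counts = Counter()
    for n in practices_per_patient(pairs).values():
        counts["3+" if n >= 3 else str(n)] += 1
    return dict(counts)


# Practice-level: each patient counted once per practice attended.
practice_level_n = len(set(records))
# Patient-level: each linked patient counted once.
patient_level_n = len(practices_per_patient(records))
```

In this toy example the three patients appear as six practice-level records, which is the duplication that linkage removes.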
Activity status definition
Patients were classified as “active” based on the Royal Australian College of General Practitioners (RACGP) criterion: an individual was considered active if they had attended a practice three or more times in the two years preceding the data extraction date. For the patient-level cohort, if any record associated with an individual met the active criterion, the entire linked profile was designated as “active linked”, enabling a comprehensive analysis of each individual’s full diagnostic history.
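The activity rule can be expressed as a short function. This is a minimal sketch, assuming that “two years” is approximated as 730 days before the extraction date and that each linked patient is represented as a mapping from practice ID to encounter dates (a hypothetical structure). Note that, as defined above, the “active linked” flag requires at least one individual practice record to meet the threshold; visits are not pooled across practices.

```python
from datetime import date, timedelta

EXTRACTION_DATE = date(2022, 1, 26)
WINDOW = timedelta(days=730)  # "two years" approximated as 730 days (assumption)
MIN_ENCOUNTERS = 3            # RACGP threshold: three or more attendances


def is_active_record(encounter_dates, extraction_date=EXTRACTION_DATE):
    """A practice-level record is active if it has >= 3 encounters in the window."""
    window_start = extraction_date - WINDOW
    recent = [d for d in encounter_dates if window_start <= d <= extraction_date]
    return len(recent) >= MIN_ENCOUNTERS


def is_active_linked(practice_records, extraction_date=EXTRACTION_DATE):
    """A linked patient is 'active linked' if ANY of their practice records is active.

    practice_records: hypothetical mapping of practice_id -> list of encounter dates.
    """
    return any(
        is_active_record(dates, extraction_date) for dates in practice_records.values()
    )
```

Under this rule, a patient with three recent visits at one practice and one at another is active linked, whereas a patient with two recent visits at each of two practices is not.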
Diagnosis and condition grouping
Condition flags, provided in the dataset, were used to simplify the analysis. They were generated when the condition or a relevant synonym was documented in any of the ‘Diagnosis’, ‘Reason for visit’, or ‘Reason for prescription’ fields. Diagnoses were sourced from structured ‘diagnosis’ and ‘observation’ fields in the patient records. These fields are populated using standardised clinical terminologies, and MedicineInsight’s extraction framework was designed to recognise common synonyms, misspellings, and clinical abbreviations [30]. Individual diagnosis flags, represented as binary indicators, were grouped into ten clinically relevant condition categories. Grouping was based on expert clinical input and alignment with standard disease taxonomies. The conditions of interest are shown in Table 1. An individual was considered to have a grouped condition if any contributing diagnosis flag was positive in any associated record.
Table 1.
Mapping of clinical condition groups to diagnostic flags
| Condition Group of interest | Associated Diagnosis Flags |
|---|---|
| CHD | Coronary Heart Disease and Atherosclerosis, Coronary Heart Disease Related Procedure |
| HF | Heart failure |
| Stroke | Stroke (All) |
| CVD Other | Atrial Fibrillation, Atrial Flutter, Peripheral Vascular Disease, Transient Ischaemic Attack |
| Lipid Disorders | Dyslipidaemia, Hypercholesterolaemia, Hyperlipidaemia, Hypertriglyceridemia |
| Hypertension | Hypertension |
| Asthma | Asthma |
| COPD | Chronic Obstructive Pulmonary Disease |
| Diabetes | Diabetes Mellitus T1, Diabetes Mellitus T2 & Diabetes Mellitus Unspecified |
| CKD | Chronic Kidney Disease – Stages 1–5, & Unspecified |
CVD = cardiovascular disease; CHD = coronary heart disease; HF = heart failure; COPD = chronic obstructive pulmonary disease; CKD = chronic kidney disease
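The grouping rule in Table 1 (a group is positive if any contributing flag is positive) can be sketched as follows. The flag labels are copied from Table 1 and are assumed, for illustration only, to be usable as keys; the actual MedicineInsight variable names may differ, and the CKD labels in particular are hypothetical spellings of “Stages 1–5, & Unspecified”.

```python
# Flag labels taken from Table 1; real dataset variable names may differ (assumption).
CONDITION_GROUPS = {
    "CHD": ["Coronary Heart Disease and Atherosclerosis",
            "Coronary Heart Disease Related Procedure"],
    "HF": ["Heart failure"],
    "Stroke": ["Stroke (All)"],
    "CVD Other": ["Atrial Fibrillation", "Atrial Flutter",
                  "Peripheral Vascular Disease", "Transient Ischaemic Attack"],
    "Lipid Disorders": ["Dyslipidaemia", "Hypercholesterolaemia",
                        "Hyperlipidaemia", "Hypertriglyceridemia"],
    "Hypertension": ["Hypertension"],
    "Asthma": ["Asthma"],
    "COPD": ["Chronic Obstructive Pulmonary Disease"],
    "Diabetes": ["Diabetes Mellitus T1", "Diabetes Mellitus T2",
                 "Diabetes Mellitus Unspecified"],
    # Hypothetical labels for "Chronic Kidney Disease – Stages 1–5, & Unspecified".
    "CKD": ["Chronic Kidney Disease Stages 1-5", "Chronic Kidney Disease Unspecified"],
}


def group_flags(flags: dict[str, bool]) -> dict[str, bool]:
    """Return one boolean per condition group: True if any member flag is True."""
    return {
        group: any(flags.get(flag, False) for flag in members)
        for group, members in CONDITION_GROUPS.items()
    }
```

For example, a record flagged only for atrial flutter is positive for “CVD Other” and negative for the other nine groups.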
Clinical encounter frequency and condition count calculations
General practice utilisation was assessed by counting the clinical encounters recorded for each individual in the two years preceding their last clinical encounter. Where patients attended multiple practices, records were linked to consolidate all encounters and diagnoses into a single patient profile, ensuring each individual was counted only once. Clinical encounter counts across the 2 years were summarised using the median and interquartile range (IQR), and grouped into predefined categories: one, two, three, and more than three visits.
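A minimal sketch of the encounter summary, assuming a linked patient’s encounter dates from all practices are first pooled into one list, that “two years” is approximated as 730 days, and that quartiles use linear interpolation (`method="inclusive"`); the paper does not state which quantile method was used.

```python
from datetime import date, timedelta
from statistics import median, quantiles

WINDOW = timedelta(days=730)  # two-year window, approximated (assumption)


def encounters_in_window(encounter_dates):
    """Count encounters in the two years up to and including the last encounter."""
    last = max(encounter_dates)
    return sum(1 for d in encounter_dates if last - WINDOW <= d <= last)


def encounter_category(n):
    """Group counts into the categories used in Table 2: 1, 2, 3, >3."""
    return ">3" if n > 3 else str(n)


def summarise(counts):
    """Median and interquartile range (Q1, Q3) of encounter counts."""
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    return median(counts), q1, q3


# Linked patient: encounters from two practices pooled before counting.
practice_a = [date(2021, 1, 5), date(2021, 6, 1)]
practice_b = [date(2019, 1, 1), date(2021, 12, 1)]
n = encounters_in_window(practice_a + practice_b)
```

Pooling before counting is what distinguishes the patient-level summary from a per-practice count, where the same person would contribute a separate, smaller count at each practice.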
Chronic condition burden was evaluated using predefined diagnostic flags. The total number of conditions of interest recorded for each patient was calculated for linked patients, as well as for the subset who attended more than one practice.
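The condition-count bands reported in Table 2 (0, 1, 2, ≥3) can be sketched as below, assuming each linked patient is represented by the number of distinct practices attended and the set of positive Table 1 condition groups; the example patients are hypothetical.

```python
from collections import Counter

# Hypothetical linked patients: (distinct practices attended, positive condition groups).
patients = [
    (1, set()),
    (1, {"Hypertension"}),
    (2, {"Hypertension", "Diabetes"}),
    (3, {"Hypertension", "Diabetes", "CKD"}),
]


def condition_band(n_conditions: int) -> str:
    """Band a condition count into the categories used in Table 2."""
    return ">=3" if n_conditions >= 3 else str(n_conditions)


def condition_distribution(rows, multi_practice_only=False):
    """Patients per condition-count band, optionally only multi-practice attenders."""
    counts = Counter(
        condition_band(len(groups))
        for n_practices, groups in rows
        if not multi_practice_only or n_practices > 1
    )
    return dict(counts)
```

Calling `condition_distribution(patients, multi_practice_only=True)` restricts the tally to the subset who attended more than one practice.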
Data analysis
All statistical analyses were conducted using Python [46] in the Jupyter Notebook [47] environment. For prevalence, each condition category was evaluated across two cohorts: (1) unique practice-level records with active status (unlinked active patients), which represents the cohort definition commonly used in studies of general practice data; and (2) patient-level records (linked patients). Conditions and clinical encounters were summarised using the median and interquartile range (IQR).
For patient-level cohorts, diagnoses were collapsed at the individual level using a maximum function across associated records to ensure each person was counted once per condition. Prevalence was defined as the percentage of patients with a recorded diagnosis in each cohort.
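The collapse-then-count step can be illustrated with a minimal sketch, assuming binary condition flags per practice-level record and a hypothetical `linkage_id`. The practice-level calculation here omits the active-status filter applied in the study, so it only shows how duplicated records affect the numerator and denominator.

```python
from collections import defaultdict

# Hypothetical practice-level records: (linkage_id, practice_id, {condition: 0/1}).
records = [
    ("p1", "A", {"Hypertension": 1}),
    ("p1", "B", {"Hypertension": 1}),  # same patient, diagnosis recorded at both practices
    ("p2", "A", {"Hypertension": 0}),
    ("p3", "B", {"Hypertension": 0}),
]


def collapse_to_patients(rows):
    """Collapse records to one row per patient using max() across their records."""
    patients = defaultdict(dict)
    for linkage_id, _practice_id, flags in rows:
        for condition, value in flags.items():
            patients[linkage_id][condition] = max(
                patients[linkage_id].get(condition, 0), value
            )
    return list(patients.values())


def prevalence(rows, condition):
    """Percentage of rows (records or patients) with the condition flagged."""
    rows = list(rows)
    return 100 * sum(row.get(condition, 0) for row in rows) / len(rows)


practice_level = prevalence((flags for _, _, flags in records), "Hypertension")
patient_level = prevalence(collapse_to_patients(records), "Hypertension")
```

Here the patient with hypertension recorded at two practices is counted twice at the practice level (2 of 4 records) but once at the patient level (1 of 3 patients).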
Ethics
Ethics approval was obtained from the Curtin University Human Research Ethics Committee (HRE2019-0619). A waiver of consent was granted, and no identifying information was available to the researchers.
Results
A total of 694,004 patients were included in the initial dataset (Fig. 1). Of these, 692,317 individuals had a valid linkage ID, resulting in 608,000 individuals identified as unique patients after consolidating records across all 39 general practices. Within the patient-level (unique patients) and practice-level (includes duplicate patients) cohorts, 195,929 and 201,817, respectively, met the criterion for active status, defined as having three or more general practice visits (clinical encounters) within the past two years.
Fig. 1. Cohort selection flow chart
Cohort characteristics
Patient characteristics for the two cohorts are presented in Table 2. The first cohort comprises active MedicineInsight patients analysed at the practice level (N = 201,817), where patients who attend multiple practices are counted separately for each practice they visited. The second cohort comprises all MedicineInsight patients, active and non-active, examined at the individual level using linked (patient-level) data (N = 608,000), where each unique patient is counted only once, regardless of the number of practices they attended. This linkage approach provides a deduplicated view of the patient population across all 39 practices. The patient-level cohort had a higher proportion of males (46.1% vs. 43.8%) than the active practice-level cohort, with correspondingly fewer females (53.0% vs. 55.7%). Active practice-level patients tended to be older than those in the patient-level cohort: a larger proportion of active patients were in the older age categories, while younger age groups (particularly those aged 20–39 years) were more prevalent in the patient-level cohort.
Table 2.
Characteristics of practice-level active patients and patient-level patients
| Characteristics | Practice-level active patients (N = 201,817) | Patient-level unique patients (N = 608,000) |
|---|---|---|
| Gender, n (%) | | |
| Male | 88,445 (43.8) | 280,320 (46.1) |
| Female | 112,404 (55.7) | 322,307 (53.0) |
| Intersex/Indeterminate/Not Stated/Not Recorded | 968 (0.5) | 5,373 (0.9) |
| Age (years), n (%) | | |
| 18–19 | 5,326 (2.6) | 15,218 (2.5) |
| 20–29 | 33,266 (16.5) | 106,077 (17.4) |
| 30–39 | 40,492 (20.1) | 139,011 (22.9) |
| 40–49 | 34,624 (17.2) | 104,939 (17.3) |
| 50–59 | 32,303 (16.0) | 89,321 (14.7) |
| 60–69 | 27,067 (13.4) | 71,180 (11.7) |
| 70–79 | 18,946 (9.4) | 47,372 (7.8) |
| 80–89 | 7,994 (4.0) | 23,796 (3.9) |
| 90–99 | 1,741 (0.9) | 9,698 (1.6) |
| Outliers | 58 (0.1) | 1,388 (0.2) |
| Median age | 46 | 43.0 |
| Mean age | 47.5 | 46.6 |
| Number of clinical encounters, n (%) | | |
| 1 | 5,096 (2.5) | 131,388 (22.4) |
| 2 | 13,663 (6.8) | 94,762 (16.1) |
| 3 | 20,369 (10.1) | 53,643 (9.1) |
| >3 | 132,310 (80.6) | 307,827 (52.4) |
| Median number of clinical encounters | 9 | 4 |
| Interquartile range (IQR) of clinical encounters | 4–18 | 2–10 |
| Number of selected recorded conditions, n (%) | | |
| 0 | 65,357 (46.3) | 160,847 (55.4) |
| 1 | 32,425 (23.0) | 74,615 (23.8) |
| 2 | 18,025 (12.8) | 32,789 (10.5) |
| ≥3 | 25,277 (17.9) | 44,797 (14.3) |
| Total | 141,084 | 313,048 |
Clinical encounter characteristics
An analysis of visit frequency showed clear differences in general practice attendance between active patients and unique patients (Table 2). Patients classified as ‘active’ at the practice level met the ≥ 3 encounter threshold at their primary practice of attendance, though patient-level analysis revealed that these individuals often distributed their care across multiple practices, with only 2.5% of active patients receiving all their care at a single practice. Active patients had a higher median number of clinical encounters (9; interquartile range [IQR]: 4–18) than unique patients (median: 4; IQR: 2–10). The active cohort also exhibited a broader spread of visit frequencies, with an upper quartile extending to 18 encounters, reflecting a higher concentration of frequent attenders.
In contrast, a large proportion of patients in the patient-level cohort had minimal engagement with general practice services: 22.4% had only one encounter and 16.1% had two, compared with 2.5% and 6.8%, respectively, in the active cohort. These discrepancies are expected, as the active group was defined using a visit frequency threshold of three or more clinical encounters in 2 years, thereby excluding lower-contact patients who are otherwise captured in the patient-level cohort.
Recorded condition characteristics
The number of recorded chronic conditions, limited to the selected condition groupings defined in Table 1, varied across cohorts (Table 2). A greater proportion of unique patients had no recorded conditions (55.4%) compared to active patients (46.3%). In contrast, active patients were more likely to have one or more conditions recorded, including 17.9% with three or more conditions, compared to 14.3% in the patient-level cohort. These differences align with visit frequency trends, as patients in the active practice-level cohort had more recorded encounters on average than the patient-level group.
Prevalence of selected chronic conditions across patient cohorts
Table 3 presents the prevalence of selected chronic conditions across the two cohorts (active patients and unique patients) for the condition groupings defined in Table 1. Across all conditions, higher prevalence estimates were consistently observed in the active cohort, which excluded patients with minimal general practice contact. For example, hypertension prevalence was 18.2% in active patients, compared with 11.6% in the unique cohort. A similar pattern was observed for lipid disorders (16.8% in active vs. 9.4% in the unique patient group) and asthma (11.3% vs. 8.1%).
Table 3.
Chronic condition prevalence across general practice patient cohorts
| Condition | Practice-level active patients (N = 201,817): total diagnosed conditions | Practice-level active patients: prevalence (%) | Patient-level unique patients (N = 608,000): total diagnosed conditions | Patient-level unique patients: prevalence (%) |
|---|---|---|---|---|
| CHD | 7,517 | 3.72 | 15,328 | 2.52 |
| HF | 2,067 | 1.02 | 4,915 | 0.81 |
| Stroke | 1,944 | 0.96 | 4,425 | 0.73 |
| Other CVD | 6,515 | 3.23 | 13,288 | 2.19 |
| Lipid Disorders | 33,863 | 16.78 | 57,255 | 9.42 |
| Hypertension | 36,622 | 18.15 | 70,363 | 11.57 |
| Asthma | 22,881 | 11.34 | 49,211 | 8.09 |
| COPD | 5,477 | 2.71 | 10,336 | 1.70 |
| Diabetes | 14,048 | 6.96 | 27,999 | 4.61 |
| CKD (all stages) | 2,301 | 1.14 | 3,903 | 0.64 |
| Total | 133,235 | | 257,023 | |
CVD = cardiovascular disease; CHD = coronary heart disease; HF = heart failure; COPD = chronic obstructive pulmonary disease; CKD = chronic kidney disease
Prevalence estimates were lowest in the unique patient cohort for every condition. The number of recorded conditions mirrored these prevalence patterns. The patient-level cohort captured more total diagnoses because of its larger denominator (257,023 total diagnosed conditions across the N = 608,000 patient-level cohort, versus 133,235 total diagnosed conditions across N = 201,817 practice-level active patients); however, prevalence rates were consistently higher in the smaller practice-level active cohort across all examined conditions.
These findings are consistent with the observed differences in recorded condition counts from Table 2, where active patients had higher median visit counts and more recorded encounters. In contrast, the unique patient cohort included a larger proportion of individuals with only one or two clinical encounters.
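The prevalence figures in Table 3 follow directly from the reported counts and denominators. As an arithmetic check, the sketch below recomputes each prevalence and the difference between cohorts, expressed in percentage points (active minus unique); the counts are copied from Table 3 and nothing else is assumed.

```python
# Counts copied from Table 3: (practice-level active, patient-level unique).
ACTIVE_N = 201_817
UNIQUE_N = 608_000
COUNTS = {
    "CHD": (7_517, 15_328),
    "HF": (2_067, 4_915),
    "Stroke": (1_944, 4_425),
    "Other CVD": (6_515, 13_288),
    "Lipid Disorders": (33_863, 57_255),
    "Hypertension": (36_622, 70_363),
    "Asthma": (22_881, 49_211),
    "COPD": (5_477, 10_336),
    "Diabetes": (14_048, 27_999),
    "CKD (all stages)": (2_301, 3_903),
}


def prevalence_pct(count, denominator):
    """Prevalence as a percentage of the cohort denominator."""
    return 100 * count / denominator


# Percentage-point difference (active minus unique) for each condition.
differences = {
    condition: prevalence_pct(active, ACTIVE_N) - prevalence_pct(unique, UNIQUE_N)
    for condition, (active, unique) in COUNTS.items()
}

for condition, diff in sorted(differences.items(), key=lambda kv: kv[1]):
    print(f"{condition:<18} {diff:5.2f} percentage points")
```

The smallest difference is for heart failure (about 0.2 percentage points) and the largest for lipid disorders (about 7.4 percentage points), and every difference is positive.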
Discussion
This is the first study to directly compare chronic condition prevalence estimates derived from practice-level active patient records with those derived from patient-level linked general practice data. Our findings show that prevalence estimates were consistently higher in the active practice-level cohort than in the patient-level cohort, with differences ranging from 0.2 to 7.4 percentage points across conditions. By linking general practice records across multiple practices, it is possible to obtain a more complete patient record, minimise duplication, and provide prevalence estimates that more accurately reflect the patient population.
National prevalence estimates require access to large nationwide datasets. In Australia, these include data from the Australian Bureau of Statistics, the Australian Institute of Health and Welfare, the Pharmaceutical Benefits Scheme (PBS), the Medicare Benefits Schedule (MBS), and general practice-derived datasets such as MedicineInsight [34]. However, these data sources differ in structure, data capture methods, and intended use, limiting direct comparability [28, 48]. General practice data are a valuable resource, representing information for the approximately 90% of Australians who visit a general practice annually [1], yet they remain underutilised for epidemiological research and present well-documented challenges related to data completeness, representativeness, and standardisation [11, 12, 20, 28, 35]. Although the quality of general practice data remains variable, initiatives such as the Practice Incentives Program (PIP) Quality Improvement (QI) Incentive [49, 50] offer opportunities to promote systematic recording across Australian general practices.
Because of the challenges associated with obtaining linked data from general practices [51–54], previous studies have commonly relied on active patients [37] to mitigate double-counting of individuals who attend more than one practice and to establish a stable patient denominator for prevalence estimates within individual practices [38]. The active patient designation is valuable for understanding a practice’s workload, the health needs of the immediate community, and for mandatory accreditation [37]. However, this criterion inherently produces prevalence estimates that are more representative of the sickest or most engaged segment of the population, potentially inflating rates by excluding healthier individuals who attend less frequently. Our findings support this: the larger unique patient cohort, which includes both infrequent and frequent attenders, consistently showed lower multimorbidity than the active patient group.
The lower multimorbidity observed in the unique patient cohort, together with its generally lower encounter volumes, is consistent with a bias in prevalence estimates derived from active patients. This discrepancy may be attributable to the fragmented nature of general practice data [55, 56], where a patient’s complete diagnostic profile is distributed across multiple practices, leading to an over-representation of disease burden among frequent attenders. Therefore, relying solely on active patients in the absence of data linkage should be carefully considered before they are used as a proxy for the total population at risk.
Applying linkage methodologies to general practice data has the potential to improve the representativeness of prevalence estimates by providing a more inclusive and less biased denominator. The higher prevalence estimates observed in the active patient group compared with the unique cohort suggest that relying only on active patient status can result in overestimates of disease prevalence, reinforcing the case for linked, population-wide datasets to improve these estimates. Although the discrepancies were small for some conditions, even minor differences in prevalence can have consequential ramifications for public health policy, directly influencing critical decisions on funding, resource allocation, and clinical guideline development [39, 41]. Our findings suggest that a caveat may need to be applied to previous estimates derived exclusively from active patient populations, and they highlight the importance of understanding the nuances of administrative data.
Limitations
This study contributes to a better understanding of the variation in condition recording across practices, but it has several limitations that should be acknowledged. Firstly, our dataset includes data from only 39 of approximately 700 practices [57] across Western Australia, limiting the generalisability of our findings to the broader population. This limited scope also prevented us from comprehensively tracking patient movement between practices, which could provide further insights into the continuity of care and the role of multiple practice attendance in condition recording. Although our dataset included records of patients who visited only a single practice, these individuals represent just 6% of the total adult population in Western Australia, and we were therefore unable to confirm whether these patients sought care exclusively at a single practice or whether additional clinical encounters occurred outside this practice group. While our analysis suggests that active cohorts overestimate disease prevalence, linkage errors could potentially have affected our findings. False negative linkage errors (missed matches) could result in patients being incorrectly classified as visiting only one practice when in fact they visited multiple practices, potentially leading to underestimation of disease prevalence. Conversely, false positive linkage errors (incorrect matches) could artificially inflate the linked cohort by merging records from different patients. However, given that the error rates of the PPRL linkage methodology used in this study are extremely low [43–45], we expect these errors to be minimal compared with the systematic biases present in active cohort analyses. Additionally, the analysis assumes that all recorded conditions are accurate and up to date; however, inconsistencies in electronic medical records, such as outdated diagnoses or coding errors, could affect our results.
Lastly, our findings focus solely on the presence or absence of condition recordings and do not evaluate the accuracy or completeness of the diagnoses captured within the patient record.
Conclusions
This study highlights the opportunity for data linkage to be used in generating improved estimates of chronic disease prevalence from general practice records. By consolidating patient information across practices, linked datasets mitigate duplication, reduce fragmentation, and capture a broader spectrum of healthcare utilisation. The observed differences in prevalence rates between active patients and the unique patient cohort reveal the limitations of relying solely on active patients to calculate disease prevalence. These findings reinforce the value of investing in high-quality, linked general practice data infrastructure to support reliable epidemiological monitoring and inform evidence-based policy and resource planning in primary care.
Acknowledgements
The authors thank the participating general practices for contributing primary healthcare records to MedicineInsight, MedicineInsight’s staff for preparing the dataset, as well as Curtin’s Centre for Data Linkage for applying their data linkage methodologies to the dataset.
Author contributions
RV, CL, JB, RV, SRandall, and SRobinson were involved in the conception, design, and interpretation of the results. RV acquired the data, conducted the analyses, and wrote the first draft of the manuscript. All authors edited, reviewed, and approved the final version of the manuscript. SRobinson, JB, SRandall, and CL are RV’s supervisors.
Funding
This project was supported by the Western Australian Health Translation Network and the Australian Government’s Medical Research Future Fund (MRFF) as part of the Rapid Applied Research Translation program.
Data availability
Restrictions apply to the availability of these data, which were used under licence for the current project and are not publicly available. For access to the MedicineInsight data, contact the Australian Commission on Safety and Quality in Health Care [MedicineInsight@safetyandquality.gov.au].
Declarations
Ethics and consent to participate
All methods were carried out in accordance with relevant guidelines and regulations. Ethics approval was obtained from the Curtin University Human Research Ethics Committee (HRE2019-0619). A waiver of consent was granted, with no identifying information being available to the researchers. Access to the MedicineInsight data was obtained via a data access application (Project 2020-033).
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. The Royal Australian College of General Practitioners. General practice: health of the nation 2024. East Melbourne, Vic: The Royal Australian College of General Practitioners Ltd; 2024.
- 2. Tran M, Rhee J, Blazek K, Balasooriya C, Vuong K. Digital health technology use in Australian general practice (GP) consultations: a cross-sectional analysis of the Medicine in Australia: Balancing Employment and Life study. Prim Health Care Res Dev. 2025;26:e19.
- 3. Khanna S, Rolls DA, Boyle J, Xie Y, Jayasena R, Hibbert M, et al. A risk stratification tool for hospitalisation in Australia using primary care data. Sci Rep. 2019;9(1):5011.
- 4. Hosar R, Steinsbekk A. Identifying individuals with complex and long-term health-care needs using the Johns Hopkins adjusted clinical groups system: A comparison of data from primary and specialist health care. Scand J Public Health. 2024;52(5):607–15.
- 5. Singh H, Graber M. Reducing diagnostic error through medical home-based primary care reform. JAMA. 2010;304(4):463–4.
- 6. Takada T, Heus P, van Doorn S, Naaktgeboren CA, Weenink JW, van Dulmen SA, et al. Strategies to reduce the use of low-value medical tests in primary care: a systematic review. Br J Gen Pract. 2020;70(701):e858–65.
- 7. Wallis KA, Elley CR, Moyes SA, Lee A, Hikaka JF, Kerse NM. Safer prescribing and care for the elderly (SPACE): a cluster randomised controlled trial in general practice. BJGP Open. 2022;6(1).
- 8. Youens D, Moorin R, Harrison A, Varhol R, Robinson S, Brooks C, et al. Using general practice clinical information system data for research: the case in Australia. Int J Popul Data Sci. 2020;5(1):12.
- 9. Liaw ST. Clinical decision support systems: data quality management and governance. Stud Health Technol Inform. 2013;193:362–9.
- 10. Liaw ST, Taggart J, Yu H, de Lusignan S. Data extraction from electronic health records: existing tools may be unreliable and potentially unsafe. Aust Fam Physician. 2013;42(11):820–3.
- 11. Cheah R, Canaway R, Hallinan CM, de Mendonca L, Manski-Nankervis JA. Using primary care data for research: what are the issues and potential solutions? Aust J Gen Pract. 2024;53(6):408–11.
- 12. Canaway R, Chidgey C, Hallinan CM, Capurro D, Boyle DI. Undercounting diagnoses in Australian general practice: a data quality study with implications for population health reporting. BMC Med Inform Decis Mak. 2024;24(1):155.
- 13. Gajjar D, Stiebahl S, Powell T. General practice in England. London: House of Commons Library; 2025 Mar 7. Report No.: CBP07194.
- 14. The Royal Australian College of General Practitioners. General practice: health of the nation 2018: an annual insight into the state of general practice. East Melbourne, Vic: The Royal Australian College of General Practitioners Ltd; 2018.
- 15. Wright M, Hall J, van Gool K, Haas M. How common is multiple general practice attendance in Australia? Aust J Gen Pract. 2018;47(5):289–96.
- 16. Dhalwani NN, Tata LJ, Coleman T, Fiaschi L, Szatkowski L. A comparison of UK primary care data with other national data sources for monitoring the prevalence of smoking during pregnancy. J Public Health (Oxf). 2015;37(3):547–54.
- 17. Tate AR, Dungey S, Glew S, Beloff N, Williams R, Williams T. Quality of recording of diabetes in the UK: how does the GP's method of coding clinical data affect incidence estimates? Cross-sectional study using the CPRD database. BMJ Open. 2017;7(1):e012905.
- 18. Orueta JF, Nuno-Solinis R, Mateos M, Vergara I, Grandes G, Esnaola S. Monitoring the prevalence of chronic conditions: which data should we use? BMC Health Serv Res. 2012;12:365.
- 19. Wang S, Lau YS, Sutton M, Anderson M, Kypridemos C, Head A, et al. Inequalities in the prevalence recording of 205 chronic conditions recorded in primary and secondary care for 12 million patients in the English National Health Service. BMC Med. 2024;22(1):570.
- 20. Canaway R, Boyle DI, Manski-Nankervis JE, Bell J, Hocking JS, Clarke K, et al. Gathering data for decisions: best practice use of primary care electronic records for research. Med J Aust. 2019;210(Suppl 6):S12–6.
- 21. Primary Health Care Advisory Group. Better outcomes for people with chronic and complex health conditions. Canberra: Australian Department of Health; 2015.
- 22. National Health Performance Authority. Healthy communities: frequent GP attenders and their use of health services in 2012–13, technical supplement. Sydney, Australia: National Health Performance Authority; 2015.
- 23. OECD. Caring for quality in health: lessons learnt from 15 reviews of health care quality. Paris; 2017.
- 24. Mehta SZJ, Poppe K, Kerr AJ, Wells S, Exeter DJ, Selak V, Grey C, Jackson R. Cardiovascular preventive pharmacotherapy stratified by predicted cardiovascular risk: a national data linkage study. Eur J Prev Cardiol. 2021.
- 25. Wells S, Poppe K, Selak V, Kerr A, Pylypchuk R, Wu B, et al. Is general practice identification of prior cardiovascular disease at the time of CVD risk assessment accurate and does it matter? N Z Med J. 2018;131(1475):10–20.
- 26. Institute for Evidence-Based Healthcare, Bond University. Evidence synthesis to support the development of guidelines for absolute cardiovascular disease risk. Gold Coast, Queensland: Bond University; 2021.
- 27. Heart Foundation Australia. CVD risk: how the calculator was developed. 2022 [cited 2024 Jul 15]. Available from: https://www.cvdcheck.org.au/how-the-calculator-was-developed.
- 28. Harrison C, Henderson J, Miller G, Britt H. The prevalence of diagnosed chronic conditions and multimorbidity in Australia: a method for estimating population prevalence from general practice patient encounter data. PLoS ONE. 2017;12(3).
- 29. Australian Bureau of Statistics. Person Level Integrated Data Asset (PLIDA). ABS; 2024. Available from: https://www.abs.gov.au/about/data-services/data-integration/integrated-data/person-level-integrated-data-asset-plida.
- 30. National Prescribing Service. MedicineInsight data book version 4.0. Sydney: NPS MedicineWise; 2021 Dec.
- 31. Busingye D, Gianacas C, Pollack A, Chidwick K, Merrifield A, Norman S, et al. Data resource profile: MedicineInsight, an Australian national primary health care database. Int J Epidemiol. 2019;48(6):1741–1741h.
- 32. Australian Commission on Safety and Quality in Health Care. MedicineInsight. Canberra: ACSQHC; 2025 [cited 2025 Jul 8]. Available from: https://www.safetyandquality.gov.au/our-work/indicators-measurement-and-reporting/medicineinsight.
- 33. Mnatzaganian G, Lee CMY, Cowen G, Boyd JH, Varhol RJ, Randall S, et al. Sex disparities in the prevalence, incidence, and management of diabetes mellitus: an Australian retrospective primary healthcare study involving 668,891 individuals. BMC Med. 2024;22(1):475.
- 34. Jun M, Wick J, Neuen BL, Kotwal S, Badve SV, Woodward M, et al. The prevalence of CKD in Australian primary care: analysis of a national general practice dataset. Kidney Int Rep. 2024;9(2):312–22.
- 35. Gordon J, Britt H, Miller GC, Henderson J, Scott A, Harrison C. General practice statistics in Australia: pushing a round peg into a square hole. Int J Environ Res Public Health. 2022;19(4).
- 36. Gordon J, Miller G, Britt H. Reality check: reliable national data from general practice electronic health records. Canberra: Deeble Institute for Health Policy Research; 2016.
- 37. The Royal Australian College of General Practitioners. Standards for general practices. 5th ed. East Melbourne, Victoria: RACGP; 2023. p. 270.
- 38. Jones JL, Lumsden NG, Simons K, Ta'eed A, de Courten MP, Wijeratne T, et al. Using electronic medical record data to assess chronic kidney disease, type 2 diabetes and cardiovascular disease testing, recognition and management as documented in Australian general practice: a cross-sectional analysis. Fam Med Community Health. 2022;10(1).
- 39. Institute of Medicine (US) Committee on Assuring the Health of the Public in the 21st Century. The future of the public's health in the 21st century. Washington (DC): National Academies Press (US); 2002.
- 40. Gu Y, Wang A, Tang H, Wang H, Jiang Y, Jin C, et al. Comparison of rare and common diseases in the setting of healthcare priorities: evidence of social preferences based on a systematic review. Patient Prefer Adherence. 2023;17:1783–97.
- 41. Nguengang Wakap S, Lambert DM, Olry A, Rodwell C, Gueydan C, Lanneau V, et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2020;28(2):165–73.
- 42. Australian Government Productivity Commission. Report on Government Services 2021, Part E, Section 10, Table 10A.53. Australian Government Productivity Commission; 2021. Available from: https://www.pc.gov.au/research/ongoing/report-on-government-services/2021/health/primary-and-community-health.
- 43. Lim D, Randall S, Robinson S, Thomas E, Williamson J, Chakera A, et al. Unlocking potential within health systems using privacy-preserving record linkage: exploring chronic kidney disease outcomes through linked data modelling. Appl Clin Inform. 2022;13(4):901–9.
- 44. Randall S, Wichmann H, Brown A, Boyd J, Eitelhuber T, Merchant A, et al. A blinded evaluation of privacy preserving record linkage with Bloom filters. BMC Med Res Methodol. 2022;22(1):22.
- 45. Brown AP, Ferrante AM, Randall SM, Boyd JH, Semmens JB. Ensuring privacy when integrating patient-based datasets: new methods and developments in record linkage. Front Public Health. 2017;5:34.
- 46. Python Software Foundation. Python language reference, version 3.5. 2022. Available from: http://www.python.org.
- 47. Kluyver T, Ragan-Kelley B, Perez F, Granger B, Bussonnier M, Frederic J, et al. Jupyter Notebooks: a publishing format for reproducible computational workflows. In: Positioning and power in academic publishing: players, agents and agendas. Netherlands: IOS Press; 2016.
- 48. Britt H, Miller GC, Henderson J, Bayram C, Harrison C, Valenti L, et al. General practice activity in Australia 2015–16. Sydney: University of Sydney; 2016.
- 49. Australian Institute of Health and Welfare (AIHW). Practice Incentives Program quality improvement measures: national report on the first year of data 2020–21. Canberra: AIHW; 2021. Available from: https://www.aihw.gov.au/reports/primary-health-care/pipqi-measures-national-report-2020-21/contents/about.
- 50. Australian Department of Health. PIP QI incentive guidance. Australian Department of Health; 2019. Available from: https://www1.health.gov.au/internet/main/publishing.nsf/Content/PIP-QI_Incentive_guidance.
- 51. Varhol RJ, Robinson S, Lee CMY, Randall S, Boyd JH. Attitudes towards sharing data: a data custodian perspective. In review. 2024.
- 52. Lugg-Widger FV, Angel L, Cannings-John R, Hood K, Hughes K, Moody G, et al. Challenges in accessing routinely collected data from multiple providers in the UK for primary studies: managing the morass. Int J Popul Data Sci. 2018;3(3):432.
- 53. Harron K. Data linkage in medical research. BMJ Med. 2022;1(1):e000087.
- 54. Harron K, Dibben C, Boyd J, Hjern A, Azimaee M, Barreto ML, et al. Challenges in administrative data linkage for research. Big Data Soc. 2017;4(2):2053951717745678.
- 55. Marjot J, Haysom G, Browne P. Medico-legal risks associated with fragmented care in general practice. Med J Aust. 2021;215(5):203–e51.
- 56. Varhol RJ, Haywood Z, Randall S, Boyd JH, Robinson S, Lee CMY. Challenges and opportunities in using primary care electronic health record data for research: a scoping review. Under review. 2025.
- 57. WA Primary Health Alliance (WAPHA). GPs urged to boost community health via grants program. 2023. Available from: https://news.wapha.org.au/gps-urged-to-boost-community-health-via-grants-program/.
