BMJ Open. 2024 Oct 17;14(10):e088166. doi: 10.1136/bmjopen-2024-088166

Akrivia Health Database—deep patient characterisation using a secondary mental healthcare dataset in England and Wales: cohort profile

Ana Todorovic 1, Philip Craig 1, Simon Pillinger 1, Panagiota Kontari 1, Sophie Gibbons 1, Luke Bryden 1, Tarso Franarin 1, Ceyda Uysal 1, Gloria Roque 1, Benjamin Fell 1
PMCID: PMC11488076  PMID: 39419624

Abstract

Purpose

The Akrivia Health cohort was created to extract data from electronic health records in secondary mental health and dementia care services in England and Wales. The data are anonymised, structured and harmonised from the source electronic health records across a range of information technology systems, enabling unified, privacy-preserving access for research purposes.

Participants

The cohort contains data from electronic health records for over 4.6 million patients in England and Wales, as of January 2024. The data are refreshed regularly, and the dataset expands whenever a new healthcare provider joins the Akrivia network. Of the patients in the database, 13% are under 18 years old (n=590 160), 56% are adults 18–65 years old (n=2 631 690) and 31% are older people (n=1 422 609). About 11.5% are deceased (n=538 371).

Findings to date

Structured data include patient demographics and service pathways. Akrivia Health also uses a bespoke natural language processing model to further extract the research-relevant information from free-text progress notes, including diagnoses, medications and clinical symptoms. This allows for an in-depth longitudinal description of patient journeys.

Future plans

The anonymised data can be accessed in collaboration with Akrivia Health, following the National Health Service guidelines and without requiring a separate ethics application. There is no planned end date for data collection.

Keywords: Electronic Health Records, MENTAL HEALTH, Natural Language Processing, PSYCHIATRY


STRENGTHS AND LIMITATIONS OF THIS STUDY

  • The cohort includes real-world longitudinal data from mental health and dementia care services in England and Wales, from over 4.6 million patients.

  • The data are harmonised across different electronic health record systems, and enriched using natural language processing (NLP) to mine information from clinical free-text fields as well as linkages with, for example, census data on neighbourhood deprivation.

  • The main limitation of NLP is that it is probabilistic, sometimes introducing errors (false positives and false negatives) in the concepts being extracted.

  • In addition, it is not possible to say how representative the cohort is of the entire population of mental health service users in England and Wales.

Introduction

In the UK, the National Health Service (NHS) provides care for all residents, free at the point of need. As a consequence, nearly all of the UK population is registered with general practitioners (primary care),1 who refer patients to specialist mental health and dementia services (secondary care) in accordance with National Institute for Health and Care Excellence guidelines. Referrals can also be made by the patients themselves, or their friends and family. This broad national coverage means that the information collected by the NHS could provide unique insights into the real-world mental health of the general population. Importantly, the UK also has a tradition of collecting data using electronic health records, a practice that was instated in 2005. Some services also converted their older paper records to electronic. This allows, in principle, for straightforward access to rich, valuable and longitudinal healthcare data from all patients.

In practice, however, data stored in electronic health records are rarely available in a research-ready format,2 and electronic healthcare systems are principally purposed as medicolegal archives, with features most relevant to administrative purposes and not research. In NHS secondary mental healthcare services, relatively little information is recorded in a structured (ie, prespecified) way, such as choosing a diagnosis from a drop-down menu. The vast majority of clinically relevant data are recorded in the form of free text, in progress notes, attachments or fields associated with referrals, written in natural language. Care decisions in psychiatry are influenced by a wide range of biological, psychological and social factors, and as a result free text often includes correspondingly rich, holistic descriptions of patient states, family histories, adverse life events, relational trauma, struggles with relationships or employment and similar. In other words, progress notes are most likely to centre around the information that clinicians deem particularly relevant to their patients.

Gaining systematic and reliable access to these rich and comprehensive longitudinal free-text data is challenging. Akrivia Health is dedicated to collecting and curating these data. The data from secondary mental health services are collated and deidentified, anonymised, harmonised, structured, processed using natural language processing (NLP) and structured query language (SQL), and uploaded in a research-ready format to a secure online platform, Akrivia Clinical Record Interactive Search (CRIS).

Cohort description

The Akrivia Health cohort includes longitudinal data on over 4.6 million patients in secondary mental healthcare services within the NHS in England and Wales. The oldest patient records in the database were registered prior to 1990, for 343 patients in total. Since 1990, the database has had a steady increase in patient records registered each year, with over 100 000 new patients per year since 2005 (figure 1A). However, the Akrivia Health cohort is not the first database of its kind in England. Originally, a cohort was developed in 2008 as The South London and Maudsley NHS Foundation Trust Biomedical Research Centre Case Register and its CRIS application, and it continues to exist as a separate database, extensively used in research.3 4 A related project was then initiated at the University of Oxford, resting on similar data extraction principles, with the difference that federated access to different NHS healthcare organisations (or trusts) was instated (these were known as UK-CRIS and Dementia (D)-CRIS). That project, initially funded by the National Institute for Health and Care Research (NIHR) and the Medical Research Council (MRC), was spun out in 2019 as a commercial data curation and research company, Akrivia Health. The Akrivia cohort has since grown to include 14 NHS healthcare organisations, with a further six currently onboarding and soon to be added to the database. Once these are added, the network will cover 20 (38%) of the 53 NHS organisations in England and Wales that offer secondary care for mental health and dementias. The database is regularly updated against the source files, with no planned stopping point to data collection and without gaps in data acquisition. The data are refreshed fortnightly wherever possible, with a median refresh interval of 6 weeks and a maximum lag of 11 months.

Figure 1. (A) New patients registered over time. Each point represents the average number of patients per year who joined over a 5-year period, for example, the first dot includes the number of newly registered patients from 1995 to 1999 (inclusive), divided by five. (B) Percentage of English population who are Akrivia patients, per decile of the Index of Multiple Deprivation (IMD) rankings, from most to least deprived neighbourhoods. (C) Spatial distribution of patients with an LSOA code in the Akrivia dataset. Patient locations mapped to a 5×5 km grid using the Ordnance Survey map coordinates. LSOA, lower layer super output area.

In January 2024, the Akrivia Health cohort had 4 648 248 patients in total, of whom 4 109 877 were not recorded as being deceased. Overall, the database covers demographic information, diagnoses, medication, referrals to care teams, hospitalisations, clinical signs and symptoms, elements of treatment pathways, life circumstances, health scores on questionnaires, etc. Demographic patient details can be seen in table 1, and other variables in table 2. See online supplemental file 1 for code examples to create the tables.

Table 1. Demographic characteristics of the Akrivia cohort (January 2024 dataset build).

Characteristics Counts or examples
Age in January 2024
 Age under 18 590 160 (13%)
 Age 18–65 2 631 690 (56%)
 Age over 65 1 422 609 (31%)
Gender
 Male 2 121 704 (45.7%)
 Female 2 522 088 (54.2%)
 Other/non-binary 3465 (0.07%)
 Not known 1057 (0.03%)
Ethnicity
 Asian 141 945 (3%)
 Black 79 818 (1.7%)
 White 2 736 548 (60%)
 Mixed 79 249 (1.7%)
 Other 74 463 (1.6%)
 Not known 1 536 291 (33%)
Marital status
 Single 903 416 (19%)
 Cohabitating 21 424 (0.5%)
 Married/civil partnership 527 901 (11.5%)
 Separated 47 151 (1%)
 Divorced 80 565 (2%)
 Widowed 196 678 (4%)
 Not known 2 871 179 (62%)
Mortality status
 Living 4 109 877 (88.5%)
 Deceased 538 371 (11.5%)

Table 2. Akrivia cohort data dictionary—list of readily available variables.

Characteristics Counts or examples
Date of registration
  • 1990–2000 (end of 1999): 112 600 (2.5%)

  • 2000–2010 (end of 2009): 1 001 563 (21.5%)

  • 2010–2020 (end of 2019): 2 555 946 (56%)

  • 2020+: 945 351 (20%)

Diagnosis (structured)
  • International Classification of Diseases (ICD)-10 code

  • ICD-11 code

  • Diagnosis description

  • Diagnosis recorded date

  • Diagnosis end date

  • Duration in days

760 272 patients (16.4%) have one. Top three structured diagnoses given to women: dementia in Alzheimer’s disease, atypical or mixed (F02), mild cognitive disorder (F067), emotionally unstable personality disorder (F603). Top three structured diagnoses given to men: paranoid schizophrenia (F200), mental and behavioural disorders due to the use of tobacco (F171), personal history of self-harm (Z915).
Diagnosis (NLP)
  • Progress note date

  • ICD-10 code

  • Diagnosis description

  • Experiencer (patient or other)

  • Status (has, had, could have, does not have)

2 031 939 patients (43.7%) have one. Top three NLP diagnoses given to women: unspecified dementia (F03), autistic disorder (F840), unspecified psychosis (F29). Top three NLP diagnoses given to men: unspecified dementia (F03), autistic disorder (F840), attention-deficit hyperactivity disorder (F90).
Medication (NLP)
  • Progress note date

  • Drug name

  • Status (is on, was on, other)

  • Dosage amount and unit

1 810 887 patients (39%). Top 10 (ranked by how many patients were prescribed this medication): sertraline, mirtazapine, paracetamol, citalopram, diazepam, zopiclone, lorazepam, quetiapine, fluoxetine, olanzapine.
Inpatient stays
  • Admission date

  • Discharge date

  • Admission duration

  • Admission source

  • Admission method

  • Discharge destination

  • Discharge method

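254 410 patients (5.5%). Example: admission date 22 August 2001; discharge date 25 August 2001; admission duration three days; admission source NHS general hospital ward; admission method emergency; discharge destination usual place of residence; discharge method discharged on clinical advice.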
Referrals
  • Referred to

  • Referral date

  • Referral received

  • Referral accepted

  • Discharge date

3 537 615 patients (76%) have referrals data. Examples: memory clinic, forensic liaison service, low security inpatient service, personality disorder team, continuing care.
Health scores (questionnaires)
  • Addenbrooke’s Cognitive Examination (ACE)

  • Beck’s Depression Inventory (BDI)

  • Clinical Outcomes in Routine Evaluation (CORE-10)

  • Generalised Anxiety Disorder (GAD-7)

  • Hamilton Depression Rating Scale (HAMD)

  • Health of the Nation Outcome Scales (HoNOS)

  • Montgomery-Asberg Depression Rating Scale (MADRS)

  • Mini Mental State Examination (MMSE)

  • Montreal Cognitive Assessment (MOCA)

  • Positive and Negative Syndrome Scale (PANSS)

  • Patient Health Questionnaire (PHQ-9)

  • Quick Inventory of Depressive Symptomatology (QIDS)

  • Quality of Life Scale (QOLS)

  • Work and Social Adjustment Scale (WSAS)

NLP-derived, total score only: 289 092 patients (6%) have at least one
  • ACE: 65 464 (1.5%)

  • BDI: 3566 (0.07%)

  • CORE-10: 8271 (0.17%)

  • GAD-7: 65 193 (1.4%)

  • HAMD: 395 (<0.01%)

  • HoNOS: 6413 (1.38%)

  • MADRS: 930 (0.02%)

  • MMSE: 121 484 (2.61%)

  • MOCA: 56 236 (1.2%)

  • PANSS: 260 (<0.01%)

  • PHQ-9: 31 707 (0.7%)

  • QIDS: 202 (<0.01%)

  • QOLS: 10 (<0.01%)

  • WSAS: 6636 (0.14%)

Structured scores with individual questions:
  • GAD-7: 5733 (0.1%)

  • PHQ-9: 15 590 (0.3%)

  • HoNOS: 386 309 (7.6%)

Index of multiple deprivation. Neighbourhood deprivation in income, employment, education, skills and training, health and disability, crime, barriers to housing and services. 4 018 223 patients (86.5%)
Electroconvulsive therapy (ECT) (NLP). Mention of ECT in progress notes. 49 765 patients (1%)
Psychotherapy (NLP). Mention of psychotherapy in progress notes. 1 229 613 patients (26%)
Anhedonia (NLP). Mention of anhedonia in progress notes. 331 802 patients (7%)
Employment issues (NLP). Mention of employment insecurity in progress notes. 251 308 patients (5.5%)
Signs and symptoms (NLP). Contextually classified traits and behaviours associated with mental health problems
  • Status (has, had, does not have, other)

2 629 776 patients (57%). 10 most common signs and symptoms: anxious, low mood, mention of mood, worried, happy, settled, low in mood, mention of appetite, upset, angry.
Substance use (NLP). Contextually classified substances
  • Status (is using, was using, is not using, other)

1 344 252 patients (29%). Substance types in order of frequency: alcohol, nicotine, other, cannabis, cocaine, opiates.
Accommodation issues (NLP). Mention of homelessness in progress notes. 33 220 patients (0.7%)
UK Biobank flag. Patient exists in UK Biobank. 18 033 patients (0.4%)

NHS, National Health Service; NLP, natural language processing.

About 53% of the patients do not have a mental health or dementia diagnosis recorded in the structured portion of their electronic health records or written in their progress notes, although it is possible that diagnosis information exists elsewhere in their electronic health records, for example, in attachments or referrals. Among the 47% whose diagnoses we have located (over 2 million patients), the average patient has 4.38 different diagnostic labels with an SD of 5.52. These numbers include the 29% of patients who have only one diagnosis with no comorbidities. An overview of common diagnosis types and patient numbers can be seen in table 3.

Table 3. Common diagnostic categories in the Akrivia cohort (January 2024 dataset build).

Diagnostic category ICD-10 codes Total number of patients
Major depressive disorder F32, F33, F53.0 296 229
Psychotic disorders F2, F10.5, F11.5, F12.5, F13.5, F14.5, F15.5, F16.5, F17.5, F18.5, F19.5, F06.0, F06.1, F06.2 246 298
Schizophrenia F20 110 005
Anxiety disorders F40, F41, F42, F43.1, F45.2 285 494
Bipolar disorder F30, F31, F25 154 533
Eating disorders F50 61 874
Personality disorders F60, F61, F62, F69 152 361
Developmental disorders F80, F81, F82, F83, F84, F88, F89, F90 348 409
Dementias F00, F01, F02, F03, G30, G31.0, G31.83 428 841
Alzheimer’s disease F00, G30 212 061
Mild cognitive disorder F06.7, G31.84 72 798
Treatment-resistant depression F32, F33, F53.0 + tried more than two antidepressants during a single depressive episode (from referral to discharge) 187 451

Information governance

Akrivia Health’s governance model is built to incorporate data protection by design, ensuring that data can be used for research in a commercially sustainable model while protecting the privacy of individuals.5 Research-relevant elements of the electronic health records are deidentified (ie, pseudonymised), with Akrivia Health working with healthcare organisations as their data processor (UK General Data Protection Regulation (GDPR), article 4 (8)). Deidentification (article 4 (5)) significantly reduces the identifiability of individuals but does not, in a European Union and UK legal context, render the data anonymous (recital 26). We further anonymise the pseudonymised data in a non-reversible format, thereby removing them from the material scope of the UK GDPR, which only applies to personal data (article 2 (1)). The anonymised version of the dataset is controlled by Akrivia Health and may be used for commercial activity in line with the Information Standards Board for Health and Social Care (ISB) 1523 standard.6 Partner healthcare organisations determine precisely which data are extracted from their electronic health records, and organisations can and do remove patients who have opted out of their data being used beyond direct care.7

Data harmonisation

The data harmonisation process includes pooling together and standardising data from diverse sources. Different healthcare organisations within the NHS record information using different electronic systems, such as Rio, CareNotes, Paris, CareDirector or SystmOne. This means that the same information might be coded differently by different organisations.

As an example, some healthcare organisations use national codes for ethnicity categories defined in the NHS Data Dictionary8 based on the results of the 2001 census. However, healthcare organisations do not consistently use the national codes9 10 and each one may have their own internal codes for ethnicity that are mapped to the national codes, or they might use an alternative version of the national codes.11 Akrivia’s approach to ethnicity harmonisation is to align with the national categories, that is, mapping all ethnicity descriptions to five broad categories across all healthcare organisations. While this means that fine-grained information recorded in some organisations is lost, harmonisation allows for combining data in greater numbers as the same query can be run across all trusts in our network simultaneously.
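The harmonisation step described above can be illustrated with a minimal Python sketch. The lookup table, the local descriptions and the function name are hypothetical and chosen only for illustration; they are not the mapping used in the Akrivia pipeline.

```python
# Illustrative sketch only: LOCAL_TO_BROAD is a hypothetical lookup table,
# not the mapping actually used in the Akrivia pipeline.
LOCAL_TO_BROAD = {
    "white british": "White",
    "a": "White",               # assumed local use of a national code
    "pakistani": "Asian",
    "j": "Asian",               # assumed local use of a national code
    "black caribbean": "Black",
    "white and asian": "Mixed",
    "any other ethnic group": "Other",
}


def harmonise_ethnicity(raw):
    """Map a locally recorded ethnicity value to one of five broad
    categories, or to 'Not known' when the value is missing or unmapped."""
    if raw is None:
        return "Not known"
    key = raw.strip().lower()
    return LOCAL_TO_BROAD.get(key, "Not known")
```

Because every organisation's values pass through the same lookup, one query on the harmonised column can be run across all organisations, at the cost of losing the finer-grained local categories.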

Natural language processing

A key challenge for researching NHS psychiatric electronic health records is that most of the clinically relevant information is present in free-text fields—progress notes, referral letters and attachments. We employ NLP to structure and extract the research-relevant information. Our NLP methodology involves a meticulous examination of free-text records, delving into how clinicians articulate information about specific concepts. The NLP pipeline involves several key stages. For example, let us assume that the aim is to extract the medication-related information from the sentence ‘Patient has stopped taking mirtazapine 50 mg’.

First, a Named-Entity Recognition model, which is a regular expression or a more sophisticated artificial intelligence-based model, extracts mentions of pertinent concepts, in this case medication. Here, ‘mirtazapine’ is extracted as a medication and ‘50 mg’ as a dosage. This is followed by Entity Normalisation, where extractions are mapped onto an existing ontology (here, of medications) or converted into a harmonised format. In this step, all extractions that refer to the same concept are mapped to a single entry. The medication here is therefore mapped to include the drug name ‘mirtazapine’, as well as its RxNorm ID (15996), while the dosage is normalised to include dosage amount (50) and unit (mg). In the next step, Relation Extraction, the model establishes connections between pairs of related extractions, that is, it recognises that ‘50 mg’ refers to the medication ‘mirtazapine’. The final step is Context Classification. Since the mention of a concept does not necessarily imply relevance to the patient’s current condition, the model categorises the context in which the extraction occurs. In our example, the patient would be classified as having taken mirtazapine in the past.
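The four stages can be made concrete with a small Python sketch applied to the example sentence. It uses regular expressions and keyword cues throughout, whereas the pipeline described here relies on trained models for most stages; the one-entry ontology, the cue phrases, the 20-character window and the function name are all assumptions made for illustration.

```python
import re

# Hypothetical one-entry ontology; 15996 is the RxNorm ID quoted in the text.
DRUG_ONTOLOGY = {"mirtazapine": 15996}

DRUG_RE = re.compile(r"\b(" + "|".join(DRUG_ONTOLOGY) + r")\b", re.IGNORECASE)
DOSE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|g|ml)\b", re.IGNORECASE)

# Assumed cue phrases for past use; a trained classifier would replace these.
PAST_CUES = ("stopped taking", "was on", "discontinued")


def extract_medication(sentence):
    """Return a structured medication record, or None if no drug is found."""
    # 1. Named-entity recognition: find drug and dosage mentions.
    drug = DRUG_RE.search(sentence)
    dose = DOSE_RE.search(sentence)
    if drug is None:
        return None

    # 2. Entity normalisation: map the mention onto the ontology entry.
    name = drug.group(1).lower()
    record = {
        "drug_name": name,
        "rxnorm_id": DRUG_ONTOLOGY[name],
        "dose_amount": None,
        "dose_unit": None,
    }

    # 3. Relation extraction (naive): attach a dosage that follows the drug
    #    name within 20 characters.
    if dose is not None and 0 <= dose.start() - drug.end() <= 20:
        record["dose_amount"] = float(dose.group(1))
        record["dose_unit"] = dose.group(2).lower()

    # 4. Context classification: past versus current use.
    lowered = sentence.lower()
    is_past = any(cue in lowered for cue in PAST_CUES)
    record["status"] = "was on" if is_past else "is on"
    return record
```

Calling `extract_medication("Patient has stopped taking mirtazapine 50 mg")` yields a record with the drug name, RxNorm ID, a dosage of 50 mg and the status "was on", mirroring the walkthrough above.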

Across all the stages where AI-based models are used—Named-Entity Recognition, Relation Extraction and Context Classification—Bidirectional Encoder Representations from Transformers (BERT)-based models were employed for their high performance and efficiency in processing large datasets. These models were fine-tuned in-house using our own annotated data and are required to achieve an F1 score of at least 80%. At the moment, we do not have comparisons of model parameters for different demographic sub-categories.
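For readers less familiar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the arithmetic and a check against the 80% threshold mentioned above; how the counts are aggregated (per concept, per model or overall) is not specified in the text, so the function names and the single set of counts are assumptions.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def meets_threshold(true_positives, false_positives, false_negatives,
                    threshold=0.80):
    """Check a model's F1 score against a minimum acceptance threshold."""
    score = f1_score(true_positives, false_positives, false_negatives)
    return score >= threshold
```

For example, a model with 90 true positives, 10 false positives and 10 false negatives on an annotated validation set has precision and recall of 0.9 each, so it would pass the threshold.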

Data transformation

Once our variable list is populated with information extracted from electronic health records either from structured sources or with the help of NLP, any number of new variables can be created using SQL. Typically, new variables are created for each new project we work on, depending on the need. For instance, from the patient’s date of birth and age at which a diagnosis is first mentioned in their notes, we can extract a proxy for the age diagnosed. We can derive more involved variables as well. A number of studies focus on patients with treatment-resistant depression (sometimes defined as depression where symptoms do not alleviate after trying two different antidepressants),12 but treatment-resistant depression is not a formal diagnostic category. We can combine data on referrals with NLP-derived medication data to create a flag for those patients who at any point tried three or more antidepressants during a single depressive episode, operationalised as a referral period. We can also define broader diagnostic categories, such as ‘dementias’ (table 3). Referrals to different healthcare teams can be combined with appointment dates to assess the frequency and cost of contact with healthcare services for each individual patient, etc.
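The derived variables described above are built in SQL; purely as an illustration of the logic, the following Python sketch flags patients with three or more distinct antidepressants mentioned within one referral period and derives a proxy age at first diagnosis. The antidepressant list is an illustrative subset taken from table 2, the data layout and function names are hypothetical, and the sketch omits the additional restriction to patients with a depression diagnosis.

```python
from datetime import date

# Illustrative subset of antidepressants (all appear in table 2).
ANTIDEPRESSANTS = {"sertraline", "mirtazapine", "citalopram", "fluoxetine"}


def treatment_resistance_flag(referrals, medication_mentions, min_drugs=3):
    """referrals: list of (start_date, end_date or None) tuples.
    medication_mentions: list of (progress_note_date, drug_name) tuples.
    True if any single referral period contains mentions of at least
    min_drugs distinct antidepressants."""
    for start, end in referrals:
        end = end or date.max  # treat an open referral as ongoing
        drugs = {
            drug.lower()
            for note_date, drug in medication_mentions
            if start <= note_date <= end and drug.lower() in ANTIDEPRESSANTS
        }
        if len(drugs) >= min_drugs:
            return True
    return False


def age_at_first_diagnosis(date_of_birth, diagnosis_note_dates):
    """Proxy for age diagnosed: age in whole years at the earliest
    progress note that mentions the diagnosis."""
    first = min(diagnosis_note_dates)
    had_birthday = (first.month, first.day) >= (date_of_birth.month,
                                                date_of_birth.day)
    return first.year - date_of_birth.year - (0 if had_birthday else 1)
```

Using the referral period as the unit keeps the flag tied to a single episode of care, so antidepressants tried in separate referrals are not pooled.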

Linkages

We have developed a standardised data linkage service to facilitate data sharing. In principle, any dataset with NHS numbers can be linked to the Akrivia cohort.

For an example of an existing linkage, during the deidentification of data, postcodes are mapped to lower layer super output area (or LSOA) codes of residence. LSOA codes can further be linked to publicly available census data. This has allowed us to associate each patient with an index of multiple deprivation,13 which is a single index measure based on what a patient’s neighbourhood is like when it comes to deprivation in income, employment, education, skills and training, health and disability, crime, barriers to housing and services and living environment. The index of multiple deprivation allows for incorporating social factors into research designs, which are critical to control for when assessing the outcome of different treatments or when determining the causes of mental illness.14,17 See figure 1C for a map of patients in our database and figure 1B for a distribution of Akrivia patients across deprivation deciles.
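The chain from postcode to LSOA code to deprivation decile can be sketched as two lookups followed by a binning step. In the Python sketch below, the postcode, the LSOA identifier and the rank are placeholders, the lookup tables are hypothetical, and the total of 32 844 ranked areas is an assumption based on the English indices of deprivation 2019.

```python
import math

# Placeholder lookup tables; the codes and the rank are invented.
POSTCODE_TO_LSOA = {"XX1 1XX": "LSOA_A"}
LSOA_TO_IMD_RANK = {"LSOA_A": 1200}   # rank 1 = most deprived
N_RANKED_AREAS = 32844                # assumed number of ranked LSOAs


def imd_decile(rank, n_areas=N_RANKED_AREAS):
    """Convert a deprivation rank to a decile (1 = most deprived 10%)."""
    if not 1 <= rank <= n_areas:
        raise ValueError("rank out of range")
    return math.ceil(rank * 10 / n_areas)


def link_patient(postcode):
    """Return the LSOA code and deprivation decile for a postcode,
    or None if the postcode cannot be linked."""
    key = " ".join(postcode.upper().split())  # normalise case and spacing
    lsoa = POSTCODE_TO_LSOA.get(key)
    if lsoa is None or lsoa not in LSOA_TO_IMD_RANK:
        return None
    return {"lsoa": lsoa, "imd_decile": imd_decile(LSOA_TO_IMD_RANK[lsoa])}
```

Only the LSOA code and the derived decile are kept in the output, which reflects the point that the postcode itself is replaced during deidentification.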

Data access

A final step in our data curation pipeline is establishing data access for our NHS partners, as well as industry and academic partners. Once the data tables are populated, they can be accessed via a front-end graphical user interface, the Akrivia CRIS Research Platform. Research and clinical teams can use the platform for locally approved research projects. Over 280 NHS projects have been registered to date. Akrivia Health provides this data curation and access management service to all healthcare organisations in their network for free, along with research support for audits and research projects, and operational support for clinical trial recruitment run through the Akrivia platform. The platform provides functionality for project application and user access management, cohort specification using a graphical interface, aggregate data analysis and visualisation, record-level data exploration (NHS users only), export of deidentified tables within a secure environment for data analysis, patient reidentification for recontact purposes (NHS users only, for projects with Health Research Authority and Research Ethics Committee approval) and audit logging of user activity.

Data representativeness

The Akrivia Cohort contains patients from a variety of healthcare organisations, some of which are large, but it does not contain the entire population of mental health service users in England and Wales. It is therefore not possible to say with certainty how well the cohort represents the patient population, but one can run statistical comparisons within the cohort (eg, comparing men to women, patients with one diagnosis to those with another, etc). However, it is possible to say how well the patients represent the local general population. Namely, each patient has a neighbourhood (LSOA) code assigned to them based on their place of residence, and the UK Office for National Statistics offers LSOA demographics as part of 2021 census data. Each area has features such as total population and age and gender breakdowns recorded, and these can be compared with patient data. Open-access datasets with ethnicity estimates per LSOA exist as well, or they can be requested from the Office for National Statistics. Therefore, LSOA-level information from areas where patients live can be pooled together as the population the patients come from. With this, one can answer questions such as whether people with different demographics have different probabilities of getting various diagnoses and similar.
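As a sketch of the comparison described above, the following Python function pools census counts over the LSOAs in which patients live and expresses patient counts as a share of that pooled population, per demographic group. The data layout and the function name are hypothetical; real census tables would need to be reshaped into this form first.

```python
def patient_share_by_group(patients, lsoa_population):
    """patients: list of (lsoa_code, group) tuples, one per patient.
    lsoa_population: dict of lsoa_code -> dict of group -> census count.
    Returns the share of the pooled local population in each group who
    are patients, using only LSOAs that have census data."""
    covered = {lsoa for lsoa, _ in patients if lsoa in lsoa_population}

    # Count patients per group within the covered areas.
    patient_counts = {}
    for lsoa, group in patients:
        if lsoa in covered:
            patient_counts[group] = patient_counts.get(group, 0) + 1

    # Pool the census population of the same areas.
    population_counts = {}
    for lsoa in covered:
        for group, count in lsoa_population[lsoa].items():
            population_counts[group] = population_counts.get(group, 0) + count

    return {
        group: patient_counts.get(group, 0) / population
        for group, population in population_counts.items()
        if population > 0
    }
```

Areas without any patients are left out of the denominator, so the result describes the neighbourhoods the patients come from and not the country as a whole.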

Findings to date

The Akrivia Health cohort is regularly used for clinical audits as well as research in partnership with the NHS, pharmaceutical companies and academics. In the brief time since it was established, studies using the Akrivia cohort or its predecessor, the UK-CRIS database, have been published on a range of health-related research topics, including mortality risk when using atypical antipsychotic medication with dementia,18 treatment for cognitive impairment,19 clinical depression in patients with bowel cancer,20 patients with difficult-to-treat depression,21 psychological interventions for patients with psychosis,22 the use of cholinesterase inhibitors and memantine in dementia,23 as well as antipsychotic prescribing in older people’s mental health services.24 The dataset has also been used for furthering technical knowledge on applying NLP to electronic health records.25 26

Future work

The main activities around the Akrivia cohort in the near future will continue to be database curation and research support to the NHS, as well as work on commercial projects. We are currently also working on creating a linkage of patients in the Akrivia cohort to their primary care records. At the moment, we have matched 478 568 unique patients across four healthcare organisations, indicating that data linkage for these patients would be possible if needed for a project. We are further looking to expand our work towards in-house peer-reviewed academic publications, as well as offering secondments to academic researchers.

Strengths and limitations

Using data from the Akrivia cohort, it is possible to examine disease patterns, prevalence, incidence and risk factors of mental ill health at the population level, culminating in more precise and personalised risk prediction models. NLP can be used to determine treatment non-compliance, or social issues such as homelessness and unemployment, as well as the presence of adverse life events. Our recent NLP extraction of clinical signs and symptoms offers the opportunity of observing symptomatology at a more granular level, and for comparing diagnostic categorisation with real-world clusters of symptoms. Further, it is possible to associate costs to different treatment pathways, so the efficacy of treatments can be probed at a larger scale, including effectiveness for different demographic groups. The size of our database also allows for deeper insight into rare forms of mental illness. Finally, the longitudinal nature of the data allows for answering epidemiological questions, such as comparing clinical outcomes following different treatments offered to the same patient at different points in time.

There are also important limitations to using Akrivia’s data. First, since electronic health records are not created with research in mind, researchers must exercise caution in the interpretation of results, especially with regard to data missingness due to local recording practices or differences in the amount of contact with different patients.

Second, in several key domains (eg, diagnosis, assessment of cognitive function, mood, quality of life, etc), multiple direct and indirect indicators are captured in the Akrivia dataset, requiring a combinatorial approach. For example, information on diagnosis could include harmonised coded diagnosis fields, NLP-derived explicit mentions of diagnosis from free text, referrals to diagnosis-specific services (eg, early intervention in psychosis) and inference of diagnosis based on NLP-derived medication and symptom data. Akrivia Health supports the platform users in developing their analytic methods to account for these complexities in the source data.
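One way to picture such a combinatorial approach is to record which sources support a diagnosis, leaving the decision rule to each project. The Python sketch below does this for psychotic disorders; the field layout, the referral keyword and the function name are assumptions for illustration and do not describe a method prescribed by Akrivia Health.

```python
def diagnosis_evidence(structured_codes, nlp_mentions, referral_teams,
                       code_prefix="F2",
                       team_keyword="early intervention in psychosis"):
    """structured_codes: list of ICD-10 codes from coded diagnosis fields.
    nlp_mentions: list of (icd10_code, experiencer, status) tuples.
    referral_teams: list of team names the patient was referred to.
    Returns the sorted list of sources that support the diagnosis."""
    sources = set()

    if any(code.upper().startswith(code_prefix) for code in structured_codes):
        sources.add("structured")

    # Count only mentions about the patient that affirm the diagnosis.
    for code, experiencer, status in nlp_mentions:
        if (code.upper().startswith(code_prefix)
                and experiencer == "patient"
                and status in ("has", "had")):
            sources.add("nlp")

    if any(team_keyword in team.lower() for team in referral_teams):
        sources.add("referral")

    return sorted(sources)
```

A project could then require, for instance, at least two independent sources before including a patient, trading sample size against the risk of false positives.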

Third, at the time of writing, the Akrivia Health dataset captures information from secondary care psychiatric and dementia services only. Since much of the care for certain disorder domains (eg, major depressive disorder) takes place in primary care, the full patient care journey may not be captured within Akrivia data alone. The longitudinal nature of the data offers a buffer against incomplete recording practices as missing data or information not captured can be imputed, but it is imperfect and requires careful thought for every research project. To address this limitation, Akrivia Health is in the process of establishing linkages with a national primary care data aggregator.

Finally, NLP allows for broad extraction of information across millions of progress notes, but it is a probabilistic approach which will sometimes introduce patients with false-positive traits while falsely excluding others from research samples.

Collaboration

There are several routes to accessing the Akrivia cohort. The healthcare organisations in our network have free and full access to all the data that Akrivia curates for their own organisation, as well as several free research tools and free research support from Akrivia researchers. Healthcare organisations can also collaborate on projects together.

Commercial companies can access aggregate data for a fee, in partnership with Akrivia. In a typical project, a pharmaceutical or biotechnology company will ask a research question and Akrivia researchers will analyse the data to answer it. Finally, academic partners can have researcher access to a relevant portion of the anonymised record-level data for a limited time to work on a project, for a smaller fee that covers costs of database maintenance and research support, without a profit margin. In the future, we intend to roll out a model for charities to make use of our data as well.

Supplementary material

online supplemental file 1
bmjopen-14-10-s001.pdf (55.3KB, pdf)
DOI: 10.1136/bmjopen-2024-088166

Acknowledgements

The Akrivia Dataset uses data provided by patients and collected by the NHS organisations in our network. The NHS does not bear any responsibility for the information presented in this paper. We believe that using patient data is vital to improve healthcare for everyone, and we would like to extend our thanks to all NHS workers involved, as well as all patients, for their contribution. We would also like to thank Dr Judith Harrison for her helpful comments on an earlier version of this manuscript.

Footnotes

Funding: Akrivia Health is largely funded through agreements with industry partners. Several projects rely on grant funding. Akrivia Health’s services offered to healthcare organisations are free of charge, including data extraction and curation, use of the Akrivia CRIS platform and Secure Data Environment, and direct regulatory and research support.

Prepublication history and additional supplemental material for this paper are available online. To view these files, please visit the journal online (https://doi.org/10.1136/bmjopen-2024-088166).

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient consent for publication: Not applicable.

Data availability free text: Akrivia Health’s patient database is strictly controlled due to its highly sensitive nature, but can be accessed in collaboration with our company (www.akriviahealth.com), following NHS guidelines. For access to the Akrivia Health Database please contact contact@akriviahealth.com for information on fees and data access restrictions.

Patient and public involvement: Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

Ethics approval: Ethics approval for this paper is not needed as it does not involve research on human subjects, and no new data were collected for the creation of this manuscript. Furthermore, no identifiable data were used in the writing of this manuscript, nor in the methodologies and practices used to create the Akrivia CRIS research database.

Contributor Information

Ana Todorovic, Email: ana.todorovic@akriviahealth.com.

Philip Craig, Email: philip.craig@akriviahealth.com.

Simon Pillinger, Email: simon.pillinger@akriviahealth.com.

Panagiota Kontari, Email: panagiota.kontari@akriviahealth.com.

Sophie Gibbons, Email: sophie.gibbons@akriviahealth.com.

Luke Bryden, Email: luke.bryden@akriviahealth.com.

Tarso Franarin, Email: tarso.franarin@akriviahealth.com.

Ceyda Uysal, Email: ceyda.uysal@akriviahealth.com.

Gloria Roque, Email: gloria.roque@akriviahealth.com.

Benjamin Fell, Email: benjamin.fell@akriviahealth.com.

Data availability statement

Data may be obtained from a third party and are not publicly available.


    Articles from BMJ Open are provided here courtesy of BMJ Publishing Group
