Akrivia Health Database—deep patient characterisation using a secondary mental healthcare dataset in England and Wales: cohort profile

Ana Todorovic; Philip Craig; Simon Pillinger; Panagiota Kontari; Sophie Gibbons; Luke Bryden; Tarso Franarin; Ceyda Uysal; Gloria Roque; Benjamin Fell

doi:10.1136/bmjopen-2024-088166

BMJ Open. 2024 Oct 17;14(10):e088166. doi: 10.1136/bmjopen-2024-088166

PMCID: PMC11488076 PMID: 39419624

Abstract

Purpose

The Akrivia Health cohort was created to extract data from electronic health records in secondary mental health and dementia care services in England and Wales. The data are anonymised, structured and harmonised from the source electronic health records across a range of information technology systems, enabling unified, privacy-preserving access for research purposes.

Participants

The cohort contains data from electronic health records for over 4.6 million patients in England and Wales, as of January 2024. The data are refreshed regularly, and the dataset expands whenever a new healthcare provider joins the Akrivia network. Patients under 18 years old make up 13% of the database (n=590 160), adults 18–65 years old make up 56% (n=2 631 690) and older people make up 31% (n=1 422 609). About 11.5% are deceased (n=538 371).

Findings to date

Structured data include patient demographics and service pathways. Akrivia Health also uses a bespoke natural language processing model to further extract the research-relevant information from free-text progress notes, including diagnoses, medications and clinical symptoms. This allows for an in-depth longitudinal description of patient journeys.

Future plans

The anonymised data can be accessed in collaboration with Akrivia Health, following the National Health Service guidelines and without requiring a separate ethics application. There is no planned end date for data collection.

Keywords: Electronic Health Records, MENTAL HEALTH, Natural Language Processing, PSYCHIATRY

STRENGTHS AND LIMITATIONS OF THIS STUDY

The cohort includes real-world longitudinal data from mental health and dementia care services in England and Wales, from over 4.6 million patients.
The data are harmonised across different electronic health record systems, and enriched using natural language processing (NLP) to mine information from clinical free-text fields as well as linkages with, for example, census data on neighbourhood deprivation.
The main limitation of NLP is that it is probabilistic, sometimes introducing errors (false positives and false negatives) in the concepts being extracted.
In addition, it is not possible to say how representative the cohort is of the entire population of mental health service users in England and Wales.

Introduction

In the UK, the National Health Service (NHS) provides care for all residents, free at the point of need. As a consequence, nearly all of the UK population is registered with general practitioners (primary care),¹ who refer patients to specialist mental health and dementia services (secondary care) in accordance with National Institute for Health and Care Excellence guidelines. Referrals can also be made by the patients themselves, or their friends and family. This broad national coverage means that the information collected by the NHS could provide unique insights into the real-world mental health of the general population. Importantly, the UK also has a tradition of collecting data using electronic health records, a practice that was instated in 2005. Some services also converted their older paper records to electronic. This allows, in principle, for straightforward access to rich, valuable and longitudinal healthcare data from all patients.

In practice, however, data stored in electronic health records are rarely available in a research-ready format,² and electronic healthcare systems are principally purposed as medicolegal archives, with features most relevant to administrative purposes and not research. In NHS secondary mental healthcare services, relatively little information is recorded in a structured (ie, prespecified) way, such as choosing a diagnosis from a drop-down menu. The vast majority of clinically relevant data are recorded in the form of free text, in progress notes, attachments or fields associated with referrals, written in natural language. Care decisions in psychiatry are influenced by a wide range of biological, psychological and social factors, and as a result free text often includes correspondingly rich, holistic descriptions of patient states, family histories, adverse life events, relational trauma, struggles with relationships or employment and similar. In other words, progress notes are most likely to centre around the information that clinicians deem particularly relevant to their patients.

Gaining systematic and reliable access to these rich and comprehensive longitudinal free-text data is challenging. Akrivia Health is dedicated to collecting and curating these data. The data from secondary mental health services are collated and deidentified, anonymised, harmonised, structured, processed using natural language processing (NLP) and structured query language (SQL), and uploaded in a research-ready format to a secure online platform, Akrivia Clinical Record Interactive Search (CRIS).

Cohort description

The Akrivia Health cohort includes longitudinal data on over 4.6 million patients in secondary mental healthcare services within the NHS in England and Wales. The oldest patient records in the database were registered prior to 1990, for 343 patients in total. Since 1990, the database has had a steady increase in patient records registered each year, with over 100 000 new patients per year since 2005 (figure 1A). However, the Akrivia Health cohort is not the first database of its kind in England. Originally, a cohort was developed in 2008 as The South London and Maudsley NHS Foundation Trust Biomedical Research Centre Case Register and its CRIS application, and it continues to exist as a separate database, extensively used in research.^{3 4} A related project was then initiated at the University of Oxford, resting on similar data extraction principles, with the difference that it introduced federated access to different NHS healthcare organisations (or trusts); these were known as UK-CRIS and Dementia (D)-CRIS. That project, initially funded by the National Institute for Health and Care Research (NIHR) and the Medical Research Council (MRC), was spun out in 2019 as a commercial data curation and research company, Akrivia Health. The Akrivia cohort has since grown to include 14 NHS healthcare organisations, with a further six currently onboarding and soon to be added to the database. Once these are added, the network will cover 38% of the 53 NHS organisations in England and Wales that offer secondary care for mental health and dementias. The database is regularly updated against the source files, with no planned stopping point to data collection and without gaps in data acquisition. The data are refreshed fortnightly wherever possible, with a median refresh frequency of 6 weeks and a maximum lag of 11 months.

In January 2024, the Akrivia Health cohort had 4 648 248 patients in total, of which 4 109 877 were not recorded as being deceased. Overall, the database covers demographic information, diagnoses, medication, referrals to care teams, hospitalisations, clinical signs and symptoms, elements of treatment pathways, life circumstances, health scores on questionnaires, etc. Demographic patient details can be seen in table 1, and other variables in table 2. Please see the online supplemental file 1 for code examples to create the tables.

Table 1. Demographic characteristics of the Akrivia cohort (January 2024 dataset build).

Characteristics	Counts or examples
Age in January 2024
Age under 18	590 160 (13%)
Age 18–65	2 631 690 (56%)
Age over 65	1 422 609 (31%)
Gender
Male	2 121 704 (45.7%)
Female	2 522 088 (54.2%)
Other/non-binary	3465 (0.07%)
Not known	1057 (0.03%)
Ethnicity
Asian	141 945 (3%)
Black	79 818 (1.7%)
White	2 736 548 (60%)
Mixed	79 249 (1.7%)
Other	74 463 (1.6%)
Not known	1 536 291 (33%)
Marital status
Single	903 416 (19%)
Cohabitating	21 424 (0.5%)
Married/civil partnership	527 901 (11.5%)
Separated	47 151 (1%)
Divorced	80 565 (2%)
Widowed	196 678 (4%)
Not known	2 871 179 (62%)
Mortality status
Living	4 109 877 (88.5%)
Deceased	538 371 (11.5%)

Table 2. Akrivia cohort data dictionary—list of readily available variables.

Characteristics	Counts or examples
Date of registration
1990–2000 (end of 1999)	112 600 (2.5%)
2000–2010 (end of 2009)	1 001 563 (21.5%)
2010–2020 (end of 2019)	2 555 946 (56%)
2020+	945 351 (20%)
Diagnosis (structured): International Classification of Diseases (ICD)-10 code; ICD-11 code; diagnosis description; diagnosis recorded date; diagnosis end date; duration in days	760 272 patients (16.4%) have one. Top three structured diagnoses given to women: dementia in Alzheimer’s disease, atypical or mixed (F02); mild cognitive disorder (F067); emotionally unstable personality disorder (F603). Top three structured diagnoses given to men: paranoid schizophrenia (F200); mental and behavioural disorders due to the use of tobacco (F171); personal history of self-harm (Z915).
Diagnosis (NLP): progress note date; ICD-10 code; diagnosis description; experiencer (patient or other); status (has, had, could have, does not have)	2 031 939 patients (43.7%) have one. Top three NLP diagnoses given to women: unspecified dementia (F03); autistic disorder (F840); unspecified psychosis (F29). Top three NLP diagnoses given to men: unspecified dementia (F03); autistic disorder (F840); attention-deficit hyperactivity disorder (F90).
Medication (NLP): progress note date; drug name; status (is on, was on, other); dosage amount and unit	1 810 887 patients (39%). Top 10 (ranked by how many patients were prescribed this medication): sertraline, mirtazapine, paracetamol, citalopram, diazepam, zopiclone, lorazepam, quetiapine, fluoxetine, olanzapine.
Inpatient stays: admission date; discharge date; admission duration; admission source; admission method; discharge destination; discharge method	254 410 patients (5.5%). Example: 22 August 2001; 25 August 2001; three days; NHS general hospital ward; emergency; usual place of residence; discharged on clinical advice.
Referrals: referred to; referral date; referral received; referral accepted; discharge date	3 537 615 patients (76%) have referrals data. Examples: memory clinic, forensic liaison service, low security inpatient service, personality disorder team, continuing care.
Health scores (questionnaires): Addenbrooke’s Cognitive Examination (ACE); Beck’s Depression Inventory (BDI); Clinical Outcomes in Routine Evaluation (CORE-10); Generalised Anxiety Disorder (GAD-7); Hamilton Depression Rating Scale (HAMD); Health of the Nation Outcome Scales (HoNOS); Montgomery-Asberg Depression Rating Scale (MADRS); Mini Mental State Examination (MMSE); Montreal Cognitive Assessment (MOCA); Positive and Negative Syndrome Scale (PANSS); Patient Health Questionnaire (PHQ-9); Quick Inventory of Depressive Symptomatology (QIDS); Quality of Life Scale (QOLS); Work and Social Adjustment Scale (WSAS)	NLP-derived, total score only: 289 092 patients (6%) have at least one. ACE: 65 464 (1.5%); BDI: 3566 (0.07%); CORE-10: 8271 (0.17%); GAD-7: 65 193 (1.4%); HAMD: 395 (<0.01%); HoNOS: 6413 (1.38%); MADRS: 930 (0.02%); MMSE: 121 484 (2.61%); MOCA: 56 236 (1.2%); PANSS: 260 (<0.01%); PHQ-9: 31 707 (0.7%); QIDS: 202 (<0.01%); QOLS: 10 (<0.01%); WSAS: 6636 (0.14%). Structured scores with individual questions: GAD-7: 5733 (0.1%); PHQ-9: 15 590 (0.3%); HoNOS: 386 309 (7.6%).
Index of multiple deprivation: neighbourhood deprivation in income, employment, education, skills and training, health and disability, crime, barriers to housing and services	4 018 223 patients (86.5%)
Electroconvulsive therapy (ECT) (NLP): mention of ECT in progress notes	49 765 patients (1%)
Psychotherapy (NLP): mention of psychotherapy in progress notes	1 229 613 patients (26%)
Anhedonia (NLP): mention of anhedonia in progress notes	331 802 patients (7%)
Employment issues (NLP): mention of employment insecurity in progress notes	251 308 patients (5.5%)
Signs and symptoms (NLP): contextually classified traits and behaviours associated with mental health problems; status (has, had, does not have, other)	2 629 776 patients (57%). 10 most common signs and symptoms: anxious, low mood, mention of mood, worried, happy, settled, low in mood, mention of appetite, upset, angry.
Substance use (NLP): contextually classified substances; status (is using, was using, is not using, other)	1 344 252 patients (29%). Substance types in order of frequency: alcohol, nicotine, other, cannabis, cocaine, opiates.
Accommodation issues (NLP): mention of homelessness in progress notes	33 220 patients (0.7%)
UK Biobank flag: patient exists in UK Biobank	18 033 patients (0.4%)

NHS, National Health Service; NLP, natural language processing.

About 53% of the patients do not have a mental health or dementia diagnosis recorded in the structured portion of their electronic health records or written in their progress notes, although it is possible that diagnosis information exists elsewhere in their electronic health records, for example, in attachments or referrals. Among the 47% whose diagnoses we have located (over 2 million patients), the average patient has 4.38 different diagnostic labels with an SD of 5.52. These numbers include the 29% of patients who have only one diagnosis with no comorbidities. An overview of common diagnosis types and patient numbers can be seen in table 3.

Table 3. Common diagnostic categories in the Akrivia cohort (January 2024 dataset build).

Diagnostic category	ICD-10 codes	Total number of patients
Major depressive disorder	F32, F33, F53.0	296 229
Psychotic disorders	F2, F10.5, F11.5, F12.5, F13.5, F14.5, F15.5, F16.5, F17.5, F18.5, F19.5, F06.0, F06.1, F06.2	246 298
Schizophrenia	F20	110 005
Anxiety disorders	F40, F41, F42, F43.1, F45.2	285 494
Bipolar disorder	F30, F31, F25	154 533
Eating disorders	F50	61 874
Personality disorders	F60, F61, F62, F69	152 361
Developmental disorders	F80, F81, F82, F83, F84, F88, F89, F90	348 409
Dementias	F00, F01, F02, F03, G30, G31.0, G31.83	428 841
Alzheimer’s disease	F00, G30	212 061
Mild cognitive disorder	F06.7, G31.84	72 798
Treatment-resistant depression	F32, F33, F53.0 + tried more than two antidepressants during a single depressive episode (from referral to discharge)	187 451

Information governance

Akrivia Health’s governance model is built to incorporate data protection by design, ensuring that data can be used for research in a commercially sustainable model while protecting the privacy of individuals.⁵ Research-relevant elements of the electronic health records are deidentified (ie, pseudonymised), with Akrivia Health working with healthcare organisations as their data processor (UK General Data Protection Regulation (GDPR), article 4 (8)). Deidentification (article 4 (5)) significantly reduces the identifiability of individuals but does not, in a European Union and UK legal context, render the data anonymous (recital 26). We further anonymise the pseudonymised data in a non-reversible format, thereby removing them from the material scope of the UK GDPR, which only applies to personal data (article 2 (1)). The anonymised version of the dataset is controlled by Akrivia Health and may be used for commercial activity in line with the Information Standards Board for Health and Social Care (ISB) 1523 standard.⁶ Partner healthcare organisations determine precisely which data are extracted from their electronic health records, and organisations can and do remove patients who have opted out of their data being used beyond direct care.⁷

Data harmonisation

The data harmonisation process includes pooling together and standardising data from diverse sources. Different healthcare organisations within the NHS record information using different electronic systems, such as Rio, CareNotes, Paris, CareDirector or SystmOne. This means that the same information might be coded differently by different organisations.

As an example, some healthcare organisations use national codes for ethnicity categories defined in the NHS Data Dictionary⁸ based on the results of the 2001 census. However, healthcare organisations do not consistently use the national codes^{9 10} and each one may have their own internal codes for ethnicity that are mapped to the national codes, or they might use an alternative version of the national codes.¹¹ Akrivia’s approach to ethnicity harmonisation is to align with the national categories, that is, mapping all ethnicity descriptions to five broad categories across all healthcare organisations. While this means that fine-grained information recorded in some organisations is lost, harmonisation allows for combining data in greater numbers as the same query can be run across all trusts in our network simultaneously.
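
The mapping step described above can be sketched as a per-organisation lookup that collapses local values into the broad categories. This is a minimal illustration only: the organisation names, the local codes and the lookup tables are hypothetical, and the grouping shown is an assumption rather than Akrivia Health's actual mapping.

```python
from typing import Optional

# Hypothetical lookup tables, one per organisation. "trust_a" is assumed to
# record single-letter national-style codes, "trust_b" free-text descriptions.
LOCAL_TO_BROAD = {
    "trust_a": {"A": "White", "H": "Asian", "M": "Black",
                "D": "Mixed", "S": "Other", "Z": "Not known"},
    "trust_b": {"white british": "White", "indian": "Asian",
                "black caribbean": "Black"},
}


def harmonise_ethnicity(organisation: str, raw_value: Optional[str]) -> str:
    """Map a locally recorded ethnicity value to one broad category."""
    if raw_value is None:
        return "Not known"
    lookup = LOCAL_TO_BROAD.get(organisation, {})
    key = raw_value.strip()
    # Try the value as written first, then a lower-cased variant so that
    # free-text descriptions match regardless of capitalisation.
    return lookup.get(key, lookup.get(key.lower(), "Not known"))
```

Anything that cannot be mapped falls back to "Not known", which keeps the same query valid across organisations at the cost of the fine-grained local detail mentioned above.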

Natural language processing

A key challenge for researching NHS psychiatric electronic health records is that most of the clinically relevant information is present in free-text fields—progress notes, referral letters and attachments. We employ NLP to structure and extract the research-relevant information. Our NLP methodology involves a meticulous examination of free-text records, delving into how clinicians articulate information about specific concepts. The NLP pipeline involves several key stages. For example, let us assume that the aim is to extract the medication-related information from the sentence ‘Patient has stopped taking mirtazapine 50 mg’.

First, a Named-Entity Recognition model, which is a regular expression or a more sophisticated artificial intelligence-based model, extracts mentions of pertinent concepts, in this case medication. Here, ‘mirtazapine’ is extracted as a medication and ‘50 mg’ as a dosage. This is followed by Entity Normalisation, where extractions are mapped onto an existing ontology (here, of medications) or converted into a harmonised format. In this step, all extractions that refer to the same concept are mapped to a single entry. The medication here is therefore mapped to include the drug name ‘mirtazapine’, as well as its RxNorm ID (15996), while the dosage is normalised to include dosage amount (50) and unit (mg). In the next step, Relation Extraction, the model establishes connections between pairs of related extractions, that is, it recognises that ‘50 mg’ refers to the medication ‘mirtazapine’. The final step is Context Classification. Since the mention of a concept does not necessarily imply relevance to the patient’s current condition, the model categorises the context in which the extraction occurs. In our example, the patient would be classified as having taken mirtazapine in the past.
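
The four stages can be illustrated on the example sentence with a deliberately simple sketch. It uses the regular-expression variant of Named-Entity Recognition, a one-entry toy ontology, a nearest-following-dosage rule for Relation Extraction and a keyword list standing in for the Context Classification model. All of these are assumptions made for illustration and do not describe the production pipeline.

```python
import re

# Toy ontology with a single entry; 15996 is the RxNorm ID quoted in the text.
MEDICATION_ONTOLOGY = {"mirtazapine": {"drug_name": "mirtazapine", "rxnorm_id": 15996}}

MED_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, MEDICATION_ONTOLOGY)) + r")\b", re.IGNORECASE
)
DOSE_PATTERN = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml)\b", re.IGNORECASE)
# Hypothetical cue phrases; a keyword list stands in for a trained classifier.
PAST_CUES = ("stopped taking", "discontinued", "was on", "no longer")


def extract_medications(sentence):
    # 1. Named-Entity Recognition: find medication and dosage mentions.
    meds = list(MED_PATTERN.finditer(sentence))
    doses = list(DOSE_PATTERN.finditer(sentence))
    # 4. Context Classification: decide whether the mention is current or past.
    lowered = sentence.lower()
    status = "was on" if any(cue in lowered for cue in PAST_CUES) else "is on"
    results = []
    for med in meds:
        # 2. Entity Normalisation: map the mention onto the ontology entry.
        record = dict(MEDICATION_ONTOLOGY[med.group(1).lower()])
        # 3. Relation Extraction: attach the nearest dosage after the drug name.
        following = [d for d in doses if d.start() >= med.end()]
        if following:
            dose = min(following, key=lambda d: d.start() - med.end())
            record["dosage_amount"] = float(dose.group(1))
            record["dosage_unit"] = dose.group(2).lower()
        record["status"] = status
        results.append(record)
    return results


example = extract_medications("Patient has stopped taking mirtazapine 50 mg")
```

For the example sentence, `example` holds one record with the drug name, its RxNorm ID, a dosage of 50 mg and the status "was on".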

Across all the stages where AI-based models are used—Named-Entity Recognition, Relation Extraction and Context Classification—Bidirectional Encoder Representations from Transformers (BERT)-based models were employed for their high performance and efficiency in processing large datasets. These models were fine-tuned in-house using our own annotated data and are required to achieve an F1 score of at least 80%. At the moment, we do not have comparisons of model parameters for different demographic sub-categories.
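
For readers less familiar with the metric, the F1 score is the harmonic mean of precision and recall, so it penalises both false positives and false negatives. The sketch below computes it from counts on an annotated test set; the counts are invented for illustration and the function names are hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from extraction counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def meets_threshold(f1, threshold=0.80):
    """Check an F1 score against the 80% requirement stated in the text."""
    return f1 >= threshold


# Invented counts: 85 correct extractions, 15 spurious, 25 missed.
precision, recall, f1 = precision_recall_f1(85, 15, 25)
```

With these counts precision is 0.85, recall is about 0.77 and F1 is about 0.81, so the hypothetical model would pass the threshold.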

Data transformation

Once our variable list is populated with information extracted from electronic health records either from structured sources or with the help of NLP, any number of new variables can be created using SQL. Typically, new variables are created for each new project we work on, depending on the need. For instance, from the patient’s date of birth and the date on which a diagnosis is first mentioned in their notes, we can extract a proxy for the age diagnosed. We can derive more involved variables as well. A number of studies focus on patients with treatment-resistant depression (sometimes defined as depression where symptoms do not alleviate after trying two different antidepressants),¹² but treatment-resistant depression is not a formal diagnostic category. We can combine data on referrals with NLP-derived medication data to create a flag for those patients who at any point tried three or more antidepressants during a single depressive episode, operationalised as a referral period. We can also define broader diagnostic categories, such as ‘dementias’ (table 3). Referrals to different healthcare teams can be combined with appointment dates to assess the frequency and cost of contact with healthcare services for each individual patient, etc.
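
Two of the derived variables above can be sketched in SQL, run here through Python's built-in sqlite3 module so that the example is self-contained. The table names, column names, rows and the short antidepressant list are all hypothetical and are not the Akrivia schema; the age calculation divides days by 365.25 and is only a proxy.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id INTEGER, date_of_birth TEXT);
CREATE TABLE diagnoses (patient_id INTEGER, note_date TEXT, icd10 TEXT);
CREATE TABLE referrals (referral_id INTEGER, patient_id INTEGER,
                        referral_date TEXT, discharge_date TEXT);
CREATE TABLE medications (patient_id INTEGER, note_date TEXT, drug_name TEXT);

INSERT INTO patients VALUES (1, '1980-06-15'), (2, '1975-01-01');
INSERT INTO diagnoses VALUES (1, '2015-03-01', 'F32'), (1, '2018-09-10', 'F32'),
                             (2, '2020-02-02', 'F33');
INSERT INTO referrals VALUES (10, 1, '2015-02-01', '2016-02-01'),
                             (20, 2, '2020-01-01', '2020-12-31');
INSERT INTO medications VALUES
  (1, '2015-03-01', 'sertraline'), (1, '2015-06-01', 'citalopram'),
  (1, '2015-10-01', 'mirtazapine'),
  (2, '2020-02-02', 'sertraline'), (2, '2020-05-05', 'fluoxetine');
""")

# Proxy for age diagnosed: whole years from birth to the first diagnosis mention.
ages = conn.execute("""
    SELECT p.patient_id,
           CAST((julianday(MIN(d.note_date)) - julianday(p.date_of_birth))
                / 365.25 AS INTEGER) AS age_at_first_mention
    FROM patients p
    JOIN diagnoses d ON d.patient_id = p.patient_id
    GROUP BY p.patient_id, p.date_of_birth
    ORDER BY p.patient_id
""").fetchall()

# Treatment-resistant depression flag: a depression code plus three or more
# distinct antidepressants mentioned within a single referral period.
ANTIDEPRESSANTS = ("sertraline", "citalopram", "mirtazapine", "fluoxetine")
placeholders = ",".join("?" * len(ANTIDEPRESSANTS))
trd_rows = conn.execute(f"""
    SELECT r.patient_id, r.referral_id, COUNT(DISTINCT m.drug_name) AS n_drugs
    FROM referrals r
    JOIN medications m
      ON m.patient_id = r.patient_id
     AND m.note_date BETWEEN r.referral_date AND r.discharge_date
    WHERE m.drug_name IN ({placeholders})
      AND EXISTS (
          SELECT 1 FROM diagnoses d
          WHERE d.patient_id = r.patient_id
            AND (d.icd10 LIKE 'F32%' OR d.icd10 LIKE 'F33%' OR d.icd10 LIKE 'F53.0%')
      )
    GROUP BY r.patient_id, r.referral_id
    HAVING COUNT(DISTINCT m.drug_name) > 2
""", ANTIDEPRESSANTS).fetchall()
trd_patients = sorted({row[0] for row in trd_rows})
```

Dates are stored as ISO 8601 strings, so the `BETWEEN` comparison orders them correctly. In the toy data only patient 1 is flagged, because patient 2 has two antidepressants within the referral period.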

Linkages

We have developed a standardised data linkage service to facilitate data sharing. In principle, any dataset with NHS numbers can be linked to the Akrivia cohort.

For an example of an existing linkage, during the deidentification of data, postcodes are mapped to lower layer super output area (or LSOA) codes of residence. LSOA codes can further be linked to publicly available census data. This has allowed us to associate each patient with an index of multiple deprivation,¹³ which is a single index measure based on what a patient’s neighbourhood is like when it comes to deprivation in income, employment, education, skills and training, health and disability, crime, barriers to housing and services and living environment. The index of multiple deprivation allows for incorporating social factors into research designs, which are critical to control for when assessing the outcome of different treatments or for determining the causes of mental illness.^{14 17} See figure 1C for a map of patients in our database and figure 1B for a distribution of Akrivia patients across deprivation deciles.
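
The postcode to LSOA to deprivation chain is a pair of lookups, sketched below. The postcodes, area codes and deciles are placeholders invented for illustration; a real linkage would load the published lookup files instead of these dictionaries.

```python
# Placeholder lookup tables; every value here is made up.
POSTCODE_TO_LSOA = {"AB1 2CD": "LSOA_DEMO_1", "EF3 4GH": "LSOA_DEMO_2"}
LSOA_TO_IMD_DECILE = {"LSOA_DEMO_1": 3, "LSOA_DEMO_2": 9}


def normalise_postcode(postcode):
    """Upper-case and re-space a postcode so that 'ab12cd' matches 'AB1 2CD'."""
    compact = "".join(postcode.upper().split())
    # The inward part of a UK postcode is always the last three characters.
    return compact[:-3] + " " + compact[-3:]


def link_imd_decile(postcode):
    """Return the deprivation decile for a postcode, or None if unlinked."""
    if not postcode:
        return None
    lsoa = POSTCODE_TO_LSOA.get(normalise_postcode(postcode))
    if lsoa is None:
        return None
    return LSOA_TO_IMD_DECILE.get(lsoa)
```

Returning `None` for unmatched postcodes keeps unlinked patients visible as missing rather than silently assigning them a decile.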

Data access

A final step in our data curation pipeline is establishing data access for our NHS partners, as well as industry and academic partners. Once the data tables are populated, they can be accessed via a front-end graphical user interface, the Akrivia CRIS Research Platform. Research and clinical teams can use the platform for locally approved research projects. Over 280 NHS projects have been registered to date. Akrivia Health provides this data curation and access management service to all healthcare organisations in their network for free, along with research support for audits and research projects, and operational support for clinical trial recruitment run through the Akrivia platform. The platform provides functionality for project application and user access management, cohort specification using a graphical interface, aggregate data analysis and visualisation, record-level data exploration (NHS users only), export of deidentified tables within a secure environment for data analysis, patient reidentification for recontact purposes (NHS users only, for projects with Health Research Authority and Research Ethics Committee approval) and audit logging of user activity.

Data representativeness

The Akrivia Cohort contains patients from a variety of healthcare organisations, some of which are large, but it does not contain the entire population of mental health service users in England and Wales. It is therefore not possible to say with certainty how well the cohort represents the patient population, but one can run statistical comparisons within the cohort (eg, comparing men to women, patients with one diagnosis to those with another, etc). However, it is possible to say how well the patients represent the local general population. Namely, each patient has a neighbourhood (LSOA) code assigned to them based on their place of residence, and the UK Office for National Statistics offers LSOA demographics as part of 2021 census data. Each area has features such as total population and age and gender breakdowns recorded, and these can be compared with patient data. Open-access datasets with ethnicity estimates per LSOA exist as well, or they can be requested from the Office for National Statistics. Therefore, LSOA-level information from areas where patients live can be pooled together as the population the patients come from. With this, one can answer questions such as whether people with different demographics have different probabilities of getting various diagnoses and similar.
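
The comparison with the local general population can be sketched as follows: census counts are pooled over the LSOAs in which at least one patient lives, and patient counts are expressed per 1000 residents in each demographic group. The area identifiers, census figures and patient rows are invented, and the function names are hypothetical.

```python
from collections import Counter


def pooled_population(lsoa_census, patient_lsoas):
    """Sum census counts per group over the LSOAs where patients live."""
    totals = Counter()
    for lsoa in set(patient_lsoas):
        totals.update(lsoa_census.get(lsoa, {}))
    return totals


def rate_per_1000(patients, lsoa_census):
    """Patients per 1000 residents of the pooled population, by group."""
    population = pooled_population(lsoa_census, [p["lsoa"] for p in patients])
    counts = Counter(p["group"] for p in patients)
    return {g: 1000 * counts[g] / population[g] for g in counts if population[g] > 0}


# Invented census counts per area and invented patient rows.
census = {
    "L1": {"female": 800, "male": 700},
    "L2": {"female": 1200, "male": 1300},
    "L3": {"female": 500, "male": 500},  # no patients live here, so it is not pooled
}
patients = [
    {"lsoa": "L1", "group": "female"},
    {"lsoa": "L1", "group": "female"},
    {"lsoa": "L2", "group": "female"},
    {"lsoa": "L2", "group": "male"},
]
rates = rate_per_1000(patients, census)
```

Restricting the rows in `patients` to those with a given diagnosis turns the same calculation into a diagnosis rate per group, which is the kind of question the paragraph above describes.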

Findings to date

The Akrivia Health cohort is regularly used for clinical audits as well as research in partnership with the NHS, pharmaceutical companies and academics. In the brief time since it was established, studies using the Akrivia cohort or its predecessor, the UK-CRIS database, have been published on a range of health-related research topics, including mortality risk when using atypical antipsychotic medication with dementia,¹⁸ treatment for cognitive impairment,¹⁹ clinical depression in patients with bowel cancer,²⁰ patients with difficult-to-treat depression,²¹ psychological interventions for patients with psychosis,²² the use of cholinesterase inhibitors and memantine in dementia,²³ as well as antipsychotic prescribing in older people’s mental health services.²⁴ The dataset has also been used for furthering technical knowledge on applying NLP to electronic health records.^{25 26}

Future work

The main activities around the Akrivia cohort in the near future will continue to be database curation and research support to the NHS, as well as work on commercial projects. We are currently also working on creating a linkage of patients in the Akrivia cohort to their primary care records. At the moment, we have matched 478 568 unique patients across four healthcare organisations, indicating that data linkage for these patients would be possible if needed for a project. We are further looking to expand our work towards in-house peer-reviewed academic publications, as well as offering secondments to academic researchers.

Strengths and limitations

Using data from the Akrivia cohort, it is possible to examine disease patterns, prevalence, incidence and risk factors of mental ill health at the population level, culminating in more precise and personalised risk prediction models. NLP can be used to determine treatment non-compliance, or social issues such as homelessness and unemployment, as well as the presence of adverse life events. Our recent NLP extraction of clinical signs and symptoms offers the opportunity of observing symptomatology at a more granular level, and for comparing diagnostic categorisation with real-world clusters of symptoms. Further, it is possible to associate costs to different treatment pathways, so the efficacy of treatments can be probed at a larger scale, including effectiveness for different demographic groups. The size of our database also allows for deeper insight into rare forms of mental illness. Finally, the longitudinal nature of the data allows for answering epidemiological questions, such as comparing clinical outcomes following different treatments offered to the same patient at different points in time.

There are also important limitations to using Akrivia’s data. First, since electronic health records are not created with research in mind, researchers must exercise caution in the interpretation of results, especially with regard to data missingness due to local recording practices or differences in the amount of contact with different patients.

Second, in several key domains (eg, diagnosis, assessment of cognitive function, mood, quality of life, etc), multiple direct and indirect indicators are captured in the Akrivia dataset, requiring a combinatorial approach. For example, information on diagnosis could include harmonised coded diagnosis fields, NLP-derived explicit mentions of diagnosis from free text, referrals to diagnosis-specific services (eg, early intervention in psychosis) and inference of diagnosis based on NLP-derived medication and symptom data. Akrivia Health supports the platform users in developing their analytic methods to account for these complexities in the source data.

Third, at the time of writing, the Akrivia Health dataset captures information from secondary care psychiatric and dementia services only. Since much of the care for certain disorder domains (eg, major depressive disorder) takes place in primary care, the full patient care journey may not be captured within Akrivia data alone. The longitudinal nature of the data offers a buffer against incomplete recording practices as missing data or information not captured can be imputed, but it is imperfect and requires careful thought for every research project. To address this limitation, Akrivia Health is in the process of establishing linkages with a national primary care data aggregator.

Finally, NLP allows for broad extraction of information across millions of progress notes, but it is a probabilistic approach which will sometimes introduce patients with false-positive traits while falsely excluding others from research samples.

Collaboration

There are several routes to accessing the Akrivia cohort. The healthcare organisations in our network have free and full access to all the data that Akrivia curates for their own organisation, as well as several free research tools and free research support from Akrivia researchers. Healthcare organisations can also collaborate on projects together.

Commercial companies can access aggregate data for a fee, in partnership with Akrivia. In a typical project, a pharmaceutical or biotechnology company will ask a research question and Akrivia researchers will analyse the data to answer it. Finally, academic partners can have researcher access to a relevant portion of the anonymised record-level data for a limited time to work on a project, for a smaller fee that covers costs of database maintenance and research support, without a profit margin. In the future, we intend to roll out a model for charities to make use of our data as well.

supplementary material

online supplemental file 1

bmjopen-14-10-s001.pdf (55.3 KB, PDF)

DOI: 10.1136/bmjopen-2024-088166

Acknowledgements

The Akrivia Dataset uses data provided by patients and collected by the NHS organisations in our network. The NHS does not bear any responsibility for the information presented in this paper. We believe that using patient data is vital to improve healthcare for everyone, and we would like to extend our thanks to all NHS workers involved, as well as all patients, for their contribution. We would also like to thank Dr Judith Harrison for her helpful comments on an earlier version of this manuscript.

Footnotes

Funding: Akrivia Health is largely funded through agreements with industry partners. Several projects rely on grant funding. Akrivia Health’s services offered to healthcare organisations are free of charge, including data extraction and curation, use of the Akrivia CRIS platform and Secure Data Environment, and direct regulatory and research support.

Prepublication history and additional supplemental material for this paper are available online. To view these files, please visit the journal online (https://doi.org/10.1136/bmjopen-2024-088166).

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient consent for publication: Not applicable.

Data availability free text: Akrivia Health’s patient database is strictly controlled due to its highly sensitive nature, but can be accessed in collaboration with our company (www.akriviahealth.com), following NHS guidelines. For access to the Akrivia Health Database please contact contact@akriviahealth.com for information on fees and data access restrictions.

Patient and public involvement: Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

Ethics approval: Ethics approval for this paper is not needed as it does not involve research on human subjects, and no new data were collected for the creation of this manuscript. Furthermore, no identifiable data were used in the writing of this manuscript, nor in the methodologies and practices used to create the Akrivia CRIS research database.

Contributor Information

Ana Todorovic, Email: ana.todorovic@akriviahealth.com.

Philip Craig, Email: philip.craig@akriviahealth.com.

Simon Pillinger, Email: simon.pillinger@akriviahealth.com.

Panagiota Kontari, Email: panagiota.kontari@akriviahealth.com.

Sophie Gibbons, Email: sophie.gibbons@akriviahealth.com.

Luke Bryden, Email: luke.bryden@akriviahealth.com.

Tarso Franarin, Email: tarso.franarin@akriviahealth.com.

Ceyda Uysal, Email: ceyda.uysal@akriviahealth.com.

Gloria Roque, Email: gloria.roque@akriviahealth.com.

Benjamin Fell, Email: benjamin.fell@akriviahealth.com.

Data availability statement

Data may be obtained from a third party and are not publicly available.

References

1. Baker C. Population estimates & GP registers: why the difference? 2016. Available: https://commonslibrary.parliament.uk/population-estimates-gp-registers-why-the-difference/ [Accessed 24 Jan 2024].
2. Ford E, Boyd A, Bowles JKF, et al. Our data, our society, our health: A vision for inclusive and transparent health data science in the United Kingdom and beyond. Learn Health Syst. 2019;3:e10191. doi: 10.1002/lrh2.10191.
3. Stewart R, Soremekun M, Perera G, et al. The South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLAM BRC) case register: development and descriptive data. BMC Psychiatry. 2009;9:51. doi: 10.1186/1471-244X-9-51.
4. Perera G, Broadbent M, Callard F, et al. Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register: current status and recent enhancement of an Electronic Mental Health Record-derived data resource. BMJ Open. 2016;6:e008721. doi: 10.1136/bmjopen-2015-008721.
5. Regulation - 2016/679 - EN - GDPR - EUR-Lex. Available: https://eur-lex.europa.eu/eli/reg/2016/679/oj [Accessed 24 Jan 2024].
6. NHS Digital. ISB1523: anonymisation standard for publishing health and social care data. Available: https://digital.nhs.uk/data-and-information/information-standards/information-standards-and-data-collections-including-extractions/publications-and-notifications/standards-and-collections/isb1523-anonymisation-standard-for-publishing-health-and-social-care-data [Accessed 25 Jan 2024].
7. NHS Digital. National data opt-out operational policy guidance document. Available: https://digital.nhs.uk/services/national-data-opt-out/operational-policy-guidance-document [Accessed 25 Jan 2024].
8. NHS data dictionary, ethnic category. 2001. Available: https://www.datadictionary.nhs.uk/data_elements/ethnic_category.html
9. Annan-Callcot G. The problem with ethnicity categories in UK health data | news. Wellcome; 2023. Available: https://wellcome.org/news/ethnicity-categories-uk-health-data
10. Scobie S, Spencer J, Raleigh V. Ethnicity coding in English health service datasets. Nuffield Trust.
11. Raleigh DVS, Goldblatt PP. Ethnicity coding in health records. n.d. Available: https://www.kingsfund.org.uk/sites/default/files/2021-01/NHSE-letter-ethnicity-coding-health-records-oct2020.pdf
12. Sforzini L, Worrell C, Kose M, et al. A Delphi-method-based consensus guideline for definition of treatment-resistant depression for clinical trials. Mol Psychiatry. 2022;27:1286–99. doi: 10.1038/s41380-021-01381-x.
13. GOV.UK. English indices of deprivation. 2019. Available: https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019 [Accessed 24 Jan 2024].
14. Parsons A. Unified UK measures of rurality and deprivation. mySociety; 2021. Available: https://www.mysociety.org/2021/04/22/unified-uk-measures-of-rurality-and-deprivation/
15. Qi X, Jia Y, Pan C. Index of multiple deprivation contributed to common psychiatric disorders: A systematic review and comprehensive analysis. Neurosci Biobehav Rev. 2022;140:104806. doi: 10.1016/j.neubiorev.2022.104806.
16. Hashmi R, Alam K, Gow J, et al. Prevalence of Mental Disorders by Socioeconomic Status in Australia: A Cross-Sectional Epidemiological Study. Am J Health Promot. 2021;35:533–42. doi: 10.1177/0890117120968656.
17. Fahey N, Soni A, Allison J, et al. Education Mitigates the Relationship of Stress and Mental Disorders Among Rural Indian Women. Ann Glob Health. 2016;82:779–87. doi: 10.1016/j.aogh.2016.04.001.
18. Phiri P, Engelthaler T, Carr H, et al. Associated mortality risk of atypical antipsychotic medication in individuals with dementia. World J Psychiatry. 2022;12:298–307. doi: 10.5498/wjp.v12.i2.298.
19. Liu Q, Vaci N, Koychev I, et al. Personalised treatment for cognitive impairment in dementia: development and validation of an artificial intelligence model. BMC Med. 2022;20:45. doi: 10.1186/s12916-022-02250-2.
20. Delanerolle G, Cavalini HS. 395P Prevalence of clinically diagnosed depression among patients undergoing treatment for bowel cancer. Ann Oncol. 2022;33:S717–8. doi: 10.1016/j.annonc.2022.07.533.
21. Costa T, Menzat B, Engelthaler T, et al. The burden associated with, and management of, difficult-to-treat depression in patients under specialist psychiatric care in the United Kingdom. J Psychopharmacol. 2022;36:545–56. doi: 10.1177/02698811221090628.
22. Jacobsen PC, Teale A, Burgess-Barr S. A study investigating the implementation of NICE recommended psychological interventions for people with psychosis following a psychiatric inpatient admission. PsyArXiv [Preprint]. 2024. doi: 10.31234/osf.io/yemqb.
23. Vaci N, Koychev I, Kim C-H, et al. Real-world effectiveness, its predictors and onset of action of cholinesterase inhibitors and memantine in dementia: retrospective health record study. Br J Psychiatry. 2021;218:261–7. doi: 10.1192/bjp.2020.136.
24. Phiri P, Carr H, Rathod S. The Frequency of Antipsychotic Prescribing in Older People Mental Health Services: A Southern Health OPMH CRIS Audit. Acta Psychopathol. 05:0. doi: 10.4172/2469-6676.100182. n.d.
25. Kormilitzin A, Vaci N, Liu Q, et al. An efficient representation of chronological events in medical texts. 2020.
26. Gligic L, Kormilitzin A, Goldberg P, et al. Named entity recognition in electronic health records using transfer learning bootstrapped Neural Networks. Neural Netw. 2020;121:132–9. doi: 10.1016/j.neunet.2019.08.032.

Review Process File

bmjopen-2024-088166.reviewer_comments.pdf (349.6KB, pdf)
