PLOS ONE. 2023 Nov 29;18(11):e0294666. doi: 10.1371/journal.pone.0294666

Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

Adeniyi Francis Fagbamigbe 1,2,3,4,*, Utkarsh Agrawal 5, Amaya Azcoaga-Lorenzo 1,6, Briana MacKerron 1, Eda Bilici Özyiğit 7, Daniel C Alexander 7, Ashley Akbari 8, Rhiannon K Owen 8, Jane Lyons 8, Ronan A Lyons 8, Spiros Denaxas 9,10, Paul Kirk 11, Ana Corina Miller 12, Gill Harper 13, Carol Dezateux 13, Anthony Brookes 14, Sylvia Richardson 11, Krishnarajah Nirantharakumar 15, Bruce Guthrie 16, Lloyd Hughes 1, Umesh T Kadam 17, Kamlesh Khunti 18, Keith R Abrams 19, Colin McCowan 1
Editor: Sreeram V Ramagopalan 20
PMCID: PMC10686427  PMID: 38019832

Abstract

There is still limited understanding of how chronic conditions co-occur in patients with multimorbidity and what the consequences are for patients and the health care system. Most reported clusters of conditions have not considered the demographic characteristics of these patients during the clustering process. The study used data for all registered patients who were resident in Fife or Tayside, Scotland and aged 25 years or more on 1st January 2000 and who were followed up until 31st December 2018. We used linked demographic information and secondary care electronic health records from 1st January 2000. Individuals with at least two of the 31 Elixhauser Comorbidity Index conditions were identified as having multimorbidity. Market basket analysis was used to cluster the conditions for the whole population and then repeatedly stratified by age, sex and deprivation. 318,235 individuals were included in the analysis, with 67,728 (21·3%) having multimorbidity. We identified five distinct clusters of conditions in the population with multimorbidity: alcohol misuse, cancer, obesity, renal failure, and heart failure. Clusters of long-term conditions differed by age, sex and socioeconomic deprivation, with some clusters not present for specific strata and others including additional conditions. These findings highlight the importance of considering demographic factors during both clustering analysis and intervention planning for individuals with multiple long-term conditions. By taking these factors into account, the healthcare system may be better equipped to develop tailored interventions that address the needs of complex patients.

Background

Multimorbidity, also known as multiple long-term conditions, is the co-existence of two or more long-term conditions within an individual [1]. It is now the norm in ageing populations, with this group of patients being inherently heterogeneous [2, 3]. The estimated prevalence of multimorbidity varies considerably depending on the population studied, the specific list of conditions included in the analysis and the data sources used [4], but consistent findings show that multimorbidity is common and is more frequent in older people, women, and socioeconomically deprived populations [5, 6]. The relationship between socioeconomic deprivation and multimorbidity is complex [7, 8], but there is evidence that the less affluent have earlier onset and more rapid accumulation of conditions, resulting in widening inequalities into old age [9].

While most clinical guidelines focus on managing individual conditions, the number of individuals with multimorbidity is increasing, causing major concerns for the delivery of care in an already constrained healthcare system with competing needs [10, 11]. Prioritising interventions for high-risk groups is vital as healthcare systems strive toward the sustainability of service delivery. There is considerable evidence suggesting that the current disease-based approach to managing individuals with multimorbidity is associated with a variety of poor outcomes, including inadequate preventative care and access to rehabilitation services [12], repeated referrals for specialist care [13] and increased healthcare costs [14].

Understanding how conditions cluster is a key element in unravelling determinants and the delivery of future healthcare. Little is known about how disease clusters contribute to multimorbidity and complex multimorbidity (defined as having four or more long-term conditions) [3] across age, sex, and socioeconomic deprivation of individuals [15]. Most studies that report clusters of conditions do so without considering the demographic characteristics of the patients, which could affect the nature of the observed clusters [16, 17]. Some approaches to clustering within the multimorbidity literature aim to classify patients based on their conditions and place them into similar groups, while other approaches aim to identify groups of conditions which are present in individuals more frequently than expected.

Recent work by Kuan et al. showed variability in the most common single conditions across the life course and also by sex [18]. Other studies have reported that socioeconomic status also impacts the development of multimorbidity during different periods [19]. A deeper understanding of how the different demographic characteristics are associated with and contribute to clusters of conditions in patients with multimorbidity is needed to help clinicians manage those patients and to prepare health systems to provide adequate care for these complex patients. This study aims to assess the prevalence of multimorbidity and complex multimorbidity by age, sex and area-level socioeconomic deprivation. We also identified the most common conditions among patients with multimorbidity, and key clusters of disease, stratified by age, sex and socioeconomic deprivation.

Methods

Study design and population

The population for this study comprised residents of Fife and Tayside, Scotland who were aged at least 25 years on 1st January 2000 and alive on 31st December 2018, when a cross-sectional analysis of all live patients was performed. The data were generated for a study exploring multimorbidity across different countries in the UK. Exploration of the data showed a strong socioeconomic and age gradient in specific individual conditions and in the prevalence of multimorbidity, which we felt warranted further exploration [20]. We ascertained the dates of death from National Records of Scotland death certificates and the population register. The dataset used linked pseudonymised health and demographic data held by the Health Informatics Centre (HIC) at the University of Dundee.

Health care in Scotland is provided free at the point of care by the taxpayer-funded National Health Service (NHS). NHS Tayside and Fife are two separate Health Boards that provide specialist and secondary care services and contract with general practices that provide primary medical care to an approximate population of 800,000 individuals. The unique Community Health Index (CHI) number allocated to individuals at the point of registration with a general practice (GP) is used across the NHS. Demographic information, hospital admission and day-case records, cancer registry records and mental health inpatient records were linked to the death data from 1st January 2000 and to Emergency Department (ED) attendances from 1st January 2017 onwards.

Multimorbidity definitions

All hospital admissions, psychiatric hospital admissions, outpatients, cancer registry and emergency department records over the period were examined, and all the International Classification of Diseases (ICD)-10 codes were extracted. We identified the 31 conditions listed in the Elixhauser Comorbidity Index [21] (see S1 Table) from the ICD-10 codes relevant to each condition, recording both the presence of the condition and the date of its first recorded diagnosis during the study period. Depression and psychoses are examples of mental health long-term conditions included within the Elixhauser index, whilst weight loss and cancer are some of the physical conditions listed (see S1 Table for a full list of conditions and related codes). The Elixhauser Index was chosen as previous reviews have suggested it is well established for use with electronic health records using ICD-10 codes [22]. All individuals with two or more conditions were defined as having multimorbidity, and those with four or more were identified as having complex multimorbidity.
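The two definitions above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the person identifiers and the example records are hypothetical, and it assumes each person's distinct Elixhauser conditions have already been derived from their ICD-10 codes.

```python
from typing import Dict, Set

MULTIMORBIDITY_MIN = 2  # two or more conditions (definition from the text)
COMPLEX_MIN = 4         # four or more conditions (definition from the text)


def classify(conditions: Set[str]) -> Dict[str, bool]:
    """Flag multimorbidity and complex multimorbidity for one person.

    `conditions` is the set of distinct Elixhauser conditions recorded
    for the person; complex multimorbidity is a subset of multimorbidity.
    """
    n = len(conditions)
    return {
        "multimorbidity": n >= MULTIMORBIDITY_MIN,
        "complex_multimorbidity": n >= COMPLEX_MIN,
    }


# Hypothetical example records: person identifier -> recorded conditions.
people = {
    "p1": {"depression"},
    "p2": {"obesity", "uncomplicated hypertension"},
    "p3": {"renal failure", "deficiency anaemia", "obesity", "depression"},
}
flags = {pid: classify(conds) for pid, conds in people.items()}
```

In this sketch a person with four or more conditions is flagged under both definitions, mirroring the way the paper counts people with complex multimorbidity within the multimorbid population.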

Explanatory variables

The patient’s age was calculated on 31st December 2018 and grouped as 44–49, 50–59, 60–69, 70–79, and 80+ years. Sex was recorded as male/female, and socioeconomic status was measured by the quintiles of the Scottish Index of Multiple Deprivation (SIMD), a postcode-assigned measure of small area (data zone) deprivation. SIMD uses seven domains (income, employment, health, education, housing, crime, access to services) to score data zones on different aspects of deprivation; the data zones are then ranked and grouped into quintiles [23].
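As a small illustration of the age grouping described above, the following Python sketch maps an age in whole years to the bands used in the analysis. The function name and the ASCII band labels are assumptions made for this example only.

```python
from typing import Optional

# Age bands as listed in the text; labels use ASCII hyphens for convenience.
AGE_BANDS = [(44, 49, "44-49"), (50, 59, "50-59"), (60, 69, "60-69"), (70, 79, "70-79")]


def age_band(age_years: int) -> Optional[str]:
    """Map an age in whole years (at 31 December 2018) to its analysis band."""
    if age_years >= 80:
        return "80+"
    for low, high, label in AGE_BANDS:
        if low <= age_years <= high:
            return label
    return None  # outside the bands reported in the paper
```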

Data management and statistical analysis

Frequencies and proportions of individuals with the conditions and the prevalence of multimorbidity and complex multimorbidity within each stratum (age, sex, deprivation) were reported and relevant tests for differences were used. Analysis of clusters was performed in three phases: (i) the whole population with multimorbidity, (ii) by age, sex and socioeconomic deprivation, and (iii) by their interactions. The rate of co-occurrence between pairs of the long-term conditions is presented in S1 Fig. To allow for the clustering of conditions across the characteristics of the population, market basket analysis (MBA, also known as association rule mining) using the Apriori algorithm was used [24, 25]. The dissimilarity command with the Jaccard option within the MBA was used to identify clusters among the conditions stratified by age, sex and SIMD [24, 25]. A cluster is a group of conditions with shorter distances to one another than to other conditions in a binary matrix of conditions. The dissimilarity function organises the conditions by cluster so that the conditions within clusters are closer together than those in different clusters, and therefore more likely to co-occur. We used MBA because it has been reported as more efficient for binary (present/absent) outcomes than the hierarchical cluster analysis that was originally built for quantitative outcomes [24, 25]. It also allows an individual to "belong" to more than one cluster if they have a large number of different conditions. The method computes and returns distances for binary data in transactions, which can be used for grouping and clustering [24]. The optimal number of clusters from the dissimilarity clustering was determined using the Elbow method [26]. For clustering, we considered only conditions that had at least 5% prevalence in the population with multimorbidity (see the S2 Table). The clusters are summarised in Tables 2 and 3 and S3–S5 Tables, and the generated dendrograms are presented in S2–S4 Figs.
R and Stata version 17 were used for the analysis.
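The Jaccard dissimilarity between two conditions, as used in the clustering step above, is one minus the number of people with both conditions divided by the number of people with either. The Python sketch below shows that calculation together with the 5% prevalence filter. It is an illustration under stated assumptions, not the authors' R or Stata code: the function names and the four example records are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

MIN_PREVALENCE = 0.05  # conditions below 5% prevalence are not clustered (from the text)


def condition_members(people: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert person -> conditions into condition -> set of people who have it."""
    members: Dict[str, Set[str]] = {}
    for pid, conds in people.items():
        for cond in conds:
            members.setdefault(cond, set()).add(pid)
    return members


def jaccard_dissimilarity(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |A and B| / |A or B|; taken as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def dissimilarity_matrix(people: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Jaccard dissimilarity between conditions that pass the prevalence filter."""
    members = condition_members(people)
    n = len(people)
    kept = sorted(c for c, m in members.items() if len(m) / n >= MIN_PREVALENCE)
    return {
        (x, y): jaccard_dissimilarity(members[x], members[y])
        for x, y in combinations(kept, 2)
    }


# Hypothetical records for four people with multimorbidity.
example = {
    "p1": {"alcohol abuse", "depression"},
    "p2": {"alcohol abuse", "depression", "obesity"},
    "p3": {"obesity", "uncomplicated hypertension"},
    "p4": {"obesity", "uncomplicated hypertension"},
}
dist = dissimilarity_matrix(example)
```

Conditions that always occur together have a dissimilarity of 0 and conditions that never co-occur have a dissimilarity of 1. A matrix of this kind is what a hierarchical clustering routine would take as input to produce dendrograms such as those the paper reports.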

Table 2. Multimorbidity clusters of the conditions among the whole population with multimorbidity, sex and deprivation subgroups.

Cluster+ | Whole population | Males | Females | Most deprived | Least deprived
1 Alcohol misuse Cluster | Alcohol Abuse; Other Neurological Disorders; Depression | Alcohol Abuse; Other Neurological Disorders; Depression; Liver Disease | Alcohol Abuse; Other Neurological Disorders; Depression | Alcohol Abuse; Other Neurological Disorders; Depression; Liver Disease; Drug Abuse | Alcohol Abuse; Other Neurological Disorders; Depression
2 Cancer Cluster | Solid Tumour w/o Metastasis; Metastatic Cancer | Solid Tumour w/o Metastasis; Metastatic Cancer | Solid Tumour w/o Metastasis; Metastatic Cancer | Solid Tumour w/o Metastasis; Metastatic Cancer | Solid Tumour w/o Metastasis; Metastatic Cancer
3 Obesity Cluster | Obesity; Chronic Pulmonary Disease; Uncomplicated Hypertension; Uncomplicated Diabetes; Rheumatoid Arthritis/Collagen; Hypothyroidism | Obesity; Chronic Pulmonary Disease; Uncomplicated Hypertension; Uncomplicated Diabetes; Rheumatoid Arthritis/Collagen | Obesity; Chronic Pulmonary Disease; Uncomplicated Hypertension; Uncomplicated Diabetes; Rheumatoid Arthritis/Collagen; Hypothyroidism | Obesity; Chronic Pulmonary Disease; Uncomplicated Hypertension; Uncomplicated Diabetes | Obesity; Chronic Pulmonary Disease; Cardiac Arrhythmia; Uncomplicated Hypertension; Uncomplicated Diabetes; Rheumatoid Arthritis/Collagen; Hypothyroidism
4 Renal Failure Cluster | Peripheral Vascular Disorders; Renal Failure; Fluid & Electrolyte Disorders; Deficiency Anaemia | Peripheral Vascular Disorders; Renal Failure; Fluid & Electrolyte Disorders | Peripheral Vascular Disorders; Renal Failure; Fluid & Electrolyte Disorders; Deficiency Anaemia | Peripheral Vascular Disorders; Renal Failure; Fluid & Electrolyte Disorders; Deficiency Anaemia; Hypothyroidism; Rheumatoid Arthritis/Collagen | Peripheral Vascular Disorders; Renal Failure; Fluid & Electrolyte Disorders; Deficiency Anaemia
5 Heart Failure Cluster | Valvular Disease; Congestive Heart Failure; Cardiac Arrhythmia; Pulmonary Circulation Disorders | Valvular Disease; Congestive Heart Failure; Cardiac Arrhythmia | Valvular Disease; Congestive Heart Failure; Cardiac Arrhythmia; Pulmonary Circulation Disorders | Valvular Disease; Congestive Heart Failure; Cardiac Arrhythmia; Pulmonary Circulation Disorders | Valvular Disease; Congestive Heart Failure; Pulmonary Circulation Disorders

*Only conditions with at least 5% prevalence within each specific population subgroup were clustered.
+An individual can "belong" to more than one cluster.

Table 3. Multimorbidity clusters of the conditions by age*,^@.

Clusters*,^@+ | Whole population | 44–49 years | 50–59 years | 60–69 years | 70–79 years | 80+ years
Alcohol misuse cluster | Alcohol Abuse; Other Neurological Disorders; Depression | Alcohol Abuse; Other Neurological Disorders; Depression; Psychoses; Drug Abuse; Liver Disease | Alcohol Abuse; Depression; Psychoses; Drug Abuse | Alcohol Abuse; Other Neurological Disorders; Depression; Liver Disease | Alcohol Abuse; Other Neurological Disorders; Depression | (blank)
Cancer cluster | Solid Tumour w/o Metastasis; Metastatic Cancer | Solid Tumour w/o Metastasis; Fluid & Electrolyte Disorders; Cardiac Arrhythmia; Rheumatoid Arthritis/Collagen; Deficiency Anaemia; Hypothyroidism | Solid Tumour w/o Metastasis; Metastatic Cancer | Solid Tumour w/o Metastasis; Metastatic Cancer | Solid Tumour w/o Metastasis; Metastatic Cancer | Solid Tumour w/o Metastasis; Uncomplicated Hypertension; Uncomplicated Diabetes; Chronic Pulmonary Disease; Cardiac Arrhythmia; Renal Failure
Obesity cluster | Obesity; Chronic Pulmonary Disease; Uncomplicated Hypertension; Uncomplicated Diabetes; Rheumatoid Arthritis/Collagen; Hypothyroidism | Obesity; Uncomplicated Hypertension; Uncomplicated Diabetes | Obesity; Chronic Pulmonary Disease; Uncomplicated Hypertension; Uncomplicated Diabetes; Rheumatoid Arthritis/Collagen; Hypothyroidism | Obesity; Chronic Pulmonary Disease; Uncomplicated Hypertension; Uncomplicated Diabetes; Rheumatoid Arthritis/Collagen; Hypothyroidism | Chronic Pulmonary Disease; Uncomplicated Hypertension; Uncomplicated Diabetes; Rheumatoid Arthritis/Collagen | (blank)
Renal Failure cluster | Peripheral Vascular Disorders; Renal Failure; Fluid & Electrolyte Disorders; Deficiency Anaemia | (blank) | Fluid & Electrolyte Disorders; Deficiency Anaemia; Liver Disease; Other Neurological Disorders | Peripheral Vascular Disorders; Renal Failure; Fluid & Electrolyte Disorders; Deficiency Anaemia | Peripheral Vascular Disorders; Renal Failure; Fluid & Electrolyte Disorders; Deficiency Anaemia | Peripheral Vascular Disorders; Pulmonary Circulation Disorders; Congestive Heart Failure; Valvular Disease
Heart Failure cluster | Valvular Disease; Congestive Heart Failure; Cardiac Arrhythmia; Pulmonary Circulation Disorders | (blank) | Valvular Disease; Congestive Heart Failure; Cardiac Arrhythmia | Valvular Disease; Congestive Heart Failure; Cardiac Arrhythmia | Valvular Disease; Congestive Heart Failure; Cardiac Arrhythmia | (blank)
Others | (blank) | (blank) | (blank) | (blank) | Obesity; Hypothyroidism; Pulmonary Circulation Disorders | Metastatic Cancer; Other Neurological Disorders; Fluid & Electrolyte Disorders; Hypothyroidism; Deficiency Anaemia; Rheumatoid Arthritis/Collagen

*Only conditions with at least 5% prevalence within each specific population subgroup were clustered.
^Efforts were made to align clusters that were similar across different populations.
@Blank cells exist for certain age groups where the identified conditions are not present or clustered.
+An individual can "belong" to more than one cluster.

Ethical approval

HIC provided a linked dataset within a Safe Haven environment for this study. The dataset was obtained under HIC Standard Operating Procedures (SOP). NHS Tayside Research Ethics Committee have approved these SOPs (18/ES/0126). The School of Medicine Ethics Committee, acting on behalf of the University of St Andrews Teaching and Research Ethics Committee approved the project (UTREC MD15619 approved 30th June 2021). As the study data are de-identified, consent from individual patients was not required.

Results

Overall, 318,235 people aged 44 years and over were included in the analysis, with 67,728 (21·3%) identified as having multimorbidity (2+ conditions), while 20,123 (6·3%) were also classed as having complex multimorbidity (4+ conditions). The mean (SD) age of the people with multimorbidity was 72·8 (7·1) years; 31,439 (46·4%) were men, 13,955 (20·6%) were in the most deprived quintile and 12,268 (18·1%) were in the least deprived quintile. The prevalence of both multimorbidity and complex multimorbidity in the whole population was similar for both sexes and increased significantly with age (Fig 1) and with increasing socioeconomic deprivation. More women than men in the younger age groups (44–59 years) had multimorbidity, whereas among individuals aged 60 years and above, men had a higher prevalence of multimorbidity (Table 1).

Fig 1. Prevalence of multimorbidity by sex and age.

Table 1. Distribution of population by background characteristics and level of multimorbidity.

Characteristics All individuals Distribution of multimorbidity patients (n, %) Prevalence of multimorbidity (2+) (n, %) Prevalence of complex multimorbidity (4+) (n, %)
All 318,235 67728 (100) 67,728 (21·3) 20,123 (6·3)
Sex
F 169,086 36289 (53·6) 36,289 (21·5) 10,711 (6·3)
M 149,149 31439 (46·4) 31,439 (21·1) 9,412 (6·3)
Age (years) Mean(SD) 64.6(6.9) 72·8(7·1) 75.3(6.8)
44–49 40,023 3232 (4·8) 3,232 (8·1) 595 (1·5)
50–59 90,719 10226 (15·1) 10,226 (11·3) 2,088 (2·3)
60–69 80,486 14827 (21·9) 14,827 (18·4) 3,687 (4·6)
70–79 66,379 19924 (29·4) 19,924 (30·0) 6,098 (9·2)
80+ 40,628 19519 (28·8) 19,519 (48·0) 7,655 (18·8)
Deprivation quintile
Most deprived 1 52,958 13955 (20·6) 13,955 (26·4) 4,449 (8·4)
2 58,468 14242 (21·1) 14,242 (24·4) 4,458 (7·6)
3 62,535 13105 (19·3) 13,105 (21·0) 3,879 (6·2)
4 64,568 12271 (18·1) 12,271 (19·0) 3,485 (5·4)
Least deprived 5 69,416 12268 (18·1) 12,268 (17·7) 3,259 (4·7)
Missing 10,290 1887 (2·8) 1,887 (18·3) 593 (5·8)
Women /Age (years)
44–49 20,498 1836 (5·1) 1,836 (9·0) 324 (1·6)
50–59 46,849 5723 (15·8) 5,723 (12·2) 1,137 (2·4)
60–69 41,586 7440 (20·4) 7,440 (17·9) 1,829 (4·4)
70–79 35,497 9823 (27·1) 9,823 (27·7) 2,930 (8·3)
80+ 24,656 11467 (31·6) 11,467 (46·5) 4,491 (18·2)
Men/Age (years)
44–49 19,525 1396 (4·4) 1,396 (7·1) 271 (1·4)
50–59 43,870 4503 (14·3) 4,503 (10·3) 951 (2·2)
60–69 38,900 7387 (23·5) 7,387 (19·0) 1,858 (4·8)
70–79 30,882 10101 (32·2) 10,101 (32·7) 3,168 (10·3)
80+ 15,972 8052 (25·6) 8,052 (50·4) 3,164 (19·8)
Women/Deprivation
Most deprived 27,874 7516 (20·7) 7,516 (27·0) 2,396 (8·6)
2 31,132 7777 (21·4) 7,777 (25·0) 2,445 (7·9)
3 33,201 7040 (19·4) 7,040 (21·2) 2,055 (6·2)
4 34,341 6461 (17·8) 6,461 (18·8) 1,802 (5·2)
Least deprived 37,128 6413 (17·7) 6,413 (17·3) 1,664 (4·5)
Missing 5,410 1082 (3·0) 1,082 (20·0) 349 (6·5)
Men/Deprivation
Most deprived 25,084 6439 (20·4) 6,439 (25·7) 2,053 (8·2)
2 27,336 6465 (20·6) 6,465 (23·7) 2,013 (7·4)
3 29,334 6065 (19·3) 6,065 (20·7) 1,824 (6·2)
4 30,227 5810 (18·5) 5,810 (19·2) 1,683 (5·6)
Least deprived 32,288 5855 (18·6) 5,855 (18·1) 1,595 (4·9)
Missing 4,880 805 (2·6) 805 (16·5) 244 (5·0)
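
The prevalence percentages in Table 1 follow from dividing the number of people with multimorbidity in a stratum by the number of individuals in that stratum. The short Python sketch below recomputes a few of them from the counts in the table; the function and variable names are illustrative only.

```python
def prevalence_pct(cases: int, population: int) -> float:
    """Prevalence as a percentage, rounded to one decimal place."""
    return round(100 * cases / population, 1)


# Counts taken from Table 1.
overall = prevalence_pct(67728, 318235)          # multimorbidity, all individuals
complex_overall = prevalence_pct(20123, 318235)  # complex multimorbidity, all individuals
women = prevalence_pct(36289, 169086)            # multimorbidity among women
men = prevalence_pct(31439, 149149)              # multimorbidity among men
aged_80_plus = prevalence_pct(19519, 40628)      # multimorbidity among those aged 80+
```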

The most common condition among all patients with multimorbidity was uncomplicated hypertension (53%), followed by chronic pulmonary disease (27%), while the least common was AIDS/HIV (0·1%) (see S2 Table). The most frequent conditions among 44–49-year-olds with multimorbidity were chronic pulmonary disease (29%), alcohol abuse (28%), and depression (28%), compared with uncomplicated hypertension (65%), cardiac arrhythmia (37%) and solid tumour without metastasis (34%) among those aged 80+ years. Uncomplicated hypertension was the leading condition among both the most deprived (50%) and least deprived (55%) patients with multimorbidity (see S3 Table). The rate of co-occurrence of pairs of conditions per 1000 patients with multimorbidity is shown in S1 Fig. The commonest pairs of conditions were uncomplicated hypertension and uncomplicated diabetes (32·5 per 1000 patients with multimorbidity), uncomplicated hypertension and cardiac arrhythmia (30·0/1000), chronic pulmonary disease and uncomplicated hypertension (28·1/1000) and uncomplicated hypertension and solid tumour without metastasis (26·6/1000).

Clustering of conditions among people with multimorbidity

All people with multimorbidity

In the total population with multimorbidity, conditions were grouped into five clusters (Table 2).

Cluster 1: Alcohol misuse Cluster

In cluster 1, twenty-six percent (n = 17,366) of the people with multimorbidity had at least two of alcohol abuse, other neurological disorders, and depression. The mean age of people in cluster 1 was 66·4 (standard deviation = 12·5) years; 8891 (51%) of them were women, and 4098 (24%) and 2049 (12%) were from the most deprived and least deprived groups respectively (S4A Table).

Cluster 2: Cancer Cluster

Of the 19,123 people with the two conditions in this cluster (solid tumour without metastasis and metastatic cancer), 10,766 (56%) were women, and 2945 (15%) and 3748 (20%) were from the most deprived and least deprived groups respectively (S4A Table).

Cluster 3: Obesity Cluster

Obesity, uncomplicated hypertension, chronic pulmonary disease, rheumatoid arthritis/collagen disorders, hypothyroidism, and uncomplicated diabetes formed cluster 3. Of the people with multimorbidity, 55,105 (81·4%) had at least two of the conditions in cluster 3.

Cluster 4: Renal Failure Cluster

The conditions in cluster 4 are peripheral vascular disorders, renal failure, fluid & electrolyte disorders, and deficiency anaemia.

Cluster 5: Heart Failure Cluster

The conditions that formed cluster 5 are pulmonary circulation disorders, valvular disease, congestive heart failure and cardiac arrhythmias.

In all, the percentages of people with multimorbidity from the most deprived groups were higher than those from the least deprived group, except for cluster 2. The clusters of conditions identified for strata of sex, age and socioeconomic deprivation are presented in Tables 2 and 3, in S4, S5 and S6 Tables and in S2, S3 and S4 Figs.
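
The membership rule used in the cluster descriptions above (a person is counted in a cluster when they have at least two of its conditions, and may be counted in more than one cluster) can be sketched in Python as follows. The cluster contents are taken from Table 2 for the whole population; the function name, the lower-case condition labels and the example people are assumptions made for illustration.

```python
from typing import Dict, List, Set

# Whole-population clusters as listed in Table 2.
CLUSTERS: Dict[str, Set[str]] = {
    "alcohol misuse": {"alcohol abuse", "other neurological disorders", "depression"},
    "cancer": {"solid tumour without metastasis", "metastatic cancer"},
    "obesity": {
        "obesity", "chronic pulmonary disease", "uncomplicated hypertension",
        "uncomplicated diabetes", "rheumatoid arthritis/collagen", "hypothyroidism",
    },
    "renal failure": {
        "peripheral vascular disorders", "renal failure",
        "fluid & electrolyte disorders", "deficiency anaemia",
    },
    "heart failure": {
        "valvular disease", "congestive heart failure",
        "cardiac arrhythmia", "pulmonary circulation disorders",
    },
}


def clusters_for(conditions: Set[str], min_shared: int = 2) -> List[str]:
    """Return the clusters in which a person has at least `min_shared` conditions."""
    return sorted(
        name for name, members in CLUSTERS.items()
        if len(conditions & members) >= min_shared
    )


# Hypothetical people: the first falls in two clusters, the second in none.
person_a = {"alcohol abuse", "depression", "obesity", "uncomplicated hypertension"}
person_b = {"obesity", "renal failure"}
membership_a = clusters_for(person_a)
membership_b = clusters_for(person_b)
```

Because membership is not mutually exclusive, cluster percentages computed this way can sum to more than 100%, as they do in the paper's whole-population figures.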

Looking at stratification by sex and social deprivation, the identified clusters had a core set of conditions across strata. The core conditions included alcohol misuse, other neurological disorders and depression in the alcohol misuse cluster, and solid tumour without metastasis and metastatic cancer in the cancer cluster. However, some clusters for specific strata also had additional conditions. For instance, the most deprived people had additional conditions such as drug abuse in the alcohol misuse cluster (see Table 2). The number of clusters formed among these conditions was similar across sex and deprivation quintiles. When clusters were identified for the different age groups, the conditions in cluster 1 among the youngest (44–49 years) and those in their 50s were similar, and cluster 1 in the 60s and 70s was similar, but no such cluster existed in those aged 80 years or older. Cluster 2 in the youngest group (44–49 years) and cluster 3 in the 50s and 60s looked similar, while there were additional conditions as the patients grew older (Table 3).

For the most deprived populations aged 80 years and over, drug abuse, alcohol abuse, psychosis, and depression formed a cluster which affected 1 in 5 people with multimorbidity in this subgroup, but these conditions were not prominent among the older least deprived population. This suggests that those planning initiatives aimed at different populations of people with multimorbidity should be aware that the underlying clusters of disease will differ. Alcohol and drug abuse formed part of a cluster (liver disease, psychosis, alcohol abuse, drug abuse, depression, chronic pulmonary disease and other neurological disorders) among 83% of the most deprived patients with multimorbidity aged 44–49 years. However, they contributed to a smaller cluster (psychosis, alcohol abuse, drug abuse, depression) among only two-fifths of their least deprived counterparts.

About 26% of the patients with multimorbidity had at least two of the conditions in the alcohol misuse cluster, with a mean (standard deviation) age of 66·4 (12·5) years; of these, 51% were women, and 24% and 12% were from the most deprived and least deprived groups respectively (Table 4). Fifty-six percent of the people with the conditions in the cancer cluster were women, with 15% and 20% from the most deprived and least deprived groups respectively. About 8 of every 10 patients with multimorbidity had at least two of the conditions in the obesity cluster. The percentages of patients with multimorbidity from the most deprived groups were higher than those from the least deprived group across all the clusters except for the cancer cluster.

Table 4. Multimorbidity clusters of the conditions among the whole population with multimorbidity.
Population subgroup | Cluster | Conditions* | No of people in cluster, n (%) | Mean age (std dev) | Women, n (%) | Most deprived, n (%) | Least deprived, n (%)
Whole Population N = 67728 | Alcohol misuse cluster | Alcohol Abuse; Other Neurological Disorders; Depression | 17366 (25·6) | 66·4 (12·5) | 8891 (51·2) | 4098 (23·6) | 2049 (11·8)
Whole Population N = 67728 | Cancer cluster | Solid Tumour w/o Metastasis; Metastatic Cancer | 19123 (28·2) | 74·7 (11·1) | 10766 (56·3) | 2945 (15·4) | 3748 (19·6)
Whole Population N = 67728 | Obesity cluster | Obesity; Chronic Pulmonary Disease; Uncomplicated Hypertension; Uncomplicated Diabetes; Rheumatoid Arthritis/Collagen; Hypothyroidism | 55105 (81·4) | 72·6 (12) | 29757 (54) | 10250 (18·6) | 8872 (16·1)
Whole Population N = 67728 | Renal Failure cluster | Peripheral Vascular Disorders; Renal Failure; Fluid & Electrolyte Disorders; Deficiency Anaemia | 20771 (30·7) | 75·8 (11·9) | 11736 (56·5) | 3884 (18·7) | 3199 (15·4)
Whole Population N = 67728 | Heart Failure cluster | Valvular Disease; Congestive Heart Failure; Cardiac Arrhythmia; Pulmonary Circulation Disorders | 23497 (34·7) | 75·7 (12) | 11020 (46·9) | 3854 (16·4) | 4229 (18·0)

Discussion

In this population, where the presence of long-term conditions was ascertained using secondary care data, the prevalence of multimorbidity and complex multimorbidity was similar for both men and women and increased with age and socioeconomic deprivation. Several previous studies have identified clusters that include cardiovascular-metabolic conditions, mental health issues, and musculoskeletal disorders [27]. A recent study has also demonstrated variations in clusters by age, as well as differences in mortality and service utilisation [28]. In addition, our study reported how long-term conditions cluster differently based on age, sex, and socioeconomic deprivation.

The differences between the clusters of conditions by social deprivation and age support the view that the overall population with multimorbidity comprises heterogeneous groups of patients with different conditions and hence different needs.

Clustering populations that were stratified by age and deprivation showed differences in people aged over 80 years, although there seemed to be less variability in people aged 44–49 years. These findings highlight that although the relationship between levels of deprivation, age and multimorbidity is well known [19, 29], much less is known about differences in clusters of conditions for these characteristics [19].

The identified clusters strongly correspond to current medical knowledge, demonstrating well-known associations between conditions such as alcohol abuse and depression. Across the population, hypertension and cardiac arrhythmia were among the study population’s most prevalent pairs of conditions, which supports the known relationship between hypertension and heart diseases [30]. Our analysis shows association, not causality, but it may be possible to surmise the drivers of specific clusters as identified. Conditions most prevalent in our most deprived population groups include alcohol and drug misuse, depression and obesity, which are all known to be associated with social factors. Other identified clusters are likely to have more physiological drivers, e.g. hypertension through to heart diseases.

The choice of how to define multimorbidity is important in terms of conditions and risk factors. Obesity and hypertension can be considered as both conditions that require management and as risk factors contributing to the development of other health problems. Our findings suggest a high prevalence of obesity among individuals aged 44–49 years old with multimorbidity. This undoubtedly places a significant burden on both health and social care services, given the available evidence on how obesity can reduce life expectancy and healthy life expectancy [31]. The younger age group also had alcohol misuse as a key condition. This supports a recent report on alcohol-related harm with risk factors rooted foremost in socioeconomic determinants [32].

Study strengths and limitations

The study population was drawn from two Scottish Health Boards with comprehensive health records over a long period. The use of well-defined conditions and ICD-10 codesets to identify each condition allows other researchers to explore multimorbidity using the same methods. Using market basket analysis to cluster conditions rather than classifying patients into mutually exclusive groups meant patients could be present in more than one cluster depending on the conditions they had. Using the same approach across different strata meant that differences between strata reflected the underlying data rather than the use of different methods in different populations.

The data used to identify conditions were hospital records from secondary care. These will underestimate the occurrence of conditions as less severe cases might not have been captured. Some limitations of this work relate to the choice of the Elixhauser Index to identify underlying conditions. Some common conditions such as myocardial infarction (ICD-10 code I21) are not among the 31 conditions identified in the Elixhauser Index, and a few individual conditions may be a progression of a single condition, such as uncomplicated diabetes to diabetes with chronic complications. However, people with this progression of diseases also have other conditions and the clustering will be unaffected as only conditions with at least 5% prevalence were clustered. Each condition has a mutually exclusive code set, meaning that each ICD-10 code relates to only one condition. We made an a priori decision to only study the 31 conditions listed and to treat them all as separate. Recent work from Ho et al. has suggested a more complete list of underlying individual conditions which may change the identified clusters [4] but is unlikely to change the fact that clusters will vary in different age groups, gender or socioeconomic groups.

Similarly, there have been several different methods used to identify clusters within multimorbid populations, but our choice of market basket analysis as a methodological tool is unlikely to be the cause of differences when examining strata. However, the clusters generated by market basket analysis are based on empirical patterns in the data and only show associations between different conditions; they do not show any causal relationships between those conditions. We reported on clusters of multimorbidity in people alive on 31st December 2018; if we had included those who had died throughout the period, we may have seen some differences in the identified clusters. The measure of deprivation used in this study is allocated at a postcode level, but it is a small area approximation rather than a direct measure of individual deprivation.

The naming of the clusters was discussed within the research team, using either the most common condition or a representative term, but the labelling remains subjective. Cluster 1, for instance, was named alcohol misuse as this was the most common condition among people identified in the cluster, but it could also have been labelled as socioeconomic-driven conditions.

Recommendations

Identification of the patients who are most vulnerable, based on clustering of conditions across characteristics such as age, sex and level of deprivation, should be used to inform public health strategies and to direct primary prevention and interventional clinical services to where they are most needed. There is a need for significant investment in preventative and public health measures and for action on the social determinants of health [33]. The clusters of conditions identified in this study suggest that lifestyle interventions, support groups and mental health interventions in the most deprived areas may be a good strategy to focus on. Without such action, health inequalities and the observed differences in multimorbidity prevalence may very well continue to widen.

Conclusions

This paper identified that different sub-population groups with multimorbidity need different interventions to prevent and/or manage multimorbidity. Condition clustering in the multimorbid population is mainly influenced by age, and also by sex and area-level socioeconomic deprivation. A third of the youngest age group with multimorbidity have alcohol misuse contributing to their multimorbidity. Almost half of the oldest age group have hypertension and cardiac arrhythmia. Studies of condition clustering should therefore take account of the age of the people being studied, as well as their sex and level of socioeconomic deprivation.

Supporting information

S1 Fig. Rate of co-occurrence of pairs of conditions per 1000 people with multimorbidity.

(TIF)

S2 Fig. Clustering of Conditions among multimorbid (2+ conditions) patients by sex and level of deprivation (excludes conditions with less than 5% prevalence).

(TIF)

S3 Fig. Clustering of Conditions among multimorbid (2+ conditions) patients by age groups (only include conditions with 5%+ prevalence).

(TIF)

S4 Fig. Clustering of Conditions among multimorbid (2+ conditions) patients by age and deprivation (excludes conditions with less than 5% prevalence).

(TIF)

S1 Table. List of Elixhauser Index conditions, abbreviations, ICD10 codes.

(PDF)

S2 Table. Prevalence of the conditions among all patients, patients with multimorbidity and patients with complex multimorbidity.

(PDF)

S3 Table. Prevalence of the conditions by characteristics of people with multimorbidity.

(PDF)

S4 Table. Multimorbidity Clusters of the Conditions among the whole multimorbid population, sex and deprivation subgroups.

(PDF)

S5 Table. Multimorbidity Clusters of the Conditions across age subgroups.

(PDF)

S6 Table. Multimorbidity Clusters of the Conditions across age-deprivation subgroups.

(PDF)

Acknowledgments

We acknowledge the support of the Health Informatics Centre, University of Dundee for managing and supplying the anonymised data and NHS Tayside and Fife for the original data source. This work uses data provided by patients and collected by the NHS as part of their care and support.

Data Availability

A data dictionary covering the data sources used in this study and the analysis codes are deposited at https://github.com/fadeniyi123/MM_HDRUK. The data used in this study are sensitive and are not publicly available. Access to the data is by application to the Health Informatics Centre, University of Dundee, Scotland (hicsupport@dundee.ac.uk) using their standard governance and access processes.

Funding Statement

CMC: This work was supported by Health Data Research UK (HDR UK) Measuring and Understanding Multimorbidity using Routine Data in the UK (HDR-9006; CFC0110). Health Data Research UK (HDR-9006) is funded by: UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, the National Institute for Health Research (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation, and Wellcome Trust. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Johnston MC, Crilly M, Black C, Prescott GJ, Mercer SW. Defining and measuring multimorbidity: a systematic review of systematic reviews. Eur J Public Health. 2019;29:182–9. doi: 10.1093/eurpub/cky098
  • 2. Moffat K, Mercer SW. Challenges of managing people with multimorbidity in today’s healthcare systems. BMC Fam Pract. 2015. doi: 10.1186/s12875-015-0344-4
  • 3. Agrawal U, Azcoaga-Lorenzo A, Fagbamigbe AF, Vasileiou E, Henery P, Simpson CR, et al. Association between multimorbidity and mortality in a cohort of patients admitted to hospital with COVID-19 in Scotland. J Royal Society of Medicine. 2022;115:22–30.
  • 4. Ho ISS, Azcoaga-Lorenzo A, Akbari A, Davies J, Hodgins P, Khunti K, et al. Variation in the estimated prevalence of multimorbidity: systematic review and meta-analysis of 193 international studies. BMJ Open. 2022;12:e057017. doi: 10.1136/bmjopen-2021-057017
  • 5. Probst C, Kilian C, Sanchez S, Lange S, Rehm J. The role of alcohol use and drinking patterns in socioeconomic inequalities in mortality: a systematic review. Lancet Public Health. 2020;5:e324–32. doi: 10.1016/S2468-2667(20)30052-9
  • 6. Collins SE. Associations Between Socioeconomic Factors and Alcohol Outcomes. Alcohol Res. 2016;38:83.
  • 7. Cassell A, Edwards D, Harshfield A, Rhodes K, Brimicombe J, Payne R, et al. The epidemiology of multimorbidity in primary care: a retrospective cohort study. British Journal of General Practice. 2018;68:e245–51. doi: 10.3399/bjgp18X695465
  • 8. Aziz F, Cardoso VR, Bravo-Merodio L, Russ D, Pendleton SC, Williams JA, et al. Multimorbidity prediction using link prediction. Sci Rep. 2021;11. doi: 10.1038/s41598-021-95802-0
  • 9. Chan MS, Van Den Hout A, Pujades-Rodriguez M, Jones MM, Matthews FE, Jagger C, et al. Socio-economic inequalities in life expectancy of older adults with and without multimorbidity: a record linkage study of 1.1 million people in England. Int J Epidemiol. 2019;48:1340–51. doi: 10.1093/ije/dyz052
  • 10. Boyd CM, Darer J, Boult C, Fried LP, Boult L, Wu AW. Clinical practice guidelines and quality of care for older patients with multiple comorbid diseases: implications for pay for performance. JAMA. 2005;294:716–24. doi: 10.1001/jama.294.6.716
  • 11. Hughes LD, McMurdo MET, Guthrie B. Guidelines for people not for diseases: the challenges of applying UK clinical guidelines to people with multimorbidity. Age Ageing. 2013;42:62–9. doi: 10.1093/ageing/afs100
  • 12. Palladino R, Lee JT, Ashworth M, Triassi M, Millett C. Associations between multimorbidity, healthcare utilisation and health status: evidence from 16 European countries. Age Ageing. 2016;45:431–5. doi: 10.1093/ageing/afw044
  • 13. Starfield B, Lemke KW, Herbert R, Pavlovich WD, Anderson G. Comorbidity and the Use of Primary Care and Specialist Care in the Elderly. Ann Fam Med. 2005;3:215. doi: 10.1370/afm.307
  • 14. Yoon J, Zulman D, Scott JY, Maciejewski ML. Costs associated with multimorbidity among VA patients. Med Care. 2014;52 Suppl 3. doi: 10.1097/MLR.0000000000000061
  • 15. Bisquera A, Gulliford M, Dodhia H, Ledwaba-Chapman L, Durbaba S, Soley-Bori M, et al. Identifying longitudinal clusters of multimorbidity in an urban setting: A population-based cross-sectional study. 2021. doi: 10.1016/j.lanepe.2021.100047
  • 16. Launders N, Hayes JF, Price G, Osborn DPJ. Clustering of physical health multimorbidity in people with severe mental illness: An accumulated prevalence analysis of United Kingdom primary care data. PLoS Med. 2022;19:e1003976. doi: 10.1371/journal.pmed.1003976
  • 17. Soley-Bori M, Bisquera A, Ashworth M, Wang Y, Durbaba S, Dodhia H, et al. Identifying multimorbidity clusters with the highest primary care use: 15 years of evidence from a multi-ethnic metropolitan population. British Journal of General Practice. 2022;72:e190–8. doi: 10.3399/BJGP.2021.0325
  • 18. Kuan V, Denaxas S, Gonzalez-Izquierdo A, Direk K, Bhatti O, Husain S, et al. A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service. Lancet Digit Health. 2019;1:e63–77. doi: 10.1016/S2589-7500(19)30012-3
  • 19. McLean G, Gunn J, Wyke S, Guthrie B, Watt GCM, Blane DN, et al. The influence of socioeconomic deprivation on multimorbidity at different ages: a cross-sectional study. The British Journal of General Practice. 2014;64:e440. doi: 10.3399/bjgp14X680545
  • 20. Cezard G, Sullivan F, Keenan K. Understanding multimorbidity trajectories in Scotland using sequence analysis. Scientific Reports. 2022;12:1–15. doi: 10.1038/s41598-022-20546-4
  • 21. Van Walraven C, Austin PC, Jennings A, Quan H, Forster AJ. A modification of the Elixhauser comorbidity measures into a point system for hospital death using administrative data. Med Care. 2009;47:626–33. doi: 10.1097/MLR.0b013e31819432e5
  • 22. Yurkovich M, Avina-Zubieta JA, Thomas J, Gorenchtein M, Lacaille D. A systematic review identifies valid comorbidity indices derived from administrative health data. J Clin Epidemiol. 2015;68:3–14. doi: 10.1016/j.jclinepi.2014.09.010
  • 23. Scottish Government. Scottish Index of Multiple Deprivation 2020 – gov.scot. 2020. https://www.gov.scot/collections/scottish-index-of-multiple-deprivation-2020/. Accessed 7 Jul 2022.
  • 24. Hahsler M, Grün B, Hornik K. Arules – A computational environment for mining association rules and frequent item sets. J Stat Softw. 2005;14:1–25.
  • 25. Hahsler M, Karpienko R. Visualizing association rules in hierarchical groups. J Bus Econ. 2017;87:317–35.
  • 26. Thorndike RL. Who belongs in the family? Psychometrika. 1953;18:267–76.
  • 27. Prados-Torres A, Calderón-Larrañaga A, Hancco-Saavedra J, Poblador-Plou B, Van Den Akker M. Multimorbidity patterns: a systematic review. J Clin Epidemiol. 2014;67:254–66. doi: 10.1016/j.jclinepi.2013.09.021
  • 28. Zhu Y, Edwards D, Mant J, Payne RA, Kiddle S. Characteristics, service use and mortality of clusters of multimorbid patients in England: A population-based study. BMC Med. 2020;18:1–11.
  • 29. Ingram E, Ledden S, Beardon S, Gomes M, Hogarth S, Mcdonald H, et al. Household and area-level social determinants of multimorbidity: a systematic review. J Epidemiol Community Health. 2021;75:232–41. doi: 10.1136/jech-2020-214691
  • 30. Berkin KE, Ball SG. Essential hypertension: the heart and hypertension. Heart. 2001;86:467–75. doi: 10.1136/heart.86.4.467
  • 31. Vidra N, Trias-Llimós S, Janssen F. Impact of obesity on life expectancy among different European countries: secondary analysis of population-level data over the 1975–2012 period. BMJ Open. 2019;9:e028086. doi: 10.1136/bmjopen-2018-028086
  • 32. Scottish Government. Scottish Index of Multiple Deprivation (SIMD). 2021. https://www.northlanarkshire.gov.uk/your-council/facts-and-figures/scottish-index-multiple-deprivation-simd-2020. Accessed 1 Jul 2022.
  • 33. Braveman P, Gottlieb L. The Social Determinants of Health: It’s Time to Consider the Causes of the Causes. Public Health Reports. 2014;129 Suppl 2:19. doi: 10.1177/00333549141291S206

Decision Letter 0

Mona Pathak

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

5 Apr 2023

PONE-D-23-03161

Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

PLOS ONE

Dear Dr. Fagbamigbe,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 20 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Mona Pathak, PhD

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We noticed you have some minor occurrence of overlapping text with the following previous publication(s), which needs to be addressed:

- https://www.researchgate.net/publication/325289591_Patterns_of_Multimorbidity_in_Middle-Aged_and_Older_Adults_An_Analysis_of_the_UK_Biobank_Data

In your revision ensure you cite all your sources (including your own works), and quote or rephrase any duplicated text outside the methods section. Further consideration is dependent on these concerns being addressed.

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

Reviewer #3: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

This paper used 20 years of linked electronic health records to examine clusters of multimorbidity in Scotland across age groups, sex, and levels of deprivation. Looking at how conditions cluster together across different strata in society will potentially have implications for health service delivery. This paper contains important work but presents many complex findings which can make it difficult to interpret at times. Therefore, this paper requires restructuring and some considerable changes before it can be considered for publication.

Abstract

You say multimorbidity is present in 21.3% and then later on you say ‘multimorbidity was present in one of every 5 people’. This is repetition. Consider removing one sentence.

Please reconsider how you describe the multimorbidity clusters in the Abstract as it is confusing in its current form. I would recommend naming the five distinct clusters and then mention that they differ by age, sex, and deprivation without going into too much detail about this in the Abstract.

There are more incidences of repetition in the Abstract – please address this.

Background

The Background is well-structured and you build your argument well.

Please provide a definition of ‘complex multimorbidity’ in order to justify the aims of your study.

Further justification is required for using data from Scotland. For example, the strong socioeconomic gradient in Scotland might be one reason for focusing on this country (example in https://www.nature.com/articles/s41598-022-20546-4#ref-CR18)

Methods

Please clarify – if participants comprised all those who were alive on 31st December 2018, how did you use death records to measure multimorbidity?

It would be helpful to provide justification for the use of the Elixhauser Comorbidity Index to measure multimorbidity when there are more comprehensive measures available now, e.g. https://bmjmedicine.bmj.com/content/bmjmed/1/1/e000247.full.pdf

In the Outcome section, please provide some examples of physical and mental long-term conditions that were included in the multimorbidity definition.

Line 119 – frequencies and proportions of what? Numbers of conditions? This needs to be more clear.

The role of complex multimorbidity in this study is not clear from the Methods- particularly the Data Management and Statistical Analysis section. Please provide more clarity.

Could you include a sentence or two in the Data Management and Statistical Analysis section about Market Basket Analysis (MBA) i.e. how this clustering method allows for clustering across the characteristics of the study sample? I believe hierarchical cluster analysis can do the same thing so it would be good to further justify the use of MBA.

Results

In the Methods you state that the study was based on ‘cross-sectional analysis…when an individual was aged at least 25 years’ (line 90/91). However, the sample in the Results comprises people aged over 44 years. Some clarity is required here. Why over 44 years? Is this because there is a retrospective study period of 20 years?

Were those with complex multimorbidity included in the overall multimorbidity count? i.e. Did 67,728 people have MM or did 87,851 people have MM? Please provide clarity in the paper.

Table 1 – please provide proportions (%) for ‘All individuals’, e.g., for sex, age groups, etc. to allow comparison with the MM group. Please also provide a mean age for ‘All individuals’.

Figure 1 – I would recommend removing the percentages attached to the lines in the graph

In the Introduction, you talk about links between MM and obesity and harmful use of alcohol and drugs (lines 58-60). However, you include both obesity and alcohol abuse as long-term conditions in your multimorbidity measure. I would consider removing this from the Introduction as it introduces confusion.

It might be useful to assign more meaningful names to the clusters, seeing as they don’t differ massively between sex or deprivation levels. For example, cluster 5 could be called the cardiovascular cluster.

Lines 181 to 190 are confusing for the reader. Please consider presenting these results in a different way.

Line 191-192: Please provide some examples of the core set of conditions across strata

Table 2 is really helpful for the interpretation of Results. However, I would suggest reorganising the ‘Clustering’ section of the Results for more clarity. More subheadings would be useful. I would consider basing these on individual clusters.

Why was there no cluster analysis performed amongst those with complex multimorbidity?

Presenting clusters across age groups is difficult and I think Table 3 requires some further clarification. Perhaps include a footnote to indicate that efforts were made to align clusters that were similar. I would also explain why blank cells exist for certain age groups. Moreover, would there be a way of showing in the Table how common it is for participants with MM to belong to one of these clusters?

Discussion

In the second paragraph, you state that 99% of those aged >80 years were included in cluster 1- please correct to Cluster 2. And correct ‘clusters 2 and 3’ to ‘4 and 6’ (judging by Table 3).

Parts of the Discussion feel like a Results section (but for the Supplementary results). Make sure to discuss results here, rather than just describe them.

Please add some strengths of the study

Are there any limitations relating to market basket analysis that need to be mentioned? Is this widely applied to healthcare data?

Are there any limitations around the measure of deprivation used?

General remarks:

Please check spelling and grammar throughout

Reviewer #2: Thank you for the opportunity to review this paper. An understanding of how LTCs cluster together for people in different social groups (by age, deprivation etc) is very much needed. I was looking for such a paper recently to understand socioeconomic inequalities in outcomes for people with MLTCs so I am pleased to see this work has been done. The paper is clear, well-written and nicely summarises a complex picture. My comments are minor.

Methods:

Please make it clearer when describing the method and also in the tables that a person can "belong" to more than one cluster.

What is the rationale for using Elixhauser conditions?

Results:

S4 Table. I think these important results should be included in the main text (or at least the whole population rows at the top of the table if there isn't room for all of this).

Would it be appropriate to calculate the prevalence of each cluster using the whole sample rather than the number with MLTCs as the denominator? That would be interesting to include in the main text.

S4 Table Least deprived population Solid tumour w/c metastasis cluster appears twice (presumably a labelling error).

S2 Fig. What does the y-axis Height refer to? Please add a footnote to the figure to explain this. Also the clusters are shown as branches and sub-branches. It would be helpful to have a brief explanation in a footnote on what this means for those of us who aren't familiar with the clustering method used.

Discussion:

As the authors point out, the specifics of the clusters identified will depend on which LTCs are counted and whether the data is based on hospital admissions or also primary care. However, it would be helpful to know to what extent the authors think the clusters they have identified are likely to be replicated in broad terms (eg do they make clinical sense, are the results from this local study likely to be generalisable in other places). Or is this more a proof of concept paper showing that we need to segment and not assume that one size fits all?

I think the paper currently makes the latter point well. But if there are some clusters that are likely generalisable then it would be good to expand the policy/practice implications (e.g. for the alcohol, depression, drug use cluster).

Quite a lot of the material on page 11-12 looks like results rather than discussion. I'd suggest moving some of that up into the results section.

It's not helpful to refer to cluster numbers without a description of the conditions in the discussion.

Some of the combinations have small cell sizes. To what extent is this a limitation here?

Reviewer #3: Title: Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

Manuscript ID: PONE-D-23-03161

In this paper, the authors use secondary data to define a sample of middle-aged and older adults with at least 2 chronic conditions and characterize common condition clusters across age, sex, and socioeconomic deprivation strata. This is an interesting study but there are a few issues that I would like to see the authors better address:

- I think it would be helpful for the authors to define complex multimorbidity in the Introduction and explain why it is important to distinguish it from multimorbidity more generally.

- Given that we know that certain conditions are going to be more common among older people (or younger people), I think it would be useful to have a stronger argument for this work. Much of the research that I’ve seen that has attempted to understand disease clustering does focus on specific age groups because of this issue. I would like to see the authors be clearer on how this study contributes to the existing research on multimorbidity clustering.

- I found the description of the sample and look back a little bit confusing. I think a figure would help to make this clearer. Also, how was death data used? My understanding is that the sample was everyone alive on Dec. 31st 2018 with a look back to Jan. 1st 2000.

- The diagnoses considered in the analysis are quite mixed. For example, some have argued that diagnoses such as hypertension and obesity are more risk factors rather than chronic conditions in-and-of themselves. I think this kind of issue warrants some discussion.

- Related to the above, common conditions (like arthritis) are what frequently make up the common clusters, often across the different strata considered in this study. This has been reported elsewhere, as well. I think it is worth discussing what this means for thinking of how to design and target services.

- I think that the reliance on hospital and secondary care codes is a more important limitation than described by the authors. I do think that they should address how their estimates compare to those derived from more comprehensive data sources as well as the potential for access to services (even in health systems with universal coverage) to impact on observations across the strata.

- I found the Recommendations to be somewhat of a reach. While I agree that understanding these types of patterns can be used to help identify where services are needed, I also think this should be discussed within the context of currently available and successful prevention and care strategies. I also think that research like this is more directly relevant to informing how we study and understand multimorbidity (rather than specific practice or policy recommendations).

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Dr Amy Ronaldson

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: MM in Scotland_Clusters.docx

PLoS One. 2023 Nov 29;18(11):e0294666. doi: 10.1371/journal.pone.0294666.r002

Author response to Decision Letter 0


13 Jun 2023

16th May 2023

Dear

PONE-D-23-03161

Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

Thank you for your email dated 5th April 2023 asking us for a revision of this paper.

We are grateful to the reviewers for their thoughtful and constructive feedback, which we have very carefully considered. We provide below a point-by-point response to the points made. We have also included a response to the editorial points.

We have uploaded a version of the revised manuscript with tracked changes and a clean unmarked copy.

Responses to peer-review feedback

Editors Comments

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Thank you. We have checked the manuscript against the templates and made the necessary changes.

2. We noticed you have some minor occurrence of overlapping text with the following previous publication(s), which needs to be addressed:

- https://www.researchgate.net/publication/325289591_Patterns_of_Multimorbidity_in_Middle-Aged_and_Older_Adults_An_Analysis_of_the_UK_Biobank_Data

In your revision ensure you cite all your sources (including your own works), and quote or rephrase any duplicated text outside the methods section. Further consideration is dependent on these concerns being addressed.

Thank you for highlighting this. The overlap may simply be due to a common use of familiar language. We have read both papers alongside each other and could not see any major overlap, so we hope the changes we have included resolve this. If there are specific areas you would still wish us to rewrite, we would be happy to address these.

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

We have updated the Data Sharing statement at the end of the manuscript to reflect the data are not publicly available but could be accessed via application and granting of subsequent approvals.

Page 17 Paragraph 3

“The data used in this study are sensitive and are not publicly available. Access to the data is by application to the Health Informatics Centre, University of Dundee, Scotland (hicsupport@dundee.ac.uk) using their standard governance and access processes.”

4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Thank you. We have changed the captions and updated citations as requested.

Reviewer #1: Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

R1.1- This paper used 20 years of linked electronic health records to examine clusters of multimorbidity in Scotland across age groups, sex, and levels of deprivation. Looking at how conditions cluster together across different strata in society will potentially have implications for health service delivery. This paper contains important work but presents many complex findings which can make it difficult to interpret at times. Therefore, this paper requires restructuring and some considerable changes before it can be considered for publication.

Thank you. We have incorporated all the reviewers’ comments and restructured a number of sections.

Abstract

R1.2- You say multimorbidity is present in 21.3% and then later on you say ‘multimorbidity was present in one of every 5 people’. This is repetition. Consider removing one sentence.

Thank you. We have removed the latter statement at line 64.

R1.3- Please reconsider how you describe the multimorbidity clusters in the Abstract as it is confusing in its current form. I would recommend naming the five distinct clusters and then mention that they differ by age, sex, and deprivation without going into too much detail about this in the Abstract.

We have named the five clusters seen in the overall population as requested and shortened and amended the abstract as below.

“We identified five distinct clusters of conditions in the population with multimorbidity: alcohol misuse, cancer, obesity, renal failure and heart failure. Clusters of long-term conditions differed by age, sex and socioeconomic deprivation, with some clusters not present for specific strata and others including additional conditions.”

R1.4- There are more incidences of repetition in the Abstract – please address this.

We have reviewed the abstract and removed repetition where identified.

Background

The Background is well-structured and you build your argument well.

We thank the reviewer for their positive comments

R1.5- Please provide a definition of ‘complex multimorbidity’ in order to justify the aims of your study.

We have changed the text to define this and added a reference to previous work where this definition has been used (line 94).

“and complex multimorbidity (defined as having 4 or more multiple long-term conditions) (Agrawal et al., 2022)”

R1.6- Further justification is required for using data from Scotland. For example, the strong socioeconomic gradient in Scotland might be one reason for focusing on this country (example in https://www.nature.com/articles/s41598-022-20546-4#ref-CR18)

We have now explained why the dataset was generated and how the need for this work was decided (Lines 113-116).

“The data was generated for a study exploring multimorbidity across different nations in the UK. Exploration of the data showed a strong socioeconomic and age gradient in terms of individual conditions and multimorbidity which we felt warranted further exploration (Cezard et al., 2022).” Scotland is one of the UK nations and was purposively chosen for this study because of its strong socioeconomic gradient.

Methods

R1.7- Please clarify – if participants comprised all those who were alive on 31st December 2018, how did you use death records to measure multimorbidity?

We did have access to the Deaths data but as stated by the reviewer this was not used to identify multimorbidity for patients alive at 31st December 2018. We have changed text as below (Lines 117-119, and Lines 131-132).

“Within this study, we focussed on individuals who were alive on 31st December 2018 with dates of death ascertained from linked National Record Scotland death certificates and the population register.”

“All hospital admissions, psychiatric hospital admissions, outpatients, cancer registry and emergency department records over the period were examined, and all the International Classification of Diseases (ICD)-10 codes were extracted.”

R1.8- It would be helpful to provide justification for the use of the Elixhauser Comorbidity Index to measure multimorbidity when there are more comprehensive measures available now, e.g. https://bmjmedicine.bmj.com/content/bmjmed/1/1/e000247.full.pdf

The Ho index, or the code list suggested by the reviewer, was not available for use with EHR at the time of this work. We chose the Elixhauser Index as it has been shown to be a good marker of mortality and is well established for use with EHR. We have added text to reflect this as requested (Lines 137-139).

“The Elixhauser Index was chosen as previous reviews have suggested it is a good marker for mortality, although this was not an outcome of interest in this study, and it is well established for use with electronic health records and has been used in earlier studies [27]”

R1.9- In the Outcome section, please provide some examples of physical and mental long-term conditions that were included in the multimorbidity definition.

We have included a comprehensive list in the Supplementary S1 Table and added to the text (Lines 136-137).

“Depression and psychoses are examples of mental health long-term conditions included within the Elixhauser index whilst weight loss and cancer are some of the physical conditions listed (see Table S1 for full list of conditions and related codes).”

R1.10- Line 119 – frequencies and proportions of what? Numbers of conditions? This needs to be more clear.

We have changed this to specify this relates to numbers of individuals with the condition (Line 150).

“Frequencies and proportions of individuals with the conditions within each stratum”

R1.11- The role of complex multimorbidity in this study is not clear from the Methods- particularly the Data Management and Statistical Analysis section. Please provide more clarity.

We reported on complex multimorbidity in Table 1 as we felt that it was important to show the complexity that certain people with multimorbidity have and therefore the need to look in more depth at clusters of conditions. The majority of cluster identified for the whole population but especially within strata had more than 2 conditions present. We believe it is an important descriptive statistic to report but it did not need further exploration. We have added text to show this as below (Lines 151-152).

“We reported the prevalence of multimorbidity (2+) and complex multimorbidity (4+) to show how this changed across strata of the population.”
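The two thresholds in this definition can be illustrated with a short sketch. It assumes each person is represented by the set of Elixhauser conditions recorded for them; the patient identifiers, the `records` dictionary and the helper names are hypothetical and are not taken from the study code. As in the manuscript, people with complex multimorbidity (4+) are also counted among those with multimorbidity (2+).

```python
# Hypothetical records: patient id -> set of recorded Elixhauser conditions.
records = {
    "p1": {"hypertension", "obesity"},
    "p2": {"depression"},
    "p3": {"renal failure", "hypertension", "diabetes", "heart failure"},
    "p4": set(),
}


def classify(conditions):
    """Label one person by their number of distinct conditions."""
    n = len(conditions)
    if n >= 4:
        return "complex multimorbidity"
    if n >= 2:
        return "multimorbidity"
    return "no multimorbidity"


def prevalence(records):
    """Share of people with 2+ conditions and with 4+ conditions.

    The 4+ group is a subset of the 2+ group, so it is counted in both.
    """
    total = len(records)
    mm = sum(1 for c in records.values() if len(c) >= 2)
    cmm = sum(1 for c in records.values() if len(c) >= 4)
    return {"multimorbidity": mm / total, "complex_multimorbidity": cmm / total}


result = prevalence(records)
```

Running the same function on the subset of records in one stratum (for example one age band) would give the stratum-specific prevalence described in the quoted text.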

R1.12- Could you include a sentence or two in the Data Management and Statistical Analysis section about Market Basket Analysis (MBA) i.e. how this clustering method allows for clustering across the characteristics of the study sample? I believe hierarchical cluster analysis can do the same thing so it would be good to further justify the use of MBA.

We have added to the text to justify the selection of Market Basket Analysis as our clustering algorithm.

“We used Market Basket Analysis (MBA) because it has been reported as more efficient for binary (present/absent) outcomes than the hierarchical cluster analysis that was originally built for quantitative outcomes (Hahsler et al., 2005; Hahsler & Karpienko, 2017). It also allows an individual to "belong" to more than one cluster if they have a large number of different conditions.”
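As an illustration of the general idea behind market basket analysis on present/absent condition data, the sketch below computes support, confidence and lift for pairs of conditions. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy `baskets`, the `min_support` threshold and the function names are hypothetical, and the software and thresholds actually used are not stated in this response.

```python
from itertools import combinations

# Hypothetical "baskets": one set of conditions per person with multimorbidity.
baskets = [
    {"alcohol misuse", "depression"},
    {"alcohol misuse", "depression", "hypertension", "diabetes"},
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "obesity"},
]


def support(itemset, baskets):
    """Fraction of people whose conditions include every item in itemset."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def pair_rules(baskets, min_support=0.5):
    """Return (lhs, rhs, support, confidence, lift) for frequent pairs."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, baskets)
        if s_ab < min_support:
            continue  # pair too rare to be reported
        for lhs, rhs in ((a, b), (b, a)):
            confidence = s_ab / support({lhs}, baskets)
            lift = confidence / support({rhs}, baskets)
            rules.append((lhs, rhs, s_ab, confidence, lift))
    return rules


rules = pair_rules(baskets)
```

In this toy data the second person carries both frequent pairs (alcohol misuse with depression, and diabetes with hypertension), which mirrors the point that one individual can "belong" to more than one cluster.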

Results

R1.13- In the Methods you state that the study was based on ‘cross-sectional analysis…when an individual was aged at least 25 years’ (line 90/91). However, the sample in the Results comprise people aged over 44 years. Some clarity is required here. Why over 44 years? Is this because there is a retrospective study period of 20 years?

We apologise for the confusion. We have amended the first sentence of the Methods to better describe the population for the study (Lines 110-112).

“The population for this study were residents of Fife and Tayside, Scotland who were aged at least 25 years old on 1st January 2000 and who were followed up until 31 December 2018, when a cross-sectional analysis of all live patients was performed.”

R1.14- Were those with complex multimorbidity included in the overall multimorbidity count? i.e. Did 67,728 people have MM or did 87,851 people have MM? Please provide clarity in the paper.

Those with complex multimorbidity were also included in the multimorbidity count and we have therefore amended the text to better reflect this (Lines 182-184).

“Overall, 318,235 people aged 44 years and over were included in the analysis, with 67,728 (21·3%) having multimorbidity, while 20,123 (6·3%) were also classed as having complex multimorbidity.”
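The percentages in this sentence can be checked directly from the reported counts. The short sketch below uses only the three numbers quoted above; the variable and function names are illustrative.

```python
# Counts as reported in the quoted sentence.
total_included = 318_235
multimorbidity = 67_728          # 2+ conditions, includes the complex group
complex_multimorbidity = 20_123  # 4+ conditions, a subset of the 2+ group


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


mm_pct = pct(multimorbidity, total_included)
cmm_pct = pct(complex_multimorbidity, total_included)

# The reviewer's alternative reading (87,851) is the sum of the two counts,
# which would double count people with complex multimorbidity.
double_counted = multimorbidity + complex_multimorbidity
```

Because the complex group is nested within the multimorbidity group, the two percentages share the same denominator and should not be added together.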

R1.15- Table 1 – please provide proportions (%) for ‘All individuals’, e.g., for sex, age groups, etc. to allow comparison with the MM group. Please also provide a mean age for ‘All individuals’.

We have provided this in Table 1 and added a mean age for the overall population, those with multimorbidity and those with complex multimorbidity.

R1.16- Figure 1 – I would recommend removing the percentages attached to the lines in the graph

We have removed the percentages in the Figure as requested.

R1.17- In the Introduction, you talk about links between MM and obesity and harmful use of alcohol and drugs (lines 58-60). However, you include both obesity and alcohol abuse as long-term conditions in your multimorbidity measure. I would consider removing this from the Introduction as it introduces confusion.

We have removed this as suggested (Line 79).

R1.18- It might be useful to assign more meaningful names to the clusters, seeing as they don’t differ massively between sex or deprivation levels. For example, cluster 5 could be called the cardiovascular cluster.

Thank you, we have assigned names to the clusters. We have Alcohol misuse Cluster, Cancer Cluster, Obesity Cluster, Renal Failure Cluster and Heart Failure Cluster

R1.19- Lines 181 to 190 are confusing for the reader. Please consider presenting these results in a different way.

We have rewritten this section in L306-315.

A number of previous studies have identified clusters that include cardiovascular-metabolic conditions, mental health issues, and musculoskeletal disorders (Prados-Torres et al., 2014). A recent study has also demonstrated variations in clusters by age, as well as differences in mortality and service utilisation (Zhu et al., 2020). In addition, our study has found how long-term conditions cluster differently based on age, sex, and socioeconomic deprivation:

R1.20- Line 191-192: Please provide some examples of the core set of conditions across strata.

We have provided the core sets (Lines 246-247).

“The core conditions identified across strata included alcohol misuse, other neurological disorders and depression in the alcohol misuse cluster and solid tumour without metastasis and metastatic cancer in the cancer cluster.”

R1.21- Table 2 is really helpful for the interpretation of Results. However, I would suggest reorganising the ‘Clustering’ section of the Results for more clarity. More subheadings would be useful. I would consider basing these on individual clusters.

We have introduced subheadings as requested in Table 2.

R1.22- Why was there no cluster analysis performed amongst those with complex multimorbidity?

We have amended the text as explained above to establish that complex multimorbidity, and how it differs across strata, was part of the rationale for exploring clusters. We do not feel that clustering conditions purely for those individuals with complex multimorbidity would add to this paper.

R1.23- Presenting clusters across age groups is difficult and I think Table 3 requires some further clarification. Perhaps include a footnote to indicate that efforts were made to align clusters that were similar. I would also explain why blank cells exist for certain age groups. Moreover, would there be a way of showing how common it is to for participants with MM to belong to one of these clusters in the Table?

Thank you. We have provided the footnotes. Yes, we presented how common it is for participants with MM to belong to one of these clusters in S4 and S5 Tables.

Discussion

R1.24- In the second paragraph, you state that 99% of those aged >80 years were included in cluster 1- please correct to Cluster 2. And correct ‘clusters 2 and 3’ to ‘4 and 6’ (judging by Table 3).

Thank you. This was based on Supplementary S5 Table. We have now harmonized the numbering in S5 Table with Table 3. We have made this clearer in the text.

R1.25- Parts of the Discussion feel like a Results section (but for the Supplementary results). Make sure to discuss results here, rather than just describe them.

Thank you. We have revised the entire discussion section

R1.26- Please add some strengths of the study

We have added strengths of the study (Page 18).

“The subjects for the study were drawn from the population within two Scottish regions with comprehensive health records over a long period of time. The use of well-defined conditions and ICD-10 codesets to identify each condition allows other researchers to explore multimorbidity using the same methods. Using market basket analysis to cluster conditions rather than classifying patients into mutually exclusive groups likely represents real-world scenarios more accurately. Patients with multiple long-term conditions can be classified in different ways depending on the specific factors that are taken into consideration. Using the same approach across different strata meant comparisons were down to the underlying data rather than simply different populations using different methods.”

R1.27- Are there any limitations relating to market basket analysis that need to be mentioned? Is this widely applied to healthcare data?

We have added a possible limitation of the measure of deprivation. The method has been applied to several healthcare datasets. However, the clusters generated by market basket analysis are based on patterns in the data and may not guarantee association, so they must be treated with caution (Page 18).

“The measure of deprivation used in this study is allocated at a postcode level but it is a small area approximation rather than a direct measure of individual deprivation.”

R1.28- Are there any limitations around the measure of deprivation used?

And similarly for Market basket analysis.

“Similarly, there have been a number of different methods used to identify clusters within multimorbid populations but our choice of Market Basket Analysis as a methodological tool is unlikely to be the cause of differences when examining strata.” Page 18

General remarks:

R1.29- Please check spelling and grammar throughout

Thank you, we have carried out a language edit.

Reviewer #2: Thank you for the opportunity to review this paper. An understanding of how LTCs cluster together for people in different social groups (by age, deprivation etc) is very much needed. I was looking for such a paper recently to understand socioeconomic inequalities in outcomes for people with MLTCs so I am pleased to see this work has been done. The paper is clear, well-written and nicely summarises a complex picture. My comments are minor.

Thank you for your kind comments.

Methods:

R2.1- Please make it clearer when describing the method and also in the tables that a person can "belong" to more than one cluster.

Thank you, we have made this clearer in Lines 168-171 as reported to Reviewer 1.

R2.2- What is the rationale for using Elixhauser conditions?

Thank you, we have added a rationale for this in Lines 137-139 as reported to Reviewer 1.

Results:

R2.3- S4 Table. I think these important results should be included in the main text (or at least the whole population rows at the top of the table if there isn't room for all of this).

Thank you. We have created Table 4 to reflect this.

R2.4- Would it be appropriate to calculate the prevalence of each cluster using the whole sample rather than the number with MLTCs as the denominator? That would be interesting to include in the main text.

In S4 Table, we presented the % of the MM population clustered within each sub-population group. We do not think it is appropriate to compute prevalence for the general population that didn’t have multimorbidity.

R2.5- S4 Table Least deprived population Solid tumour w/c metastasis cluster appears twice (presumably a labelling error).

Thank you for highlighting this error. We have corrected this in the Table

R2.6- S2 Fig. What does the y-axis Height refer to? Please add a footnote to the figure to explain this. Also the clusters are shown as branches and sub-branches. It would be helpful to have a brief explanation in a footnote on what this means for those of us who aren't familiar with the clustering method used.

Thank you. We have added footnotes to explain this.

Discussion:

R2.7- As the authors point out, the specifics of the clusters identified will depend on which LTCs are counted and whether the data is based on hospital admissions or also primary care. However, it would be helpful to know to what extent the authors think the clusters they have identified are likely to be replicated in broad terms (eg do they make clinical sense, are the results from this local study likely to be generalisable in other places). Or is this more a proof of concept paper showing that we need to segment and not assume that one size fits all?

I think the paper currently makes the latter point well. But if there are some clusters that are likely generalisable then it would be good to expand the policy/practice implications (e.g. for the alcohol, depression, drug use cluster) .

We have added the text below in response to the reviewer’s comment. L415-431.

The identified clusters make clinical sense and are likely to be generalisable to similar populations, but many of the implications they raise for clinical services are known. Related or consequential conditions have separate codes and hence induce clustering which is somewhat artificial and commonly represents different manifestations of disease pathways or disease progression. The benefit of this work is identifying that stratifying by age, sex and socioeconomic status is needed to identify the most relevant clusters for those groups.

For instance, the alcohol cluster included alcohol abuse, depression and other neurological disorders. This is not surprising as alcoholism may lead to depression or coping with depression through excess drinking. Alcohol is a known neurotoxin and hence a relationship with other neurological conditions is to be expected. Alcohol raises blood pressure and hence stroke risk. The cancer cluster includes tumour without metastasis and metastatic cancer; this is expected as cancers progress. For the obesity cluster, obesity is a risk factor for hypertension and diabetes which will be a common group in this cluster. Hypothyroidism is more common in rheumatoid arthritis. In the renal failure cluster, atherosclerosis is a common cause of renal failure and peripheral vascular disorders. Iron deficiency anaemia is very common in renal failure and a consequence of erythropoietin deficiency. Also, fluid & electrolyte disorders are a consequence of renal failure. For the heart failure cluster, atherosclerosis or valvular disorders can cause heart failure and arrhythmias.

R2.8- Quite a lot of the material on page 11-12 looks like results rather than discussion. I'd suggest moving some of that up into the results section.

Thank you. We have changed the discussion to reflect this point which was also raised by reviewer 1. See above for details. Page 11-12

R2.9- It's not helpful to refer to cluster numbers without a description of the conditions in the discussion.

We have named the clusters so that referring to them is easier and to try and describe something meaningful about each cluster. Page 9-10, Tables 2 and 3

R2.10- Some of the combinations have small cell sizes. To what extent is this a limitation here?

The small cell counts do not constitute a limitation as we didn’t encounter any issue in our algorithms while implementing the clustering.

Reviewer #3: Title: Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

Manuscript ID: PONE-D-23-03161

In this paper, the authors use secondary data to define a sample of middle-aged and older adults with at least 2 chronic conditions and characterize common condition clusters across age, sex, and socioeconomic deprivation strata. This is an interesting study but there are a few issues that I would like to see the authors better address:

Thank you

R3.1- I think it would be helpful for the authors to define complex multimorbidity in the Introduction and why it is important to distinguish from multimorbidity more generally.

We have addressed this. L102, page 3

R3.2- Given that we know that certain conditions are going to be more common among older people (or younger people), I think it would be useful to have a stronger argument for this work. Much of the research that I’ve seen that has attempted to understand disease clustering does focus on specific age groups because of this issue. I would like to see the authors be clearer on how this study contributes to the existing research on multimorbidity clustering.

Besides the clustering of the conditions among all the MM participants, we conducted age-specific, sex-specific and deprivation-specific clustering. Kindly see Tables 2 and 3 and the supplementary materials. This is the strength of this study, and it is an addition to the body of knowledge.

R3.3- I found the description of the sample and look back a little bit confusing. I think a figure would help to make this clearer. Also, how was death data used? My understanding is that the sample was everyone alive on Dec. 31st 2018 with a look back to Jan. 1st 2000.

We have changed the text to better explain how death data was used in identifying the cohort and also to explain how the cohort was formed and the follow-up period identified. See response to Reviewer 1. Page 4

R3.4- The diagnoses considered in the analysis are quite mixed. For example, some have argued that diagnoses such as hypertension and obesity are more risk factors rather than chronic conditions in-and-of themselves. I think this kind of issue warrants some discussion.

We agree that hypertension and obesity can be either depending on the context but the conditions we investigated were all conditions identified within the Elixhauser conditions and we discuss the limitations of this approach in the Discussion (Page 16). In the current study, we only assessed clustering of conditions, we didn’t assess risk factors of any outcome.

R3.5- Related to the above, common conditions (like arthritis) are what frequently make up the common clusters, often across the different strata considered in this study. This has been reported elsewhere, as well.

We do agree, we have provided possible linkages among the conditions in the clusters L415-432.

R3.6- I think that the reliance on hospital and secondary care codes is a more important limitation than described by the authors. I do think that they should address how their estimates compare to those derived from more comprehensive data sources as well as the potential for access to services (even in health systems with universal coverage) to impact on observations across the strata.

We agree. We have rewritten the limitation section to reflect this (Page 18).

R3.7- I found the Recommendations to be somewhat of a reach. While I agree that understanding these types of patterns can be used to help identify where services are needed, I also think this should be discussed within the context of currently available and successful prevention and care strategies. I also think that research like this is more directly relevant to informing how we study and understand multimorbidity (rather than specific practice or policy recommendations).

Thank you. We have re-focussed and contextualized the recommendations. P477-484

Attachment

Submitted filename: Responses to Reviewers.docx

Decision Letter 1

Sreeram V Ramagopalan

24 Oct 2023

PONE-D-23-03161R1

Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

PLOS ONE

Dear Dr. Fagbamigbe,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 08 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Sreeram V. Ramagopalan

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

Reviewer #3: Title: Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

Manuscript ID: PONE-D-23-03161R

The authors have clearly put in quite a bit of effort to address the Reviewer’s concerns. I appreciate that they have clarified their methods and presentation of their results. There are still a few issues that I would like to see addressed:

1. It is not clear to me why Cluster 1 is characterized as Alcohol Abuse. I would like to see some rationale for this decision. For the other clusters, the reasoning for the label is more clear or seems to be more representative of the main conditions but less so for Cluster 1. I think because a label like alcohol abuse could be considered stigmatizing, and the way it is presented with economic deprivation, makes me somewhat uncomfortable. I also wonder if there is something to be discussed in terms of how codes such as alcohol abuse are assigned and whether that is at all related to the observed patterns with age and economic deprivation.

2. I think that the Discussion would benefit from some more thorough editing. There are some places where the sentences are not complete – I think this is likely from the multiple rounds of track changes – but it does make it difficult to follow even the clean version. It is also repetitive in places, particularly the points about clustering showing variability by age and economic deprivation.

3. Related to the above, I think that the authors can provide a deeper discussion of what the clusters might say about why they may be seeing these conditions together – either because they are physiologically related, have common determinants, or some other reason. I do not think as currently presented there is much offered beyond describing the common conditions in clusters.

4. I do not understand the statement that Market Basket Analysis is based on patterns in the data but may not guarantee associations (page 15, line 372).

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Nov 29;18(11):e0294666. doi: 10.1371/journal.pone.0294666.r004

Author response to Decision Letter 1


30 Oct 2023

Comments to the Author

Reviewer #2: (No Response)

Reviewer #3: Title: Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

Manuscript ID: PONE-D-23-03161R

The authors have clearly put in quite a bit of effort to address the Reviewer’s concerns. I appreciate that they have clarified their methods and presentation of their results. There are still a few issues that I would like to see addressed:

1. It is not clear to me why Cluster 1 is characterized as Alcohol Abuse. I would like to see some rationale for this decision. For the other clusters, the reasoning for the label is more clear or seems to be more representative of the main conditions but less so for Cluster 1. I think because a label like alcohol abuse could be considered stigmatizing, and the way it is presented with economic deprivation, makes me somewhat uncomfortable. I also wonder if there is something to be discussed in terms of how codes such as alcohol abuse are assigned and whether that is at all related to the observed patterns with age and economic deprivation.

The naming of the clusters was discussed within the research team, using either the most common condition or a representative term, but the labelling remains subjective. Cluster 1, for instance, was named alcohol misuse because this was the most common condition among people identified in the cluster, but it could also have been labelled socioeconomic-driven conditions.

2. I think that the Discussion would benefit from some more thorough editing. There are some places where the sentences are not complete – I think this is likely from the multiple rounds of track changes – but it does make it difficult to follow even the clean version. It is also repetitive in places, particularly the points about clustering showing variability by age and economic deprivation.

We have edited the discussion and removed duplications.

3. Related to the above, I think that the authors can provide a deeper discussion of what the clusters might say about why they may be seeing these conditions together – either because they are physiologically related, have common determinants, or some other reason. I do not think as currently presented there is much offered beyond describing the common conditions in clusters.

We have edited the section as well. Our analysis shows association, not causality, but it may be possible to surmise the drivers of specific clusters as identified. Conditions most prevalent in our most deprived population groups include alcohol and drug misuse, depression and obesity, which are all known to be associated with social factors. Other identified clusters are likely to have more physiological drivers, e.g. hypertension through to heart diseases.

4. I do not understand the statement that Market Basket Analysis is based on patterns in the data but may not guarantee associations (page 15, line 372).

Thank you. We have changed this line to now read “However, the clusters generated by market basket analysis are based on empirical patterns in the data and only show associations between different conditions, it does not show any causal relationships between those data.”
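To illustrate what "only show associations" means in practice, the following is a minimal sketch of the three measures commonly reported in market basket analysis (support, confidence and lift) for pairs of conditions. It is not the authors' analysis code; the patient records, condition names and the `pair_rules` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each set lists the conditions recorded for one patient.
patients = [
    {"alcohol_misuse", "depression", "liver_disease"},
    {"alcohol_misuse", "depression"},
    {"hypertension", "heart_failure"},
    {"hypertension", "heart_failure", "renal_failure"},
    {"alcohol_misuse", "liver_disease"},
    {"hypertension", "depression"},
]


def pair_rules(records):
    """Return {(lhs, rhs): (support, confidence, lift)} for every co-occurring pair."""
    n = len(records)
    # How many patients have each single condition.
    single = Counter(c for r in records for c in r)
    # How many patients have each unordered pair of conditions.
    pairs = Counter(
        frozenset(p) for r in records for p in combinations(sorted(r), 2)
    )
    rules = {}
    for pair, count in pairs.items():
        a, b = sorted(pair)
        for lhs, rhs in ((a, b), (b, a)):
            support = count / n                    # P(lhs and rhs)
            confidence = count / single[lhs]       # P(rhs | lhs)
            lift = confidence / (single[rhs] / n)  # P(rhs | lhs) / P(rhs)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules


rules = pair_rules(patients)
support, confidence, lift = rules[("alcohol_misuse", "liver_disease")]
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that two conditions co-occur more often than expected if they were independent. Lift is also symmetric: the rule from A to B has the same lift as the rule from B to A, which is one way to see that these measures describe co-occurrence and carry no information about causal direction.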

Attachment

Submitted filename: Comments to the Author R2.docx

Decision Letter 2

Sreeram V Ramagopalan

7 Nov 2023

Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

PONE-D-23-03161R2

Dear Dr. Fagbamigbe,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Sreeram V. Ramagopalan

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Sreeram V Ramagopalan

17 Nov 2023

PONE-D-23-03161R2

Clustering long-term health conditions among 67728 people with multimorbidity using electronic health records in Scotland

Dear Dr. Fagbamigbe:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Sreeram V. Ramagopalan

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Rate of co-occurrence of pairs of conditions per 1000 people with multimorbidity.

    (TIF)

    S2 Fig. Clustering of Conditions among multimorbid (2+ conditions) patients by sex and level of deprivation (excludes conditions with less than 5% prevalence).

    (TIF)

    S3 Fig. Clustering of Conditions among multimorbid (2+ conditions) patients by age groups (only include conditions with 5%+ prevalence).

    (TIF)

    S4 Fig. Clustering of Conditions among multimorbid (2+ conditions) patients by age and deprivation (excludes conditions with less than 5% prevalence).

    (TIF)

    S1 Table. List of Elixhauser Index conditions, abbreviations, ICD10 codes.

    (PDF)

    S2 Table. Prevalence of the conditions among all patients, patients with multimorbidity and complex multimorbidity.

    (PDF)

    S3 Table. Prevalence of the conditions by characteristics of people with multimorbidity.

    (PDF)

    S4 Table. Multimorbidity Clusters of the Conditions among the whole multimorbid population, sex and deprivation subgroups.

    (PDF)

    S5 Table. Multimorbidity Clusters of the Conditions across age subgroups.

    (PDF)

    S6 Table. Multimorbidity Clusters of the Conditions across age-deprivation subgroups.

    (PDF)

    Attachment

    Submitted filename: MM in Scotland_Clusters.docx

    Attachment

    Submitted filename: Responses to Reviewers.docx

    Attachment

    Submitted filename: Comments to the Author R2.docx

    Data Availability Statement

    A data dictionary covering the data sources used in this study and the analysis codes are deposited at https://github.com/fadeniyi123/MM_HDRUK. The data used in this study are sensitive and are not publicly available. Access to the data is by application to the Health Informatics Centre, University of Dundee, Scotland (hicsupport@dundee.ac.uk) using their standard governance and access processes.


    Articles from PLOS ONE are provided here courtesy of PLOS
