Using hypergraphs to quantify importance of sets of diseases by healthcare resource utilisation: A retrospective cohort study

James Rafferty; Alexandra Lee; Ronan A Lyons; Ashley Akbari; Niels Peek; Farideh Jalali-najafabadi; Thamer Ba Dhafari; Jane Lyons; Alan Watkins; Rowena Bailey

doi:10.1371/journal.pone.0295300

PLoS One. 2023 Dec 15;18(12):e0295300. doi: 10.1371/journal.pone.0295300


Editor: Ratna Dwi Wulandari

PMCID: PMC10723667 PMID: 38100428

Abstract

Rates of Multimorbidity (also called Multiple Long Term Conditions, MLTC) are increasing in many developed nations. People with multimorbidity experience poorer outcomes and require more healthcare intervention. Grouping of conditions by health service utilisation is poorly researched. The study population consisted of a cohort of people living in Wales, UK aged 20 years or older in 2000 who were followed up until the end of 2017. Multimorbidity clusters by prevalence and healthcare resource use (HRU) were modelled using hypergraphs, mathematical objects relating diseases via links which can connect any number of diseases, thus capturing information about sets of diseases of any size. The cohort included 2,178,938 people. The most prevalent diseases were hypertension (13.3%), diabetes (6.9%), depression (6.7%) and chronic obstructive pulmonary disease (5.9%). The most important sets of diseases when considering prevalence generally contained a small number of diseases, while the most important sets of diseases when considering HRU were sets containing many diseases. The most important set of diseases taking prevalence and HRU into account was diabetes & hypertension and this combined measure of importance featured hypertension most often in the most important sets of diseases. We have used a single approach to find the most important sets of diseases based on co-occurrence and HRU measures, demonstrating the flexibility of the hypergraph approach. Hypertension, the most important single disease, is silent, underdiagnosed and increases the risk of life threatening co-morbidities. Co-occurrence of endocrine and cardiovascular diseases was common in the most important sets. Combining measures of prevalence with HRU provides insights which would be helpful for those planning and delivering services.

Introduction

Multi-morbidity, or multiple long term conditions (MLTC), is defined as the presence of two or more long-term conditions [1]. The prevalence of multi-morbidity is increasing across the world due to ageing populations and improved survival for many chronic conditions [2–4]. Multi-morbidity is also more common in less affluent or less educated communities [5, 6]. Historically, health research has generally focused on single diseases, so comparatively little is known about how multiple diseases and treatments interact, or about which combinations of conditions are most prevalent or troublesome. Our aim was to quantitatively evaluate combinations of long term conditions to determine their importance when considering prevalence and also their impact on healthcare resource utilisation, which we measured as the standardised rate at which people interacted with outpatient services or had unplanned admissions to inpatient services. Understanding such resource utilisation would be valuable for better planning of healthcare and improving patient outcomes.

Several studies in the literature have investigated how multi-morbidity affects patient outcomes using linear or time-to-event regression, such as Cox regression. For example, [7] developed a prediction model to estimate the risk of additional chronic diseases using a copula-based approach. [8, 9] used multiple logistic regression to model the relationship between multi-morbidity and some outcome measure. [10, 11] used Cox regression methods to analyse the interplay between multi-morbidity and long term mortality. Several groups have recently used machine learning models, typically random forests, to investigate how multi-morbidity relates to certain outcomes. For example, random forests were used by [12] to develop a multi-morbidity frailty index and by [13] to investigate the relationship between multi-morbidity and healthcare expenditure. Deep learning models have recently been applied to the problem of healthcare resource utilisation. For example, [14] used an attention-based model to predict operations from diagnosis data in secondary care, while [15] investigated predicting healthcare expenditure from multiple sources of input data.

The use of statistical modelling and unsupervised machine learning techniques to find clusters of coincident diseases and understand multi-morbidity has been thoroughly explored in the literature recently [16]. Several groups observed that network analysis could be used to describe a system of diseases and the interactions between pairs of disease (see for example [17–23]).

Network based approaches utilising mathematical structures called graphs and hypergraphs have several advantages over other clustering and statistical modelling approaches. Many clustering methods (for example, hierarchical clustering) allow the user to only include each disease in a single cluster, which may obscure interactions where a single disease is an important feature of several disease clusters. Approaches utilising weighted graphs are useful since they allow us to account for the prevalence of individual conditions while also accounting for the number of people that have different combinations of diseases (which can be thought of as a measure of ‘prevalence’ of sets of diseases).

We used a hypergraph rather than a simpler binary graph for this work because hypergraphs can quantify the effect of interactions between any number of diseases (as opposed to just two in a binary graph). The calculation of a quantity from the graph called centrality allows one to quantitatively estimate the connectedness of nodes within the graph. Nodes with a high centrality are strongly connected to other nodes. In a graph where nodes represent diseases and the edge weights represent the number of people that have all diseases connected by the edge, centrality therefore represents the frequency at which a single disease features in sets of diseases (and hence, its importance to multi-morbidity). We used a network approach and a hypergraph in a previous original article and tutorial paper [24]. Subsequently, work using hypergraphs has been performed by others [25].

The measure of prevalence of sets of diseases is only one choice of weighting scheme. Any measure related to the sets of diseases can be used to weight the hypergraph, which reflects the flexibility of the approach. In this study we have chosen to use healthcare resource utilisation as the quantity with which we will weight the graph, since healthcare resource use is an important factor to understand for healthcare delivery planning purposes, and is likely to be highly correlated with negative patient outcomes.

The aims of this work were:

To demonstrate the utility of hypergraphs in their application to problems of quantifying disease set importance based on healthcare resource use, a metric unrelated to the prevalence of the diseases.
To find the most important sets of diseases based on two different measures of healthcare resource use, interactions with outpatient services and unplanned interactions with inpatient services, and a similar measure of prevalence to that used in previous work [24] making a total of three weightings for consideration.

Methods

This study was performed using anonymised routine data held in the Secure Anonymised Information Linkage (SAIL) Databank, a Trusted Research Environment for people interacting with healthcare services in Wales, UK. The study was approved by the Information Governance Review Panel under project reference number 0911. Ethical approval was not required nor sought for the study.

Cohort

The cohort used for this study consisted of all people living in Wales, UK on the 1st January 2000 and aged 20 years or older, which was constructed specifically to study multimorbidity in Wales, UK. Please see [26] for a full description of the cohort used. All clinical events recorded in primary or secondary care before the index date of 1st January 2015 were included in the study. The raw data consisting of Read coded primary care data and ICD-10 coded secondary care data were processed into a table containing one row per pseudonymised person ID. The outcome measures chosen were the standardised rate of unplanned admissions to inpatient care and the standardised rate of interactions with outpatient services recorded in the three years following the index date (i.e. from 1st January 2015 to 31st December 2017). All data for the study were accessed and analysed within the SAIL Databank [27, 28].

Hypergraph data

Hypergraphs were constructed using software available from [29]. Three hypergraphs were built separately using different weighting schemes to quantify disease prevalence and healthcare resource utilisation. In each hypergraph diseases were represented by nodes and sets of diseases by edges [24]. The disease data were derived from primary and secondary care records held in the Welsh Longitudinal General Practice Database and the Patient Episodes Database for Wales respectively, and consisted of binary flags indicating the presence or absence of specific disease diagnoses in patient records. The disease definitions used were taken from the Elixhauser morbidity index [30, 31]. Three pairs of diseases were merged (i. cancer and metastatic cancer, ii. diabetes and diabetes with complication, iii. hypertension and hypertension with complication) since they are closely related by nature and would otherwise induce pseudoclustering.

Firstly for the prevalence hypergraph node weights, we used the prevalence of the diseases

w_{i}^{N} = \frac{| X_{i} |}{P}

where |X_i| is the number of people with disease X_i and P is the total population. For the edge weights we chose the generalised overlap coefficient

w_{a}^{E} = \frac{| X_{i} \cap X_{j} \cap X_{k} \cap \dots \cap X_{l} |}{\min (| X_{i} |, | X_{j} |, | X_{k} |, \dots, | X_{l} |)}

where E_a = {X_i, X_j, X_k, …, X_l}.
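The two prevalence weights defined above can be illustrated with a minimal sketch. The toy cohort, the disease labels and the function names below are hypothetical and are not taken from the study data or from the software in [29]; the code only evaluates the two formulas as written.

```python
# Hypothetical toy cohort: each person is the set of diseases they have.
people = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "obesity"},
    {"hypertension"},
    {"diabetes"},
    {"hypertension", "obesity"},
    set(),
]


def node_weight(disease, people):
    """Prevalence w_i^N = |X_i| / P."""
    return sum(disease in person for person in people) / len(people)


def overlap_coefficient(diseases, people):
    """Generalised overlap coefficient w_a^E for one set of diseases."""
    diseases = set(diseases)
    # Numerator: people who have every disease in the set.
    n_all = sum(diseases <= person for person in people)
    # Denominator: size of the rarest single disease in the set.
    n_min = min(sum(d in person for person in people) for d in diseases)
    return n_all / n_min if n_min else 0.0


w_hypertension = node_weight("hypertension", people)
w_pair = overlap_coefficient({"hypertension", "diabetes"}, people)
w_triple = overlap_coefficient({"hypertension", "diabetes", "obesity"}, people)
```

Because the denominator is the count for the rarest disease in the set, the coefficient reaches 1 whenever everyone with the rarest disease also has all the others, regardless of how common the remaining diseases are.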

For the outpatient resource utilisation hypergraph the node and edge weights were the age standardised rate of interactions with outpatient services per 100,000 people recorded for people with the specific disease or set of diseases. For example, the edge weight for the diabetes and rheumatoid arthritis edge used only people with recorded diagnoses of diabetes and rheumatoid arthritis and no other diseases under consideration in the study. We note that people may have recorded diagnoses for diseases that were not considered as part of the study. For the unplanned inpatient resource utilisation hypergraph the weighting scheme was the age standardised number of unplanned admissions to inpatient care recorded for people with the specific disease or set of diseases. Age standardisation was performed using the European standard population 2013 [32].
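As an illustration of age standardisation, the sketch below uses direct standardisation, in which age-specific rates are averaged using the standard population as weights. The age bands, counts and standard-population weights are invented for the example and are not the European standard population 2013 values; the function name is also an assumption.

```python
def age_standardised_rate(events, persons, std_pop, per=100_000):
    """Directly standardised rate per `per` people.

    events, persons and std_pop are dicts keyed by age band. Each
    age-specific rate (events / persons) is weighted by the share of
    the standard population in that band.
    """
    total_std = sum(std_pop.values())
    weighted = sum(
        std_pop[band] * events[band] / persons[band] for band in std_pop
    )
    return per * weighted / total_std


# Illustrative weights only, NOT the European standard population 2013.
std_pop = {"20-39": 40, "40-64": 40, "65+": 20}
# Hypothetical outpatient interactions and people for one disease set.
events = {"20-39": 10, "40-64": 30, "65+": 60}
persons = {"20-39": 1000, "40-64": 1000, "65+": 500}

rate = age_standardised_rate(events, persons, std_pop)
```

Standardising in this way makes the weights for different disease sets comparable even when the people with those sets have very different age profiles.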

We then computed the eigenvector centrality of the dual representation of each hypergraph. The eigenvector centrality of the nodes and edges of a hypergraph gives a direct measure of the importance of each to the graph as a whole, and these centralities are interpreted as the importances of the diseases and sets of diseases. Uncertainties were calculated using bootstrapping, i.e. a bootstrap cohort was sampled from the main cohort with replacement, the hypergraph and eigenvector centrality were calculated, and this process was repeated. The mean of the eigenvector centrality for each set of diseases was taken as the estimate of centrality, while the 2.5% and 97.5% percentiles of the bootstrap distribution were taken as the bounds of the 95% confidence interval. We discarded sets from the results where the lower bound of the 95% confidence interval of the centrality was not above zero.
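The following sketch shows one way these two steps could look in code. It is not the implementation from [29]. The rule used to weight the dual adjacency matrix (product of the two edge weights times the number of shared diseases) is an assumption made purely for illustration, as are the function names, the toy edges and the toy weights.

```python
import random


def dual_adjacency(edges, weights):
    """Adjacency between hyperedges (the edges become nodes of the dual).

    Assumed weighting, for illustration only:
    w_a * w_b * (number of diseases shared by edges a and b).
    """
    n = len(edges)
    adj = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                shared = len(edges[a] & edges[b])
                adj[a][b] = weights[a] * weights[b] * shared
    return adj


def eigenvector_centrality(adj, iters=1000, tol=1e-12):
    """Power iteration on (A + I); the shift prevents oscillation and
    leaves the eigenvectors of A unchanged."""
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        y = [v / norm for v in y]
        if max(abs(p - q) for p, q in zip(x, y)) < tol:
            return y
        x = y
    return x


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_ci(cohort, statistic, n_boot=200, seed=0):
    """Mean and 2.5% / 97.5% percentiles of a statistic over resamples."""
    rng = random.Random(seed)
    vals = sorted(
        statistic(rng.choices(cohort, k=len(cohort))) for _ in range(n_boot)
    )
    return sum(vals) / n_boot, percentile(vals, 2.5), percentile(vals, 97.5)


# Hypothetical disease sets and edge weights.
edges = [
    frozenset({"hypertension", "diabetes"}),
    frozenset({"hypertension", "obesity"}),
    frozenset({"diabetes", "depression"}),
]
weights = [0.6, 0.4, 0.2]
centrality = eigenvector_centrality(dual_adjacency(edges, weights))

# Stand-in statistic (a simple proportion) to show the resampling loop.
cohort = [0, 1] * 50
mean, lower, upper = bootstrap_ci(cohort, lambda s: sum(s) / len(s))
keep = lower > 0  # discard rule: keep only if the lower bound is above zero
```

In a full analysis the statistic passed to `bootstrap_ci` would rebuild the hypergraph from the resampled cohort and return the centrality of one disease set, which is why the procedure is expensive.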

Further analysis

In order to construct a picture of sets of diseases that are important both because of their prevalence and because of their healthcare resource utilisation we constructed a composite measure of these quantities. Firstly, we found the sets of diseases that had a centrality higher than the median in each of the hypergraphs being combined. We then found the Euclidean sum of the eigenvector centralities (i.e. the square root of the sum of the squares of the individual eigenvector centralities). We constructed three composite measures: two combining the overlap coefficient centralities with each of the HRU centralities, to investigate differences in care needs for different sets of diseases, and finally one combined measure of all three centralities.
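The composite measure can be sketched as follows. The centrality values and disease-set labels are made up, and the function name is hypothetical; the logic follows the two steps described above (median filter, then Euclidean sum).

```python
from statistics import median


def composite_importance(*centralities):
    """Combine centrality dicts (keyed by disease set) into one score.

    Keeps only sets whose centrality is above the median in every
    hypergraph, then takes the square root of the sum of squares.
    """
    medians = [median(c.values()) for c in centralities]
    common = set.intersection(*(set(c) for c in centralities))
    keep = [
        s for s in common
        if all(c[s] > m for c, m in zip(centralities, medians))
    ]
    scores = {s: sum(c[s] ** 2 for c in centralities) ** 0.5 for s in keep}
    # Highest combined importance first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical centralities for four disease sets.
prevalence = {
    "diabetes & hypertension": 0.8,
    "arrhythmia & hypertension": 0.5,
    "copd & depression": 0.1,
    "obesity & depression": 0.3,
}
outpatient_hru = {
    "diabetes & hypertension": 0.7,
    "arrhythmia & hypertension": 0.6,
    "copd & depression": 0.2,
    "obesity & depression": 0.1,
}

ranking = composite_importance(prevalence, outpatient_hru)
```

The same function accepts three dictionaries, which corresponds to the combined measure of all three centralities.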

Results

Data from a total of 2,178,938 people were included in the analysis. See Table 1 for summary statistics. The most commonly diagnosed diseases were hypertension (13.3%), diabetes (6.9%), depression (6.7%) and COPD (5.9%). The frequency distribution of the number of interactions with healthcare services, for both outpatient and unplanned inpatient services, was approximately exponential apart from an excess of zero counts, as can be seen in Fig 1.

Table 1. The fraction of people diagnosed with the feature diseases.

Disease	Diagnosed (%)
Hypertension	13.30
Diabetes	6.88
Depression	6.69
COPD	5.89
Any Cancer	4.96
Renal Disease	4.83
Obesity	4.56
Arrhythmia	4.52
Other Neurological Disorders	3.61
Hypothyroidism	2.77
Deficiency Anaemia	2.58
Congestive heart failure	2.06
Fluid & Electrolyte Disorders	1.98
Weight Loss	1.87
Valvular Disease	1.69
Peripheral Vascular Disease	1.61
Rheumatoid Arthritis	1.49
Peptic Ulcer	0.94
Liver Disease	0.73
Pulmonary Circulation Disorder	0.67
Paralysis	0.54
Drug Abuse	0.53
Psychosis	0.47
Coagulopathy	0.34
Lymphoma	0.22
Blood loss anaemia	0.12


Fig 1. The y-axes are logarithmic scales. Left: Outpatient services. Right: Unplanned inpatient services.

The most important sets of diseases by prevalence all featured hypertension and were generally smaller sets, with the most important being hypertension and diabetes. The most important set containing three conditions, and the 20th most important set overall, was hypertension, diabetes and obesity (see supplement for the 100 most important disease sets). The most important set of diseases for unplanned inpatient HRU was a set of nine diseases (arrhythmia, COPD, heart failure, fluid and electrolyte disorder, peripheral vascular disease, pulmonary circulatory disorder, renal disease, valvular disease and hypertension). The most important set of diseases for outpatient HRU contained eight diseases (COPD, heart failure, depression, fluid and electrolyte disorder, obesity, peripheral vascular disease, renal disease and valvular disease) (see supplements for the 100 most important disease sets for both unplanned inpatient and outpatient HRU).

When combined to investigate sets of diseases that were important for both prevalence and one of the HRU hypergraphs, we found the most important set of diseases for overlap coefficient combined with unplanned inpatient HRU was arrhythmia, heart failure and hypertension while for overlap coefficient combined with outpatient HRU the most important set was diabetes and hypertension. See Fig 2 for a plot of the overlap coefficient hypergraph centrality against the HRU weighted hypergraph centrality and supplemental material for tables of the 100 most important sets. In Fig 2, each point represents a set of diseases, and the distance from the origin of each point represents the combined importance of the set of diseases. The colour of the points represents the number of diseases in the set. It is evident that larger sets of diseases typically have larger HRU centrality values, while smaller sets of diseases typically have larger overlap coefficient centrality values.

Fig 2. Each point represents a set of diseases. The colour of the point represents the number of diseases in the set. Left: Outpatient services. Right: Unplanned inpatient services.

When all three hypergraph centralities were combined, the most important set of diseases was diabetes and hypertension. All of the top 17 sets of diseases featured hypertension. See the supplemental material for a table of the 100 most important sets.

The single disease that was included most often in the most important sets of diseases was hypertension, appearing in 60.1% of the most important disease sets for outpatient HRU and 59.9% of the most important disease sets for unplanned inpatient HRU, followed by arrhythmia and renal disease (see Fig 3).

Fig 3. Left: Outpatient services. Right: Unplanned inpatient services.

Discussion

This work demonstrates the use of hypergraph analysis for applied multi-morbidity research beyond simply describing clusters of coincident diseases. The weights in a hypergraph can in principle be used to quantify any relationship between the nodes of the hypergraph which makes them supremely flexible and useful mathematical objects for modelling many things, including differences in healthcare resource utilisation between people that have different sets of diseases.

The importance of the sets of diseases when the hypergraphs were considered on their own exhibited patterns one would expect (see for example [33] which had similar findings to our study). There is currently no accepted standard method for defining the diseases that are considered in studies of multimorbidity, and considerable variation in methods used in the literature has led to different sets of diseases and corresponding differences in conclusions [34].

One needs to be careful when interpreting the results of hypergraph centrality, especially when the weighting scheme is more abstract, like a measure of HRU. Eigenvector centrality, as used here, is high when a node is strongly connected to other nodes that also exhibit high centrality. For hypergraphs with weightings that depend on a measure of prevalence, an important disease set is one where the number of people with the set of conditions is relatively high compared to other disease sets in the hypergraph, and where the disease set is strongly connected to other disease sets. This implies that people with a specific ‘important’ set of diseases are more likely to acquire new diseases. When the weighting scheme used to construct the hypergraph is more abstract, such as HRU for each set of diseases, the interpretation is more complicated. A high centrality means that the HRU of the set of diseases is large and that the HRU of neighbouring sets of diseases is also large. This means that if a person has an ‘important’ set of conditions, then removing a disease from the set or adding one to it would not have a large effect on the approximate HRU.

We chose to consider importance of multi-morbidity using two axes, prevalence and HRU. The most important sets of diseases for the prevalence hypergraphs were sets containing few diseases, because the number of people that had a small number of diseases was large compared to the number that had many diseases. The most important set of diseases in the prevalence hypergraph was hypertension with diabetes, both of which are very prevalent conditions.

Conversely, the most important sets of diseases for the hypergraphs weighted by HRU consisted of many diseases, which is also natural as people with many diseases are likely to have more complex healthcare needs that will require involvement of clinicians from different specialties. Although the centrality was large for many large sets of diseases, the confidence interval around the centrality depended on the number of people in the set and often became very wide when the number of diseases in the set was large. We discarded sets where the lower bound of the 95% confidence interval of the centrality was not above zero. This had the effect of removing sets of diseases where the centrality was indistinguishable from zero, but also had the effect of removing sets with small numbers of individuals in them.

Arguably, the most important sets of multi-morbidities are those which are both prevalent and command a large HRU. To find these sets we combined the most important sets of diseases from two hypergraphs, pairing the prevalence centrality with a measure of HRU centrality. This created a different set of rankings. The most important set of conditions for the prevalence and unplanned inpatient combination was arrhythmia, heart failure and hypertension, while for the combination of prevalence and outpatient HRU it was diabetes and hypertension. We observed that the most important sets of diseases for the outpatient combination were smaller than for the unplanned inpatient combination. Hypertension appeared prominently in the list of most important sets for both combinations.

Combining centrality measures for all three hypergraphs we have computed provides an overall picture of the diseases that have the highest general HRU and are relatively prevalent compared to other sets of diseases. The most important sets of diseases using this measure all featured hypertension, with the most important set being diabetes and hypertension. For future work one may consider applying a weighting to the combination of centrality measures. In this study the hypergraph centralities are all weighted equally, but we note that when combining a prevalence hypergraph and an HRU hypergraph the prevalence component contributes 50% of the combined centrality, whereas for the combination of all three hypergraphs the prevalence component only contributes 33.3% of the combined centrality. The contributions of the hypergraphs were allocated equally since there was no identified a priori method to allocate disproportionate weightings. This observation could be used to tailor the method to specific research questions. For example, in demographic groups where the majority of healthcare interactions are delivered via outpatient services, the hypergraph derived from outpatient HRU could be weighted more highly in the combined centrality than the inpatient HRU centrality.
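A weighted combination of this kind might be sketched as below. The weighted form is an assumption introduced here for illustration and was not used in the study; the function names are hypothetical. With equal weights it reduces to the plain Euclidean sum, and the second function reproduces the 50% and 33.3% shares quoted above for the case where all centralities are equal.

```python
def weighted_combination(centralities, weights):
    """sqrt(sum_k w_k * c_k**2); equal unit weights give the Euclidean sum."""
    return sum(w * c * c for c, w in zip(centralities, weights)) ** 0.5


def prevalence_share(weights):
    """Share of the squared score due to the first (prevalence) term,
    assuming all centralities are equal."""
    return weights[0] / sum(weights)


two_way = prevalence_share([1, 1])         # prevalence + one HRU hypergraph
three_way = prevalence_share([1, 1, 1])    # prevalence + both HRU hypergraphs
rebalanced = prevalence_share([2, 1, 1])   # hypothetical re-weighting
```

Doubling the prevalence weight in the three-way combination, as in the last line, would restore an equal balance between prevalence and HRU overall.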

Hypertension was the disease that appeared most frequently in sets of diseases that were important in both the prevalence and HRU hypergraphs by a large margin. Hypertension is a “silent” condition, inasmuch as moderate or even severe hypertension often presents with no symptoms. Furthermore, it is commonly associated with an increased risk of life-threatening cardiovascular conditions like heart attack and stroke. The order of the single diseases that appear most commonly in the most important sets of diseases is largely the same for the two measures of HRU, meaning people who have higher HRU for unplanned inpatient care are likely to also have higher HRU for outpatient services.

This study has presented a combined analysis of disease set prevalence and HRU using hypergraphs. We have quantitatively evaluated sets of diseases based on their prevalence and their HRU and ordered the sets of diseases based on importance. The study has the advantage of providing a quantitative estimate for the importance of every set of diseases (some methods for clustering diseases require that diseases can only appear in one set, for example), and hypergraph objects are general enough to allow one to choose the weighting scheme used to capture the information needed by the research. A limitation of the hypergraph approach is that it can be very time consuming to compute, as the number of possible edges scales exponentially with the number of nodes. This makes bootstrapping to calculate uncertainties quite time consuming, even on a computing cluster. The flexibility to define the hypergraph weights also leads to some difficulties in the interpretation of hypergraph centrality.
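The exponential scaling can be made concrete with a short calculation. The sketch below counts every possible set of two or more diseases; whether single-disease edges are also counted, and how many sets actually occur in a cohort, are not specified here, so the figure is an upper bound under that assumption. The value 26 is the number of diseases listed in Table 1.

```python
def n_possible_edges(n_nodes):
    """Number of distinct sets of two or more nodes: 2^n - n - 1."""
    return 2 ** n_nodes - n_nodes - 1


for n in (10, 20, 26):
    print(n, n_possible_edges(n))
# 10 1013
# 20 1048555
# 26 67108837
```

Each additional disease roughly doubles the number of candidate sets, and every bootstrap repetition has to revisit them, which is consistent with the computational cost noted above.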

Our results are consistent with the relatively small but growing literature on the impacts of multimorbidity. Soley-Bori and colleagues carried out a systematic review of the impact of multimorbidity on healthcare costs and utilisation in the UK [35] in 2020, identifying 17 studies (7 on costs and 10 on HRU). Whilst the different studies used different demographic inclusion criteria, groupings of morbidity, and time frames, the overall patterns were similar: multimorbidity was found to be associated with increased use of primary care, emergency department and inpatient resources. Similar patterns have been reported from Denmark, India, Catalonia and China, again using different categories and methodologies [36–39].

The results of this study should be of interest to health planners, patients and patient advocacy groups. Combining measures of prevalence with HRU provides insights into aspects of the ‘importance’ of sets of multi-morbidities which would be helpful for those planning services. For future work it may be of interest to perform this analysis in cohorts of people with specific, common diseases to understand the common sets of comorbidities and HRU in those subcohorts. From this study, the most interesting populations to explore in studies of this type would be people with hypertension, diabetes or depression.

Supporting information

S1 File. Prevalence hypergraph most important disease sets.

The one hundred most central sets of diseases based on prevalence weighting.

(CSV)


S2 File. Unplanned inpatients hypergraph most important disease sets.

The one hundred most central disease sets from a hypergraph weighted using the number of unplanned inpatient visits.

(CSV)


S3 File. Outpatients hypergraph most important disease sets.

The one hundred most central disease sets from a hypergraph weighted using the number of outpatient visits.

(CSV)


S4 File. Prevalence and outpatients HRU most important disease sets.

The one hundred most important disease sets from the prevalence and outpatients HRU hypergraphs combined into a single importance score.

(CSV)


S5 File. Prevalence and unplanned inpatients HRU most important disease sets.

The one hundred most important disease sets from the prevalence and unplanned inpatients HRU hypergraphs combined into a single importance score.

(CSV)


S6 File. Prevalence, unplanned inpatient and outpatients HRU most important disease sets.

The one hundred most important disease sets from the unplanned inpatient, prevalence and outpatients HRU hypergraphs combined into a single importance score.

(CSV)


Acknowledgments

This study makes use of anonymised data held in the SAIL Databank, which is part of the national e-health records research infrastructure for Wales. We would like to acknowledge all the data providers who make anonymised data available for research.

Data Availability

This study makes use of anonymized, individual-level data held in the SAIL Databank, a Trusted Research Environment, at Swansea University, Swansea, UK. Due to the nature and level of the data, data are not publicly available but are available to researchers upon application. All proposals to use SAIL data are subject to review by the independent Information Governance Review Panel (IGRP). The IGRP gives careful consideration to each project proposal to ensure appropriate use of SAIL data. If a project is approved, access to the requested data is gained through a privacy-protecting safe haven and remote access system referred to as the SAIL Gateway. SAIL has established an application process to be followed by anyone who would like to access data at https://saildatabank.com/sail_user_application/ and further information is available by emailing SAILDatabank@swansea.ac.uk.

Funding Statement

This work was supported by the Medical Research Council (MRC), grant no. MR/S027750/1. FJ is supported by an MRC/University of Manchester Skills Development Fellowship (grant number MR/R016615). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Willadsen TG, Bebe A, Køster-Rasmussen R, Jarbøl DE, Guassora AD, Waldorff FB, et al. The role of diseases, risk factors and symptoms in the definition of multimorbidity–a systematic review. Scandinavian journal of primary health care. 2016;34(2):112–121. doi: 10.3109/02813432.2016.1153242
2. Afshar S, Roderick PJ, Kowal P, Dimitrov BD, Hill AG. Multimorbidity and the inequalities of global ageing: a cross-sectional study of 28 countries using the World Health Surveys. BMC public health. 2015;15(1):1–10. doi: 10.1186/s12889-015-2008-7
3. Garin N, Koyanagi A, Chatterji S, Tyrovolas S, Olaya B, Leonardi M, et al. Global multimorbidity patterns: a cross-sectional, population-based, multi-country study. Journals of Gerontology Series A: Biomedical Sciences and Medical Sciences. 2016;71(2):205–214. doi: 10.1093/gerona/glv128
4. Violan C, Foguet-Boreu Q, Flores-Mateo G, Salisbury C, Blom J, Freitag M, et al. Prevalence, determinants and patterns of multimorbidity in primary care: a systematic review of observational studies. PloS one. 2014;9(7):e102149. doi: 10.1371/journal.pone.0102149
5. Schiøtz ML, Stockmarr A, Høst D, Glümer C, Frølich A. Social disparities in the prevalence of multimorbidity–A register-based population study. BMC public health. 2017;17(1):1–11.
6. Sakib MN, Shooshtari S, St John P, Menec V. The prevalence of multimorbidity and associations with lifestyle factors among middle-aged Canadians: an analysis of Canadian Longitudinal Study on Aging data. BMC Public Health. 2019;19(1):1–13. doi: 10.1186/s12889-019-6567-x
7. Black JE, Kueper JK, Terry AL, Lizotte DJ. Development of a prognostic prediction model to estimate the risk of multiple chronic diseases: constructing a copula-based model using Canadian primary care electronic medical record data. International journal of population data science. 2021;6(1). doi: 10.23889/ijpds.v6i1.1395
8. He K, Zhang W, Hu X, Zhao H, Guo B, Shi Z, et al. Relationship between multimorbidity, disease cluster and all-cause mortality among older adults: a retrospective cohort analysis. BMC public health. 2021;21(1):1–8. doi: 10.1186/s12889-021-11108-w
9. Groves D, Karsanji U, Evans RA, Greening N, Singh SJ, Quint JK, et al. Predicting Future Health Risk in COPD: Differential Impact of Disease-Specific and Multi-Morbidity-Based Risk Stratification. International Journal of Chronic Obstructive Pulmonary Disease. 2021;16:1741. doi: 10.2147/COPD.S303202
10. Corsonello A, Soraci L, Di Rosa M, Bustacchini S, Bonfigli AR, Lisa R, et al. Prognostic Interplay of Functional Status and Multimorbidity Among Older Patients Discharged From Hospital. Journal of the American Medical Directors Association. 2021;.
11. Kańtoch A, Grodzicki T, Wójkowska-Mach J, Heczko P, Gryglewska B. Explanatory survival model for nursing home residents-a 9-year retrospective cohort study. Archives of Gerontology and Geriatrics. 2021;97:104497. doi: 10.1016/j.archger.2021.104497
12. Peng LN, Hsiao FY, Lee WJ, Huang ST, Chen LK, et al. Comparisons between hypothesis-and data-driven approaches for multimorbidity frailty index: A machine learning approach. Journal of medical Internet research. 2020;22(6):e16213. doi: 10.2196/16213
13. Schiltz NK, Warner DF, Sun J, Bakaki PM, Dor A, Given CW, et al. Identifying specific combinations of multimorbidity that contribute to health care resource utilization: an analytic approach. Medical care. 2017;55(3):276. doi: 10.1097/MLR.0000000000000660
14. Yu K, Yang Z, Wu C, Huang Y, Xie X. In-hospital resource utilization prediction from electronic medical records with deep learning. Knowledge-Based Systems. 2021;223:107052. doi: 10.1016/j.knosys.2021.107052
15. Zeng X, Lin S, Liu C. Multi-View Deep Learning Framework for Predicting Patient Expenditure in Healthcare. IEEE Open Journal of the Computer Society. 2021;2:62–71. doi: 10.1109/OJCS.2021.3052518
16. Ng SK, Tawiah R, Sawyer M, Scuffham P. Patterns of multimorbid health conditions: a systematic review of analytical methods and comparison analysis. International Journal of Epidemiology. 2018;47(5):1687–1704. doi: 10.1093/ije/dyy134
17. Held FP, Blyth F, Gnjidic D, Hirani V, Naganathan V, Waite LM, et al. Association rules analysis of comorbidity and multimorbidity: The Concord Health and Aging in Men Project. Journals of Gerontology Series A: Biomedical Sciences and Medical Sciences. 2016;71(5):625–631. doi: 10.1093/gerona/glv181
18. Hidalgo CA, Blumm N, Barabási AL, Christakis NA. A dynamic network approach for the study of human phenotypes. PLoS computational biology. 2009;5(4):e1000353. doi: 10.1371/journal.pcbi.1000353
19. Lee Y, Kim H, Jeong H, Noh Y. Patterns of Multimorbidity in Adults: An Association Rules Analysis Using the Korea Health Panel. International Journal of Environmental Research and Public Health. 2020;17(8):2618. doi: 10.3390/ijerph17082618
20. Birk JL, Kronish IM, Moise N, Falzon L, Yoon S, Davidson KW. Depression and multimorbidity: Considering temporal characteristics of the associations between depression and multiple chronic diseases. Health Psychology. 2019;38(9):802. doi: 10.1037/hea0000737
21. Hernández B, Reiley RB, K RA. Investigation of multimorbidity and prevalent disease combinations in older Irish adults using network analysis and association rules. Sci Rep. 2019;9. doi: 10.1038/s41598-019-51135-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
22. Divo MJ, Celli BR, Poblador-Plou B, Calderón-Larrañaga A, de Torres JP, Gimeno-Feliu LA, et al. Chronic Obstructive Pulmonary Disease (COPD) as a disease of early aging: Evidence from the EpiChron Cohort. PloS one. 2018;13(2):e0193143. doi: 10.1371/journal.pone.0193143 [DOI] [PMC free article] [PubMed] [Google Scholar]
23. Kalgotra P, Sharda R, Croff JM. Examining health disparities by gender: A multimorbidity network analysis of electronic medical record. International journal of medical informatics. 2017;108:22–28. doi: 10.1016/j.ijmedinf.2017.09.014 [DOI] [PubMed] [Google Scholar]
24. Rafferty J, Watkins A, Lyons J, Lyons RA, Akbari A, Peek N, et al. Ranking sets of morbidities using hypergraph centrality. Journal of Biomedical Informatics. 2021;122:103916. doi: 10.1016/j.jbi.2021.103916 [DOI] [PMC free article] [PubMed] [Google Scholar]
25. Larvin H, Kang J, Aggarwal V, Pavitt S, Wu J. Systemic Multimorbidity Clusters in People with Periodontitis. Journal of Dental Research. 2022; p. 00220345221098910. doi: 10.1177/00220345221098910 [DOI] [PMC free article] [PubMed] [Google Scholar]
26. Lyons J, Akbari A, Agrawal U, Harper G, Azcoaga-Lorenzo A, Bailey R, et al. Protocol for the development of the Wales Multimorbidity e-Cohort (WMC): data sources and methods to construct a population-based research platform to investigate multimorbidity. BMJ Open. 2021;11(1). doi: 10.1136/bmjopen-2020-047101 [DOI] [PMC free article] [PubMed] [Google Scholar]
27. Ford DV, Jones KH, Verplancke JP, Lyons RA, John G, Brown G, et al. The SAIL Databank: building a national architecture for e-health research and evaluation. BMC health services research. 2009;9(1):157. doi: 10.1186/1472-6963-9-157 [DOI] [PMC free article] [PubMed] [Google Scholar]
28. Lyons RA, Jones KH, John G, Brooks CJ, Verplancke JP, Ford DV, et al. The SAIL Databank: linking multiple health and social care datasets. BMC medical informatics and decision making. 2009;9(1):3. doi: 10.1186/1472-6947-9-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
29. Rafferty J, Bennett E, Lee A. multimorbidity_hypergraphs: Release v0.3.1; 2021. Available from: 10.5281/zenodo.5285009. [DOI]
30. Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi JC, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Medical care. 2005; p. 1130–1139. doi: 10.1097/01.mlr.0000182534.19832.83 [DOI] [PubMed] [Google Scholar]
31. Metcalfe D, Masters J, Delmestri A, Judge A, Perry D, Zogg C, et al. Coding algorithms for defining Charlson and Elixhauser co-morbidities in Read-coded databases. BMC medical research methodology. 2019;19(1):1–9. doi: 10.1186/s12874-019-0753-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
32. Pace M, Lanzieri G, Glickman M, Zupanič T. Revision of the European Standard Population: report of Eurostat’s task force. Publications Office of the European Union; 2013. [Google Scholar]
33. Mino-León D, Reyes-Morales H, Doubova SV, Pérez-Cuevas R, Giraldo-Rodríguez L, Agudelo-Botero M. Multimorbidity patterns in older adults: an approach to the complex interrelationships among chronic diseases. Archives of Medical Research. 2017;48(1):121–127. doi: 10.1016/j.arcmed.2017.03.001 [DOI] [PubMed] [Google Scholar]
34. Ho ISS, Azcoaga-Lorenzo A, Akbari A, Black C, Davies J, Hodgins P, et al. Examining variation in the measurement of multimorbidity in research: a systematic review of 566 studies. The Lancet Public Health. 2021;6(8):e587–e597. doi: 10.1016/S2468-2667(21)00107-9 [DOI] [PubMed] [Google Scholar]
35. Soley-Bori M, Ashworth M, Bisquera A, Dodhia H, Lynch R, Wang Y, et al. Impact of multimorbidity on healthcare costs and utilisation: a systematic review of the UK literature. British Journal of General Practice. 2021;71(702):e39–e46. doi: 10.3399/bjgp20X713897 [DOI] [PMC free article] [PubMed] [Google Scholar]
36. Frølich A, Ghith N, Schiøtz M, Jacobsen R, Stockmarr A. Multimorbidity, healthcare utilization and socioeconomic status: a register-based study in Denmark. PloS one. 2019;14(8):e0214183. doi: 10.1371/journal.pone.0214183 [DOI] [PMC free article] [PubMed] [Google Scholar]
37. Pati S, Swain S, Knottnerus JA, Metsemakers JF, van den Akker M. Magnitude and determinants of multimorbidity and health care utilization among patients attending public versus private primary care: a cross-sectional study from Odisha, India. International journal for equity in health. 2020;19(1):1–12. doi: 10.1186/s12939-020-01170-y [DOI] [PMC free article] [PubMed] [Google Scholar]
38. Monterde D, Vela E, Clèries M, Garcia-Eroles L, Roca J, Pérez-Sust P. Multimorbidity as a predictor of health service utilization in primary care: a registry-based study of the Catalan population. BMC family practice. 2020;21(1):1–9. doi: 10.1186/s12875-020-01104-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
39. Zhao Y, Atun R, Oldenburg B, McPake B, Tang S, Mercer SW, et al. Physical multimorbidity, health service use, and catastrophic health expenditure by socioeconomic groups in China: an analysis of population-based panel data. The Lancet Global Health. 2020;8(6):e840–e849. doi: 10.1016/S2214-109X(20)30127-3 [DOI] [PMC free article] [PubMed] [Google Scholar]

PLoS One. doi: 10.1371/journal.pone.0295300.r001

Decision Letter 0

Ratna Dwi Wulandari

15 Nov 2022

PONE-D-22-20775

Using hypergraphs to quantify importance of sets of diseases by healthcare resource utilisation: A retrospective cohort study

PLOS ONE

Dear Dr. Rafferty,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

ACADEMIC EDITOR: Please expand the background section to show that this research is essential and has a specificity that distinguishes it from previous research.

Add an explanation in the method section.

Improve the writing of the results and deepen the discussion and conclusions. Take note of all reviewer suggestions.

Please submit your revised manuscript by Dec 30 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ratna Dwi Wulandari, Dr

Guest Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

3. Please could you clarify within the ethics statement whether the 'Information Governance Review Panel of Swansea University' is the same committee as the Research Ethics - Swansea University.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

This work was supported by the Medical Research Council (MRC), grant no. MR/S027750/1. FJ is supported by a MRC/University of Manchester Skills Development Fellowship (grant number MR/R016615). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

6. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. If consent was waived for your study, please include this information in your statement as well.

7. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

8. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

This paper is exciting and well-written. However, some areas need improvement.

Background.

The authors explain that the approaches most frequently used so far in multimorbidity investigations are statistical modelling and unsupervised machine learning. Before explaining hypergraphs, the authors should first describe the weaknesses of statistical modelling and unsupervised machine learning that make research with hypergraphs necessary.

The authors also need to add examples of the advantages of using hypergraphs, based on previous research.

The authors should state the research objectives clearly.

Method

Page 2, line 46: the first sentence is confusing, because the authors wrote that the cohort used for this research has been explained before. It will be easier for readers to understand if the authors start by explaining why they use cohort data. The authors also have not explained well how the cohort data were obtained. How do the authors control the quality of the data?

There should be a more detailed explanation of healthcare resource utilisation.

Discussion

The authors need to draw on more literature and previous research results so that the discussion is more in-depth and interesting.

Try presenting the conclusion separately.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors clearly showed different approaches that have been used to analyse multimorbidity outcome data, citing examples. They further explain how different and novel their methods are and how they can also be adopted to analyse a different dimension of such multimorbidity outcome data.

For Table 1 it would be best if the diseases were sorted in either ascending or descending order based on the percentages. The current arrangement does not seem to follow any reasoned order.

Given that hypertension seems to be the most prevalent condition, perhaps the authors should also consider, in future work, a similar approach that focuses on multimorbidity among patients with hypertension. This would highlight the most important co-morbid conditions related to hypertension.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Dec 15;18(12):e0295300. doi: 10.1371/journal.pone.0295300.r002

Author response to Decision Letter 0

25 Jan 2023

Thank you for reviewing our paper. We have attached a full response to editor and referee comments to this submission as a file attachment. Please see that file for a point by point response to referee and editor comments.

Attachment

Submitted filename: plosone_referee_response.docx


PLoS One. doi: 10.1371/journal.pone.0295300.r003

Decision Letter 1

Ratna Dwi Wulandari

17 Apr 2023

PONE-D-22-20775R1

Using hypergraphs to quantify importance of sets of diseases by healthcare resource utilisation: A retrospective cohort study

PLOS ONE

Dear Dr. Rafferty,

Please submit your revised manuscript by Jun 01 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

We look forward to receiving your revised manuscript.

Kind regards,

Ratna Dwi Wulandari, Dr

Guest Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Here are some suggestions for improving your manuscript based on the second reviewer's comments.

This paper deals with an important problem. How do we deal with the thorny problem of multimorbidity - it is a high dimensional problem. They present a dimension reduction technique, the use of hypergraphs. The data used is suitable for addressing this question. It is good that code is provided.

However, the problem is that, after reading the paper, I am none the wiser as to the role of hypergraphs. I can't interpret figure 2 and am not really sure what the algorithm is doing or what the significance of that is. I cannot bridge the gap between the technical explanation and what is presented. The method is so unfamiliar, and it is the method that is the novel thing here, that I think the authors need to take us through it, almost in the form of a tutorial. I appreciate that this is challenging, but I think we need much more help as readers to understand what is being proposed. Could the authors bring in simulated examples, or examples from other fields?

1. I am afraid that I cannot follow what is meant in the background in the paragraph starting "Research". For example what does "This symmetry allows one to construct the dual hypergraph" mean? This approach is much less familiar to readers than is regression modelling. I think we need some examples and we need to get some kind of intuition or feel for what the algorithm is doing and what it means.

2. I don't think the term "important" is helpful here either when discussing prevalence of HRUs or their combination. Could it instead talk about the most common and most costly? For their combination some other term would be useful. Maybe most "prominent"?

3. The text on Figure 2 is too small. More importantly, we need some text to talk us through the figure to help explain it.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: I Don't Know

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #2: This paper deals with an important problem. How do we deal with the thorny problem of multimorbidity - it is a high dimensional problem. They present a dimension reduction technique, the use of hypergraphs. The data used is suitable for addressing this question. It is good that code is provided.

3. The text on Figure 2 is too small. More importantly, we need some text to talk us through the figure to help explain it.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: David A McAllister

**********

PLoS One. 2023 Dec 15;18(12):e0295300. doi: 10.1371/journal.pone.0295300.r004

Author response to Decision Letter 1

2 Jun 2023

Please find a response to the reviewers' comments as an attached file.

Attachment

Submitted filename: plosone_referee_response_2_final.docx


PLoS One. doi: 10.1371/journal.pone.0295300.r005

Decision Letter 2

Ratna Dwi Wulandari

21 Jun 2023

PONE-D-22-20775R2

Using hypergraphs to quantify importance of sets of diseases by healthcare resource utilisation: A retrospective cohort study

PLOS ONE

Dear Dr. Rafferty,

Please submit your revised manuscript by Aug 05 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

We look forward to receiving your revised manuscript.

Kind regards,

Ratna Dwi Wulandari, Dr

Guest Editor

PLOS ONE

Journal Requirements:

Additional Editor Comments:

The reviewer has provided further comments in this third round, so the authors need to revise the manuscript again in line with those comments.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: No

**********

6. Review Comments to the Author

Reviewer #2: Thanks for highlighting the Rafferty et al paper (ref 24) using hypergraphs for multimorbidity. The explanations in that were great. It also gave a nice intuition for the concept of the centrality metric and the example of Google's algorithm was very helpful. It would be good to make that paper more prominent in the background.

I agree that in the light of the Rafferty et al paper it is not necessary to write a tutorial-type paper here. However, I think this paper still needs a bit more to give the reader an intuitive understanding of both networks and centrality as a measure. Something analogous to the papers that reported network meta-analyses before this became a common approach.

I think the paper needs something like the following in the introduction (not my area so the following likely to be wrong, but to give the tone):-

- Describing and summarising multimorbidity is challenging because ...

- approaches that have been used include simple counts, weighted counts, clustering algorithms, and multi-state models (as per current intro)

- Network based approaches are promising because in describing sets of conditions, they allow us to account for the prevalence of individual conditions as well as their relations to other conditions, with varying degrees of commonness. We used a network approach in a previous original article and tutorial paper (24)

- we used hypergraphs for that article rather than simpler binary graphs because XXX and generated summary measures of XXX called centrality. Centrality measures are good because XXX [some intuitive description in the intro]

- Prevalence, is only one way to calculate ... it is possible to weight by any continuous characteristic, and doing so can give different insights into conditions and their relations.

- We now do so for healthcare resource utilisation, which like prevalence is a very important ...

As an aside, I think having a background and an introduction is overkill. It would be better to collapse these into a single section.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: David A McAllister

**********

PLoS One. 2023 Dec 15;18(12):e0295300. doi: 10.1371/journal.pone.0295300.r006

Author response to Decision Letter 2

10 Jul 2023

Attachment

Submitted filename: plosone_referee_response_3.docx

PLoS One. doi: 10.1371/journal.pone.0295300.r007

Decision Letter 3

Ratna Dwi Wulandari

21 Nov 2023

Using hypergraphs to quantify importance of sets of diseases by healthcare resource utilisation: A retrospective cohort study

PONE-D-22-20775R3

Dear Dr. Rafferty,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ratna Dwi Wulandari, Dr

Guest Editor

PLOS ONE

PLoS One. doi: 10.1371/journal.pone.0295300.r008

Acceptance letter

Ratna Dwi Wulandari

7 Dec 2023

PONE-D-22-20775R3

Using hypergraphs to quantify importance of sets of diseases by healthcare resource utilisation: A retrospective cohort study

Dear Dr. Rafferty:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Ratna Dwi Wulandari

Guest Editor

PLOS ONE

Associated Data

Supplementary Materials

S1 File. Prevalence hypergraph most important disease sets.

The one hundred most central sets of diseases based on prevalence weighting.

(CSV)

S2 File. Unplanned inpatients hypergraph most important disease sets.

The one hundred most central disease sets from a hypergraph weighted using the number of unplanned inpatient visits.

(CSV)

S3 File. Outpatients hypergraph most important disease sets.

The one hundred most central disease sets from a hypergraph weighted using the number of outpatient visits.

(CSV)

S4 File. Prevalence and outpatients HRU most important disease sets.

The one hundred most important diseases from the prevalence and outpatients HRU hypergraphs combined into a single importance score.

(CSV)

S5 File. Prevalence and unplanned inpatients HRU most important disease sets.

The one hundred most important diseases from the prevalence and unplanned inpatients HRU hypergraphs combined into a single importance score.

(CSV)

S6 File. Prevalence, unplanned inpatient and outpatients HRU most important disease sets.

The one hundred most important diseases from the unplanned inpatient, prevalence and outpatients HRU hypergraphs combined into a single importance score.

(CSV)

Attachment

Submitted filename: plosone_referee_response.docx

Attachment

Submitted filename: plosone_referee_response_2_final.docx

Data Availability Statement

References

1. Willadsen TG, Bebe A, Køster-Rasmussen R, Jarbøl DE, Guassora AD, Waldorff FB, et al. The role of diseases, risk factors and symptoms in the definition of multimorbidity–a systematic review. Scandinavian journal of primary health care. 2016;34(2):112–121. doi: 10.3109/02813432.2016.1153242

2. Afshar S, Roderick PJ, Kowal P, Dimitrov BD, Hill AG. Multimorbidity and the inequalities of global ageing: a cross-sectional study of 28 countries using the World Health Surveys. BMC public health. 2015;15(1):1–10. doi: 10.1186/s12889-015-2008-7

3. Garin N, Koyanagi A, Chatterji S, Tyrovolas S, Olaya B, Leonardi M, et al. Global multimorbidity patterns: a cross-sectional, population-based, multi-country study. Journals of Gerontology Series A: Biomedical Sciences and Medical Sciences. 2016;71(2):205–214. doi: 10.1093/gerona/glv128

4. Violan C, Foguet-Boreu Q, Flores-Mateo G, Salisbury C, Blom J, Freitag M, et al. Prevalence, determinants and patterns of multimorbidity in primary care: a systematic review of observational studies. PloS one. 2014;9(7):e102149. doi: 10.1371/journal.pone.0102149

5. Schiøtz ML, Stockmarr A, Høst D, Glümer C, Frølich A. Social disparities in the prevalence of multimorbidity–A register-based population study. BMC public health. 2017;17(1):1–11.

6. Sakib MN, Shooshtari S, St John P, Menec V. The prevalence of multimorbidity and associations with lifestyle factors among middle-aged Canadians: an analysis of Canadian Longitudinal Study on Aging data. BMC Public Health. 2019;19(1):1–13. doi: 10.1186/s12889-019-6567-x

7. Black JE, Kueper JK, Terry AL, Lizotte DJ. Development of a prognostic prediction model to estimate the risk of multiple chronic diseases: constructing a copula-based model using Canadian primary care electronic medical record data. International journal of population data science. 2021;6(1). doi: 10.23889/ijpds.v6i1.1395

8. He K, Zhang W, Hu X, Zhao H, Guo B, Shi Z, et al. Relationship between multimorbidity, disease cluster and all-cause mortality among older adults: a retrospective cohort analysis. BMC public health. 2021;21(1):1–8. doi: 10.1186/s12889-021-11108-w

9. Groves D, Karsanji U, Evans RA, Greening N, Singh SJ, Quint JK, et al. Predicting Future Health Risk in COPD: Differential Impact of Disease-Specific and Multi-Morbidity-Based Risk Stratification. International Journal of Chronic Obstructive Pulmonary Disease. 2021;16:1741. doi: 10.2147/COPD.S303202

10. Corsonello A, Soraci L, Di Rosa M, Bustacchini S, Bonfigli AR, Lisa R, et al. Prognostic Interplay of Functional Status and Multimorbidity Among Older Patients Discharged From Hospital. Journal of the American Medical Directors Association. 2021.

11. Kańtoch A, Grodzicki T, Wójkowska-Mach J, Heczko P, Gryglewska B. Explanatory survival model for nursing home residents-a 9-year retrospective cohort study. Archives of Gerontology and Geriatrics. 2021;97:104497. doi: 10.1016/j.archger.2021.104497

12. Peng LN, Hsiao FY, Lee WJ, Huang ST, Chen LK, et al. Comparisons between hypothesis-and data-driven approaches for multimorbidity frailty index: A machine learning approach. Journal of medical Internet research. 2020;22(6):e16213. doi: 10.2196/16213

13. Schiltz NK, Warner DF, Sun J, Bakaki PM, Dor A, Given CW, et al. Identifying specific combinations of multimorbidity that contribute to health care resource utilization: an analytic approach. Medical care. 2017;55(3):276. doi: 10.1097/MLR.0000000000000660

14. Yu K, Yang Z, Wu C, Huang Y, Xie X. In-hospital resource utilization prediction from electronic medical records with deep learning. Knowledge-Based Systems. 2021;223:107052. doi: 10.1016/j.knosys.2021.107052

15. Zeng X, Lin S, Liu C. Multi-View Deep Learning Framework for Predicting Patient Expenditure in Healthcare. IEEE Open Journal of the Computer Society. 2021;2:62–71. doi: 10.1109/OJCS.2021.3052518

16. Ng SK, Tawiah R, Sawyer M, Scuffham P. Patterns of multimorbid health conditions: a systematic review of analytical methods and comparison analysis. International Journal of Epidemiology. 2018;47(5):1687–1704. doi: 10.1093/ije/dyy134

17. Held FP, Blyth F, Gnjidic D, Hirani V, Naganathan V, Waite LM, et al. Association rules analysis of comorbidity and multimorbidity: The Concord Health and Aging in Men Project. Journals of Gerontology Series A: Biomedical Sciences and Medical Sciences. 2016;71(5):625–631. doi: 10.1093/gerona/glv181

18. Hidalgo CA, Blumm N, Barabási AL, Christakis NA. A dynamic network approach for the study of human phenotypes. PLoS computational biology. 2009;5(4):e1000353. doi: 10.1371/journal.pcbi.1000353

19. Lee Y, Kim H, Jeong H, Noh Y. Patterns of Multimorbidity in Adults: An Association Rules Analysis Using the Korea Health Panel. International Journal of Environmental Research and Public Health. 2020;17(8):2618. doi: 10.3390/ijerph17082618

20. Birk JL, Kronish IM, Moise N, Falzon L, Yoon S, Davidson KW. Depression and multimorbidity: Considering temporal characteristics of the associations between depression and multiple chronic diseases. Health Psychology. 2019;38(9):802. doi: 10.1037/hea0000737

21. Hernández B, Reiley RB, K RA. Investigation of multimorbidity and prevalent disease combinations in older Irish adults using network analysis and association rules. Sci Rep. 2019;9. doi: 10.1038/s41598-019-51135-7

22. Divo MJ, Celli BR, Poblador-Plou B, Calderón-Larrañaga A, de Torres JP, Gimeno-Feliu LA, et al. Chronic Obstructive Pulmonary Disease (COPD) as a disease of early aging: Evidence from the EpiChron Cohort. PloS one. 2018;13(2):e0193143. doi: 10.1371/journal.pone.0193143

23. Kalgotra P, Sharda R, Croff JM. Examining health disparities by gender: A multimorbidity network analysis of electronic medical record. International journal of medical informatics. 2017;108:22–28. doi: 10.1016/j.ijmedinf.2017.09.014

24. Rafferty J, Watkins A, Lyons J, Lyons RA, Akbari A, Peek N, et al. Ranking sets of morbidities using hypergraph centrality. Journal of Biomedical Informatics. 2021;122:103916. doi: 10.1016/j.jbi.2021.103916

25. Larvin H, Kang J, Aggarwal V, Pavitt S, Wu J. Systemic Multimorbidity Clusters in People with Periodontitis. Journal of Dental Research. 2022; p. 00220345221098910. doi: 10.1177/00220345221098910

26. Lyons J, Akbari A, Agrawal U, Harper G, Azcoaga-Lorenzo A, Bailey R, et al. Protocol for the development of the Wales Multimorbidity e-Cohort (WMC): data sources and methods to construct a population-based research platform to investigate multimorbidity. BMJ Open. 2021;11(1). doi: 10.1136/bmjopen-2020-047101

27. Ford DV, Jones KH, Verplancke JP, Lyons RA, John G, Brown G, et al. The SAIL Databank: building a national architecture for e-health research and evaluation. BMC health services research. 2009;9(1):157. doi: 10.1186/1472-6963-9-157

28. Lyons RA, Jones KH, John G, Brooks CJ, Verplancke JP, Ford DV, et al. The SAIL Databank: linking multiple health and social care datasets. BMC medical informatics and decision making. 2009;9(1):3. doi: 10.1186/1472-6947-9-3

29. Rafferty J, Bennett E, Lee A. multimorbidity_hypergraphs: Release v0.3.1; 2021. Available from: 10.5281/zenodo.5285009.

30. Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi JC, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Medical care. 2005; p. 1130–1139. doi: 10.1097/01.mlr.0000182534.19832.83

31. Metcalfe D, Masters J, Delmestri A, Judge A, Perry D, Zogg C, et al. Coding algorithms for defining Charlson and Elixhauser co-morbidities in Read-coded databases. BMC medical research methodology. 2019;19(1):1–9. doi: 10.1186/s12874-019-0753-5

32. Pace M, Lanzieri G, Glickman M, Zupanič T. Revision of the European Standard Population: report of Eurostat’s task force. Publications Office of the European Union; 2013.

33. Mino-León D, Reyes-Morales H, Doubova SV, Pérez-Cuevas R, Giraldo-Rodríguez L, Agudelo-Botero M. Multimorbidity patterns in older adults: an approach to the complex interrelationships among chronic diseases. Archives of Medical Research. 2017;48(1):121–127. doi: 10.1016/j.arcmed.2017.03.001

34. Ho ISS, Azcoaga-Lorenzo A, Akbari A, Black C, Davies J, Hodgins P, et al. Examining variation in the measurement of multimorbidity in research: a systematic review of 566 studies. The Lancet Public Health. 2021;6(8):e587–e597. doi: 10.1016/S2468-2667(21)00107-9

35. Soley-Bori M, Ashworth M, Bisquera A, Dodhia H, Lynch R, Wang Y, et al. Impact of multimorbidity on healthcare costs and utilisation: a systematic review of the UK literature. British Journal of General Practice. 2021;71(702):e39–e46. doi: 10.3399/bjgp20X713897

36. Frølich A, Ghith N, Schiøtz M, Jacobsen R, Stockmarr A. Multimorbidity, healthcare utilization and socioeconomic status: a register-based study in Denmark. PloS one. 2019;14(8):e0214183. doi: 10.1371/journal.pone.0214183

37. Pati S, Swain S, Knottnerus JA, Metsemakers JF, van den Akker M. Magnitude and determinants of multimorbidity and health care utilization among patients attending public versus private primary care: a cross-sectional study from Odisha, India. International journal for equity in health. 2020;19(1):1–12. doi: 10.1186/s12939-020-01170-y

38. Monterde D, Vela E, Clèries M, Garcia-Eroles L, Roca J, Pérez-Sust P. Multimorbidity as a predictor of health service utilization in primary care: a registry-based study of the Catalan population. BMC family practice. 2020;21(1):1–9. doi: 10.1186/s12875-020-01104-1

39. Zhao Y, Atun R, Oldenburg B, McPake B, Tang S, Mercer SW, et al. Physical multimorbidity, health service use, and catastrophic health expenditure by socioeconomic groups in China: an analysis of population-based panel data. The Lancet Global Health. 2020;8(6):e840–e849. doi: 10.1016/S2214-109X(20)30127-3
