Abstract
Objective
This study performed individual‐centric, data‐driven calculations of propensity for coronary heart disease (CHD) and type 2 diabetes (T2D), utilizing magnetic resonance imaging‐acquired body composition measurements, for sub‐phenotyping of obesity and nonalcoholic fatty liver disease (NAFLD).
Methods
A total of 10,019 participants from the UK Biobank imaging substudy were included and analyzed for visceral and abdominal subcutaneous adipose tissue, muscle fat infiltration, and liver fat. An adaptation of the k‐nearest neighbors algorithm was applied to the imaging variable space to calculate individualized CHD and T2D propensity and explore metabolic sub‐phenotyping within obesity and NAFLD.
Results
The ranges of CHD and T2D propensity for the whole cohort were 1.3% to 58.0% and 0.6% to 42.0%, respectively. The diagnostic performance, area under the receiver operating characteristic curve (95% CI), using disease propensities for CHD and T2D detection was 0.75 (0.73‐0.77) and 0.79 (0.77‐0.81). Exploring individualized disease propensity, CHD phenotypes, T2D phenotypes, comorbid phenotypes, and metabolically healthy phenotypes were found within obesity and NAFLD.
Conclusions
The adaptive k‐nearest neighbors algorithm allowed an individual‐centric assessment of each individual’s metabolic phenotype moving beyond discrete categorizations of body composition. Within obesity and NAFLD, this may help in identifying which comorbidities a patient may develop and consequently enable optimization of treatment.
Introduction
Biobanks have become an important resource in medical research and personalized medicine. The number of biobank studies including advanced imaging techniques, such as magnetic resonance imaging (MRI), is growing 1, 2, 3, 4, enabling extraction of standardized noninvasive biomarkers describing body composition. With a large number of participants enrolled, a population‐based analysis is tractable, yet the great detail with which individuals are described makes highly individual‐centric analysis approaches possible. The transition to personalized medicine requires networking across disciplines, whereby detailed assessment of body composition could greatly enhance our understanding of obesity and metabolic health.
Among adiposity‐related biomarkers currently used for health evaluation, BMI is recommended to identify individuals at an elevated risk of coronary heart disease (CHD) and type 2 diabetes (T2D) 5. While a higher BMI correlates with future health risks and predicts morbidity and death on a population scale, it is a poor descriptor of the individual’s health status. Anthropometric measurements, such as BMI, roughly group individuals with a similar body size. However, the use of discrete categories (e.g., underweight, normal weight, overweight, obesity, morbid obesity, or even low or high liver fat) to describe the individual should be questioned. A broader category increases the likelihood of grouping individuals with others of little resemblance to them, while giving the converse impression. Furthermore, it has been shown that specific body fat distributions are significantly linked to adverse outcomes 6, 7, 8, 9, something BMI fails to capture. The aim of this study was (1) to perform individual‐centric, data‐driven disease predictions, utilizing multiple standardized body composition measurements, to explore metabolic sub‐phenotyping in the UK Biobank Imaging Study and (2) to investigate the range of sub‐phenotypes in individuals with obesity and nonalcoholic fatty liver disease (NAFLD).
Methods
Study design and study population
This research has been conducted using the UK Biobank Resource Project ID 6569. The UK Biobank is a population‐based biobank study, begun in 2006, that has followed 502,682 participants 4 and was approved by the North West Multicenter Research Ethics Committee. Written informed consent was given prior to study entry.
The first 10,019 participants from the cross‐sectional imaging substudy were included. The participants’ MRI scans were analyzed for visceral adipose tissue (VAT), abdominal subcutaneous adipose tissue (SAT), thigh muscle volumes, muscle fat infiltration (MFI) in the anterior thighs, and liver proton density fat fraction (PDFF) 10, 11, 12, 13, 14. Following visual inspection of those with high MFI or left‐right asymmetry, the MFI values of seven individuals with suggestive neuromuscular disease were excluded.
Diagnosis information was gathered through inpatient electronic health care records (Hospital Episode Statistics downloaded November 2016, available from 1995‐2015) and questionnaires followed by interviews performed by trained nurses. Participants were categorized as CHD case, control, or status unknown and T2D case, control, or status unknown (definitions in online Supporting Information).
Individual‐centric disease predictions based on body composition
The k‐nearest neighbors (kNN) algorithm 15, 16 has been adapted in several ways to suit particular situations; one such adaptation was used here to produce predictions based on local neighborhoods of observations. Neighborhoods were gender specific and defined by VAT index (VATi; VAT divided by height squared to compensate for participant size 17), abdominal SAT index (aSATi; abdominal SAT divided by height squared), liver PDFF, and MFI. The variables were normalized, and distances between observations were measured using the Euclidean distance. This set of variables was chosen to give a broad description of adipose tissue and ectopic fat throughout the body while describing the distribution of adipose tissue in the body without physical overlap.
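As a rough illustration of how such an imaging variable space could be set up, the following Python sketch computes a height‐normalized index and Euclidean distances on standardized variables. The use of z‐scores is an assumption made only for illustration, because the normalization scheme is not detailed in this section; the function names `height_index`, `standardize`, and `distance`, as well as the example values, are hypothetical.

```python
import math
import statistics


def height_index(volume_l, height_m):
    """Volume in liters divided by height squared (L/m^2), as for VATi and aSATi."""
    return volume_l / height_m ** 2


def standardize(rows):
    """Z-score each column of a list of equal-length tuples.

    Assumption: z-scoring stands in for the unspecified normalization step.
    Columns with zero variance are not handled in this sketch.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, sds))
        for row in rows
    ]


def distance(a, b):
    """Euclidean distance between two points in the standardized space."""
    return math.dist(a, b)


# Hypothetical participants: (VAT in L, height in m, liver PDFF in %)
raw = [(2.0, 1.60, 2.0), (4.0, 1.80, 6.0), (6.0, 1.75, 10.0)]
features = [(height_index(vat, h), pdff) for vat, h, pdff in raw]
z = standardize(features)
print(distance(z[0], z[1]), distance(z[0], z[2]))
```

Standardizing before computing distances keeps a variable with a large numeric range (such as liver PDFF in percent) from dominating variables with a small range (such as VATi in L/m2).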
To adapt to the degree of sparseness in the imaging variable space and disease data, an additional parameter was included: the local neighborhood had to satisfy both a minimum neighborhood size (k = 50) and a minimum number of diseased individuals (n = 20). If a neighborhood of size k = 50 did not include 20 diseased individuals, it was increased to include a sufficient amount of disease data (n = 20 cases) for disease prediction. The prediction value was the percentage of CHD or T2D cases within the respective neighborhoods of each individual. This method will be referred to as the adaptive kNN (A‐kNN), the corresponding disease prediction values as “CHD propensity” and “T2D propensity,” and the individuals included in the adapted neighborhood as the individual’s “virtual control group.” Figure 1 provides a visual description of the A‐kNN algorithm.
Figure 1.
Visual description of the A‐kNN algorithm with neighborhood requirements of minimum size k = 10 and minimum number of diseased participants (disease cases) n = 4 in a two‐dimensional imaging variable space. Parameters k = 10 and n = 4 used to allow simplified visualization. (1) Create a virtual control group including the k = 10 participants closest to the individual (dashed circle). (2) If these do not include n = 4 disease cases, increase the neighborhood size to include n = 4 disease cases (solid circle). (3) The disease prevalence in the resulting virtual control group (including both disease cases and disease controls) gives the disease propensity.
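The three steps of the A‐kNN can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the in‐house implementation: coordinates are assumed to be already normalized, ties in distance are broken by input order, and whether the individual is counted in their own neighborhood is left to the caller. The function name `aknn_propensity` is hypothetical.

```python
import math


def aknn_propensity(query, points, diseased, k=50, n=20):
    """Return (propensity, neighborhood_size) for one individual.

    query    -- coordinates of the individual (already normalized; assumption)
    points   -- coordinates of candidate neighbors
    diseased -- 1 for a disease case, 0 for a control, aligned with points
    k        -- minimum neighborhood size
    n        -- minimum number of disease cases in the neighborhood
    """
    # Rank all candidates by Euclidean distance to the individual.
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    cases = 0
    for size, i in enumerate(order, start=1):
        cases += diseased[i]
        # Stop at the smallest neighborhood meeting both requirements.
        if size >= k and cases >= n:
            return cases / size, size
    raise ValueError("data contain fewer than k points or fewer than n cases")


# Toy one-dimensional example with k = 3 and n = 2 (hypothetical values).
pts = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
labels = [0, 0, 0, 1, 0, 1]
print(aknn_propensity((0.0,), pts, labels, k=3, n=2))
```

In the toy example the three nearest points contain no cases, so the neighborhood grows until the second case is reached at size 6, which is the behavior described in step (2) of the caption.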
To produce measures of uncertainty around the propensity values, the nonparametric bootstrap 18, 19 was used. The prediction model was fitted to each bootstrap data set (N = 500) and applied to the full data set, producing one set of bootstrap predictions per bootstrap data set. For each propensity value, the standard deviation (SD) across the bootstrap predictions was calculated as a measure of prediction uncertainty.
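The bootstrap procedure can be illustrated with a short Python sketch. It resamples the data with replacement, recomputes a prediction on each resample, and reports the SD of those predictions. The helper names `bootstrap_sd` and `prevalence` are hypothetical, and the toy predictor used here is the overall prevalence rather than the A‐kNN, purely to keep the example self‐contained.

```python
import random
import statistics


def bootstrap_sd(data, predict, n_boot=500, seed=0):
    """SD of a prediction across nonparametric bootstrap resamples.

    data    -- list of observations
    predict -- callable mapping a resampled data set to one prediction
               (hypothetical interface; stands in for the fitted model)
    """
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_boot):
        # Draw len(data) observations with replacement.
        resample = [data[rng.randrange(len(data))] for _ in data]
        predictions.append(predict(resample))
    return statistics.stdev(predictions)


def prevalence(labels):
    """Toy predictor: fraction of disease cases in the resample."""
    return sum(labels) / len(labels)


labels = [0] * 90 + [1] * 10  # hypothetical data, 10% prevalence
print(bootstrap_sd(labels, prevalence))
```

A larger SD indicates that the prediction depends strongly on which participants happen to be sampled, as would be expected in sparse regions of the imaging variable space.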
Only individuals with known disease status and no missing values for VATi, aSATi, liver PDFF, or MFI were used for prediction.
Receiver operating characteristic (ROC) analyses for disease detection were performed using CHD and T2D propensity. The area under the ROC curve (AUROC) with 95% confidence interval (CI) was calculated and, for reference, compared with the AUROC obtained using BMI as the predictor.
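For readers less familiar with the AUROC, the following Python sketch computes it through its rank interpretation: the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The function name `auroc` and the example scores are hypothetical, and the sketch does not compute a confidence interval, because the CI method is not described in this section.

```python
def auroc(scores, labels):
    """Area under the ROC curve from pairwise case-control comparisons.

    scores -- predictor values (e.g., disease propensity or BMI)
    labels -- 1 for disease case, 0 for control
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count pairs where the case outranks the control; ties count one half.
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise form is quadratic in the number of participants, which is acceptable for illustration; rank‐based formulas give the same value more efficiently.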
Assessment of A‐kNN algorithm
Methods comparison
The A‐kNN algorithm was compared with global logistic regression 20 (logistic regression denoted “global” to differentiate it from local logistic regression), local logistic regression 21, classic kNN algorithm 15, 16, and a second modification of the kNN algorithm in which a regression‐based estimation using the participants in the neighborhood was performed 15. Disease predictions were made with each of the body composition profile (BCP) variables (VATi, aSATi, liver PDFF, and MFI) separately as the independent variable.
Diagnostic performance evaluation
The “full data set” (N = 10,019) was randomly split into a “training data set” (N = 6,679) and a “test data set” (N = 3,340). CHD and T2D propensities were calculated using the A‐kNN algorithm for each participant in the test data set using the training data set. ROC analysis was performed using the resulting disease propensities and the actual outcomes recorded in the test data set. The AUROC values were compared with those calculated without using separate data sets.
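A random split of this kind can be sketched in Python as follows. The function name `split_indices` and the fixed seed are assumptions for illustration; the section does not state how the randomization was carried out.

```python
import random


def split_indices(n_total, n_train, seed=0):
    """Randomly partition participant indices into training and test sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(10019, 6679)
print(len(train_idx), len(test_idx))  # 6679 3340
```

Propensities for the test participants would then be computed from neighborhoods drawn only from the training participants, so that no test outcome contributes to its own prediction.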
Exploration of individualized disease associations and phenotypes
Disease propensity values were visualized in a two‐dimensional plot, and six subgroups spanning the space were visually selected for illustration of body composition variations. Body composition variations were illustrated using the BCP plot, previously described by Linge et al. 9 and exemplified in Supporting Information Figure S1. Differences in ectopic fat variables (VATi, liver PDFF, MFI) between subgroups were tested using the Mann‐Whitney U test.
Metabolic sub‐phenotyping in individuals with obesity and fatty liver (liver PDFF > 5%) was explored by visualization of disease propensities and investigation of diagnostic performance. The population with fatty liver was further stratified for NAFLD using alcohol consumption. Thresholds were 14 and 21 units per week for females and males, respectively 22. Online Supporting Information includes a description of the calculation of alcohol units per week.
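The stratification rule can be written as a small Python function. The liver PDFF cutoff and the alcohol thresholds are taken from the text; treating consumption strictly below the threshold as nonalcoholic is an assumption, because the handling of values exactly at the threshold is not stated. The function name `classify_liver` and the category labels are hypothetical.

```python
# Alcohol thresholds in units per week, from the text.
ALCOHOL_THRESHOLD = {"female": 14, "male": 21}


def classify_liver(pdff_percent, alcohol_units_per_week, gender):
    """Classify one participant by liver fat and alcohol consumption.

    Assumption: consumption strictly below the threshold counts as
    nonalcoholic; boundary handling is not specified in the source.
    """
    if pdff_percent <= 5.0:  # fatty liver is defined as liver PDFF > 5%
        return "no fatty liver"
    if alcohol_units_per_week < ALCOHOL_THRESHOLD[gender]:
        return "NAFLD"
    return "fatty liver, alcohol above threshold"


print(classify_liver(8.0, 10, "female"))  # NAFLD
print(classify_liver(8.0, 16, "female"))  # fatty liver, alcohol above threshold
print(classify_liver(8.0, 16, "male"))    # NAFLD
```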
Computations were performed using the R language for statistical computing and graphics (R Foundation for Statistical Computing, Vienna, Austria). Nearest neighbor methods were implemented in house. Base or contributed R packages were used to implement regression models and hypothesis tests. Online Supporting Information lists the packages used.
Results
Table 1 summarizes characteristics of the cohort (52% female; mean age 62.6 [SD 7.5] [44.5‐79.3] years; mean BMI 26.7 [SD 4.4] [14.2‐58.0] kg/m2) and disease groups (CHD prevalence 4.7%; T2D prevalence 4.5%). Figure 2 presents disease propensity results. The ranges of CHD and T2D propensity values for the whole cohort were 1.3% to 58.0% and 0.6% to 42.0%, respectively. High T2D propensity was not directly associated with high CHD propensity: some participants had more than twice as high a T2D propensity compared with CHD propensity, and some participants with CHD propensities between 30% and 40% had T2D propensities below 10%. Females had a higher density of low propensity values, and the male population exhibited higher CHD propensity values overall. Supporting Information Figure S5 presents gender‐specific results.
Table 1.
Summary of characteristics of full cohort
Characteristic | All | Females | Males | CHD | T2D |
---|---|---|---|---|---|
N, participants | 10,019 | 5,202 | 4,817 | 472 | 455 |
Age, y | 62.60 (7.51) | 61.90 (7.35) | 63.36 (7.61) | 67.21 (6.12) | 65.45 (6.82) |
Weight, kg | 75.86 (15.14) | 68.67 (12.92) | 83.63 (13.42) | 82.15 (15.95) | 87.77 (16.47) |
BMI, kg/m2 | 26.67 (4.40) | 26.24 (4.75) | 27.13 (3.94) | 28.46 (4.76) | 30.30 (5.38) |
Waist circumference, cm | 87.47 (12.18) | 81.92 (11.32) | 93.47 (10.03) | 93.93 (11.83) | 98.95 (12.38) |
CHD prevalence (cases/controls/unknown) | 472/6,178/3,369 | 141/3,553/1,508 | 331/2,625/1,861 | 472/–/– | 70/0/385 |
T2D prevalence (cases/controls/unknown) | 455/9,424/140 | 149/4,972/81 | 306/4,452/59 | 70/390/12 | 455/–/– |
CHD propensity, % [range] | 7.79 (7.30) [1.31‐58.0] | 3.85 (3.21) [1.31‐20.83] | 12.04 (8.07) [3.86‐58.0] | 13.24 (10.24) [1.36‐52.0] | 15.22 (10.77) [1.46‐58.0] |
T2D propensity, % [range] | 4.33 (5.16) [0.68‐42.0] | 2.59 (3.63) [0.68‐38.46] | 6.2 (5.87) [1.48‐42.0] | 7.48 (7.19) [0.71‐35.09] | 10.45 (8.42) [0.70‐40.0] |
Visceral adipose tissue, L | 3.72 (2.24) | 2.63 (1.51) | 4.89 (2.31) | 5.19 (2.69) | 5.97 (2.59) |
Abdominal subcutaneous adipose tissue, L | 7.02 (3.20) | 8.05 (3.41) | 5.91 (2.52) | 7.37 (3.41) | 8.60 (3.96) |
Thigh muscle volume, L | 10.33 (2.56) | 8.35 (1.18) | 12.52 (1.77) | 10.85 (2.36) | 11.09 (2.33) |
Weight‐to‐muscle ratio, kg/L | 7.50 (1.32) | 8.25 (1.24) | 6.68 (0.83) | 7.67 (1.35) | 8.03 (1.57) |
Liver proton density fat fraction, % | 4.16 (1.50‐4.61) | 3.65 (1.34‐3.74) | 4.71 (1.75‐5.71) | 3.05 (1.71‐6.63) | 6.24 (2.77‐11.69) |
Fat ratio, % | 49.37 (11.30) | 53.82 (10.59) | 44.44 (9.94) | 51.80 (11.26) | 55.19 (10.23) |
Visceral adipose tissue index, L/m2 | 1.27 (0.72) | 1.00 (0.57) | 1.58 (0.74) | 1.76 (0.86) | 2.03 (0.84) |
Abdominal subcutaneous adipose tissue index, L/m2 | 2.50 (1.23) | 3.04 (1.3) | 1.90 (0.80) | 2.58 (1.31) | 3.00 (1.51) |
Total abdominal adipose tissue index, L/m2 | 3.77 (1.64) | 4.04 (1.78) | 3.48 (1.42) | 4.34 (1.82) | 5.03 (1.93) |
Muscle fat infiltration, % | 7.41 (1.86) | 7.93 (1.84) | 6.84 (1.71) | 8.14 (2.25) | 8.62 (2.37) |
Data given as mean (SD). For liver proton density fat fraction, median and interquartile range are shown.
Figure 2.
Scatterplot of propensity for coronary heart disease (CHD) and type 2 diabetes (T2D) (calculated using the adaptive kNN [A‐kNN]) for the full data set (N = 10,019) with disease propensity distributions alongside each corresponding axis.
The diagnostic performance was greater using disease propensities as predictors compared with using BMI for CHD and T2D detection. Supporting Information Table S1 presents AUROC values.
Assessment of A‐kNN algorithm
Methods comparison
Figure 3 illustrates the method comparison (using MFI and liver PDFF for females). The A‐kNN algorithm followed the local regression trend. When body composition data were dense, it varied more closely with the regression within kNN, and when data were sparse, it was more similar to the fixed kNN, not extrapolating disease associations to high or low prediction values. Supporting Information Figures S2‐S4 present results for VATi and aSATi for females, as well as MFI, liver PDFF, VATi, and aSATi for males.
Figure 3.
Comparison of methods for disease predictions on the female population made for coronary heart disease Pr(CHD) and type 2 diabetes Pr(T2D) with muscle fat infiltration (MFI) in the anterior thighs and liver proton density fat fraction (PDFF) separately as the independent variable.
Diagnostic performance evaluation
AUROC (95% CI) for CHD detection based on CHD propensity using test and training data sets was 0.75 (0.71‐0.79) for all participants, 0.73 (0.65‐0.81) for females, and 0.68 (0.63‐0.74) for males versus 0.75 (0.73‐0.77), 0.73 (0.69‐0.78), and 0.69 (0.66‐0.72) without using separate data sets. Corresponding values for T2D detection were 0.78 (0.74‐0.82) for all participants, 0.84 (0.79‐0.89) for females, and 0.71 (0.66‐0.77) for males versus 0.79 (0.77‐0.81), 0.81 (0.77‐0.85), and 0.73 (0.71‐0.76) without using separate data sets.
Exploration of individualized disease associations and phenotypes
Figure 4 illustrates the exploration of metabolic phenotypes in the disease propensity space. Table 2 presents the characteristics of all subgroups (A‐F). The groups showed differently skewed body fat distributions: for Groups A and F, CHD propensities were of the same magnitude as T2D propensities. Group A (low CHD and T2D propensity) was characterized by a star shape in the BCP plot. The body fat distribution was similar to the metabolic disease‐free reference, with especially low VAT values. Group F (high CHD and T2D propensity) expressed an inflated BCP with high values of all ectopic fat variables. Groups B and C showed skewed metabolic phenotypes with CHD propensities notably higher than T2D propensities. Group B (elevated CHD and low T2D propensity) exhibited skewed body fat distributions characterized by high VAT, somewhat elevated liver PDFF (median 3.12%), and low MFI. Group C was similar, but with higher VAT, similar liver PDFF (median 3.46%), and higher MFI compared with Group B. Groups D and E also showed skewed metabolic phenotypes, but with CHD propensities notably lower than T2D propensities. Group D (low CHD and elevated T2D propensity) exhibited differently skewed body fat distributions characterized by high VAT, especially high liver PDFF, and low MFI. Group E was similar, with the most notable difference being higher MFI compared with Group D. The differences in ectopic fat variables (VATi, liver PDFF, MFI) between subgroups were statistically significant for all comparisons except for MFI between Groups C and E.
Figure 4.
Individualized BCP assessment of 10,019 participants scanned by UK Biobank. Top left: Scatterplot of propensity for CHD and T2D for all participants with six (A‐F) visually stratified subgroups highlighted in different colors. Top right: Group visualization of each corresponding group in the propensity space using the BCP plot 9. Fields are 25th to 75th percentile, and the dashed line is the median of a metabolically disease‐free population 5. Center: Six individuals, one from each subgroup in the propensity space, presented with a coronal slice from their MRI scan with VAT (pink) and aSAT (blue) segmentations, and a transversal slice with thigh muscle segmentations colored. Bottom: T2D and CHD propensity values for each participant. Error bars are estimates of uncertainty derived from bootstrapping. aSAT, abdominal subcutaneous adipose tissue; BCP, body composition profile; CHD, coronary heart disease; FR, fat ratio; MFI, muscle fat infiltration; PDFF, proton density fat fraction; T2D, type 2 diabetes; TAATi, total abdominal adipose tissue index; VATi, visceral adipose tissue index; WMR, weight‐to‐muscle ratio.
Table 2.
Summary of characteristics of six subgroups (A‐F) visually stratified in disease propensity space (Figure 4)
Characteristic | A | B | C | D | E | F | Other |
---|---|---|---|---|---|---|---|
N, participants | 3,429 | 249 | 147 | 192 | 119 | 97 | 5,786 |
Gender, female/male | 3,429/0 | 2/247 | 0/147 | 90/102 | 49/70 | 0/97 | 1,632/4,154 |
Age, y | 60.92 (7.39) | 64.61 (6.72) | 68.8 (6.31) | 61.82 (7.27) | 62.04 (6.99) | 65.77 (6.50) | 63.34 (7.44) |
Weight, kg | 63.70 (9.27) | 84.81 (9.96) | 89.71 (10.79) | 84.54 (11.22) | 99.71 (16.94) | 107.20 (14.76) | 81.03 (13.09) |
BMI, kg/m2 | 24.14 (3.06) | 27.90 (2.21) | 29.64 (2.67) | 29.76 (4.22) | 34.92 (6.11) | 35.38 (4.51) | 27.63 (4.24) |
Waist circumference, cm | 76.72 (7.70) | 95.36 (6.59) | 100.74 (7.12) | 95.53 (7.55) | 107.39 (10.78) | 113.43 (10.43) | 92.07 (9.71) |
CHD prevalence, (cases/controls/unknown) | 46/2,603/780 | 19/121/109 | 17/36/56 | 11/97/84 | 9/51/59 | 21/23/53 | 342/3,238/2,206 |
CHD propensity | 2.00 (0.52) | 15.10 (2.04) | 33.08 (7.30) | 9.82 (1.32) | 16.18 (3.58) | 36.74 (5.99) | 9.53 (5.72) |
T2D prevalence, (cases/controls/unknown) | 29/3,354/46 | 13/231/5 | 16/93/0 | 16/171/5 | 32/83/4 | 31/64/2 | 306/5,405/75 |
T2D propensity | 0.95 (0.26) | 4.09 (0.58) | 11.38 (4.75) | 12.68 (1.64) | 22.23 (4.03) | 28.26 (5.32) | 5.11 (4.18) |
N, participants with obesity | 142 | 37 | 36 | 71 | 93 | 83 | 1,444 |
N, participants with fatty liver | 130 | 19 | 29 | 191 | 119 | 86 | 1,704 |
N, participants with NAFLD | 52 | 9 | 19 | 86 | 50 | 36 | 760 |
Visceral adipose tissue, L | 1.86 (0.86) | 6.32 (1.13) | 8.05 (1.58) | 5.31 (1.24) | 7.78 (1.73) | 10.00 (1.94) | 4.36 (1.91) |
Abdominal subcutaneous adipose tissue, L | 6.59 (2.38) | 6.17 (1.41) | 6.71 (1.78) | 8.73 (3.99) | 11.39 (5.2) | 10.35 (3.10) | 7.12 (3.49) |
Thigh muscle volume, L | 8.27 (1.16) | 12.33 (1.75) | 11.84 (1.53) | 11.22 (2.47) | 11.52 (2.62) | 12.72 (1.96) | 11.37 (2.45) |
Weight‐to‐muscle ratio, kg/L | 7.76 (0.92) | 6.89 (0.57) | 7.56 (0.69) | 7.77 (1.52) | 8.92 (2.10) | 8.46 (1.10) | 7.32 (1.48) |
Liver proton density fat fraction, % | 1.59 (1.19‐2.32) | 3.12 (2.28‐4.08) | 3.46 (2.52‐5.27) | 15.03 (11.80‐18.64) | 19.41 (16.77‐24.56) | 11.53 (7.73‐14.46) | 2.95 (1.79‐5.92) |
Fat ratio, % | 49.11 (9.43) | 50.11 (4.16) | 55.08 (4.33) | 55.01 (10.21) | 61.74 (8.91) | 61.13 (4.53) | 48.74 (12.42) |
Visceral adipose tissue index, L/m2 | 0.70 (0.32) | 2.05 (0.32) | 2.63 (0.49) | 1.84 (0.40) | 2.68 (0.51) | 3.26 (0.61) | 1.47 (0.62) |
Abdominal subcutaneous adipose tissue index, L/m2 | 2.47 (0.89) | 2.01 (0.43) | 2.20 (0.57) | 3.15 (1.67) | 4.06 (2.06) | 3.37 (1.00) | 2.47 (1.36) |
Total abdominal adipose tissue index, L/m2 | 3.17 (1.14) | 4.06 (0.56) | 4.83 (0.82) | 4.99 (1.79) | 6.74 (2.12) | 6.64 (1.32) | 3.94 (1.73) |
Muscle fat infiltration, % | 7.05 (1.10) | 6.79 (0.84) | 9.62 (1.80) | 7.43 (1.30) | 9.62 (2.41) | 11.15 (3.20) | 7.49 (2.07) |
Data given as mean (SD). For liver proton density fat fraction, median and interquartile range are shown.
Figure 5 presents disease propensity results for individuals with obesity (BMI > 30 kg/m2; N = 1,906), and Figure 6 presents disease propensity results for individuals with fatty liver (N = 2,278) and NAFLD (N = 1,253). Individuals with obesity and fatty liver presented with a range in metabolic phenotypes comparable to that of the full cohort and were prevalent in all subgroups (Table 2). Substratification of individuals with obesity showed similar ranges and distributions independent of BMI interval. Compared with individuals with obesity, individuals with fatty liver were less prevalent in Groups B and C and more prevalent in Groups D and E. The distributions of CHD and T2D propensity were shifted to higher values both for individuals with obesity and for those with fatty liver compared with the whole cohort. The NAFLD population presented with results similar to those with fatty liver.
Figure 5.
Left: Scatterplot of propensity for coronary heart disease (CHD) and type 2 diabetes (T2D) (calculated using the adaptive kNN [A‐kNN]) for the full data set (N = 10,019) with the population with obesity (BMI > 30 kg/m2) highlighted. Right: Distributions of CHD and T2D propensity values for different BMI intervals within the population with obesity.
Figure 6.
Scatterplot of propensity for coronary heart disease (CHD) and type 2 diabetes (T2D) (calculated using the adaptive kNN [A‐kNN]) for the full data set (N = 10,019) with population‐specific disease propensity distributions alongside each corresponding axis. Left: Fatty liver population highlighted. Right: Nonalcoholic fatty liver disease (NAFLD) population highlighted.
ROC analysis showed that the diagnostic performance was maintained after stratification based on BMI. For all subgroups, the diagnostic performance was greater using disease propensity as a predictor compared with BMI or liver PDFF (for fatty liver or NAFLD). Supporting Information Table S1 presents AUROC values.
Discussion
This study applied a data‐driven method to calculate CHD and T2D propensity using individualized virtual control groups created by clustering MRI‐based body composition data. This allowed an individual‐centric assessment of each individual’s metabolic disease profile moving beyond discrete categorizations commonly made, for example using BMI. In the exploration of individualized disease associations, CHD phenotypes (higher CHD propensity than T2D propensity [Group B and Group C]), T2D phenotypes (higher T2D propensity than CHD propensity [Group D and Group E]), comorbid phenotypes (both high CHD and T2D propensity [Group F]), and more metabolically healthy phenotypes (both low CHD and T2D propensity [Group A]) were found. Among individuals with obesity, fatty liver, and NAFLD, all of these metabolic phenotypes were prevalent. This illustrates the metabolic diversity expressed in these clinically relevant populations and the possibility to effectively sub‐phenotype using body fat distribution.
Individual‐centric body composition‐based disease prediction methods
An individual‐centric analysis should preferably be based on participants similar to the individual. However, the method should not be too sensitive to local variations and, in addition, be restrictive when data are sparse, not overestimating the disease associations because of outliers.
The A‐kNN increased k until n diseased individuals were included, creating individualized virtual control groups. This ensures the inclusion of individuals who can be the basis for disease predictions, makes the algorithm adaptive to the amount of information present for different conditions, and allows for lower prediction values to be estimated at higher resolution. In addition, the method will make more individualized predictions when it is applied to larger data sets. Figure 3 shows that, when data were available, the A‐kNN varied closely together with the regression within kNN, and where data were sparse, it was more similar to the fixed kNN, not extrapolating disease associations to high (or low) prediction values. Algorithm discussion for global logistic regression, local logistic regression, classic kNN, and regression‐based kNN can be found in online Supporting Information.
Strengths and limitations
A strength of this study is that the analysis was centered around each and every individual. Clustering using continuous imaging variables enabled a straightforward calculation of distances as similarity measures between all individuals. In contrast, the inclusion of categorical variables does not allow calculation of distances and prompts similarity assumptions. In investigating disease associations with body fat distribution, gender could arguably be excluded from the clustering variables because individuals will still be compared with those with more similar body fat distribution. Applying logistic regression, for example, potential (global) differences in body composition between females and males have to be accounted for. Furthermore, in a logistic regression model, age is commonly included. As the data were cross‐sectional, age was highly correlated with disease prevalence and therefore was not suitable to include when investigating associations between body composition and metabolic diseases with clustering techniques. Longitudinal data are needed to investigate predictive power and perform risk calculations. With updated electronic health care records, the potential for individualized risk predictions can be evaluated.
Comparisons between disease propensity and commonly used risk prediction tools were not possible because key data (such as biochemistry assays) have not yet been released by the UK Biobank. Future studies should focus on investigating the potential value of adding detailed descriptions of body fat distribution to further personalize today’s risk predictions.
This study benefitted from a large well‐characterized data set including detailed, standardized descriptions of body fat distribution, measured with high accuracy and precision. However, rare body fat distributions cause sparsity in the imaging variable space in which both participants similar to the individual, as well as disease data, might be scarce. Potential sparsity in data motivates the reporting of prediction uncertainty. The coupling of disease propensity and an uncertainty measure enables individualized disease predictions that may be used in clinical situations with transparency.
CHD and T2D data were less prevalent among females, which might be why females exhibited a different pattern of disease propensities compared with males (Supporting Information Figure S5). Alternatively, females rarely express the skewed body fat distribution driving the phenotype with higher CHD propensity than T2D propensity (Figure 4, Groups B and C).
Implications
To effectively treat metabolic diseases, categorizations of individuals are generally made, causing the loss of the individual’s perspective. The use of body composition to create individualized virtual control groups centered the analysis around each specific individual while still allowing investigations of disease patterns and metabolic phenotypes at population scale. That participants with obesity, fatty liver, and NAFLD presented with all metabolic phenotypes (exemplified by Groups A‐F) indicates that individualization of metabolic profiles is possible also within these subpopulations. The range of propensity values within the CHD and T2D subgroups (Table 1) indicates that these disease categories also include heterogeneous populations in which some individuals exhibit disease profiles with strong associations to body fat distribution whereas others do not. Recognizing which patients exhibit CHD phenotypes, T2D phenotypes, comorbid phenotypes, or more metabolically healthy phenotypes, as well as in which patients this is manifested in differences in fat accumulation patterns, could affect treatment decisions and consequently the overall health of the patient. Detailed descriptions of body composition and fat distribution could play an important part in furthering our understanding of metabolic disorders, but networking across disciplines is needed for a complete transition to precision medicine. For example, studies have investigated the links between body composition and genetics 23, 24, and suggestions have been made on how to relate body composition to body function and metabolic processes 25.
This study showed that both obesity and NAFLD include a wide range of metabolic sub‐phenotypes. Some had strong disease associations, whereas others had seemingly healthy body compositions with little to no association with CHD and T2D. On a population level, fatty liver was associated with CHD and T2D, but liver fat alone clearly did not imply a strong disease association for the individual. However, patients suffering from NAFLD may develop a range of comorbidities, such as T2D, hepatocellular carcinoma, and CHD 26. Disease propensity analysis may aid in identification of which comorbidities NAFLD patients are more likely to develop and consequently enable optimization of their treatment. A detailed body composition assessment could also be beneficial in clinical trials to assess downstream and/or whole‐body effects associated with candidate therapies.
Anthropometric measures, such as BMI, could be effectively used in prestratification or screening to find patients who would benefit from MRI‐based body composition assessment. Within obesity, metabolic phenotyping based on body composition could (in addition to what has already been mentioned for NAFLD) provide new information useful in treatment decisions for bariatric surgery, in that some patients with unfavorable body fat distribution are not eligible today.
Conclusion
This study suggested a data‐driven method to calculate CHD and T2D propensity using individualized virtual control groups created by clustering MRI‐based body composition data. This allowed an individual‐centric assessment of each individual’s metabolic phenotype moving beyond discrete categorizations of body composition. The method allowed sub‐phenotyping in clinically relevant populations, within obesity and NAFLD, that may help in identifying which comorbidities a patient is more likely to develop and consequently enable optimization of treatment plans.
Supporting information
Funding agencies: Funding support for this work was provided by Pfizer Inc.
Disclosure: MB and ODL are stockholders in, and employees of, AMRA Medical AB. JL and BW are employees of AMRA Medical AB. JL, MB, and ODL have a patent, “Evaluating an individual’s characteristics of at least one phenotype variable,” pending.
References
- 1. Dallas Heart Study. UT Southwestern Medical Center website. https://www.utsouthwestern.edu/research/translational-medicine/doing-research/dallas-heart/. Accessed August 23, 2018.
- 2. German National Cohort (GNC). NAKO Health Study website. https://nako.de/informationen-auf-englisch/. Accessed August 23, 2018.
- 3. Kooperative Gesundheitsforschung in der Region Augsburg (KORA). Helmholtz Zentrum München website. https://www.helmholtz-muenchen.de/en/kora/index.html. Accessed August 23, 2018.
- 4. UK Biobank. http://www.ukbiobank.ac.uk. Accessed August 23, 2018.
- 5. Jensen MD, Ryan DH, Donato KA, et al. Guidelines (2013) for managing overweight and obesity in adults. Obesity (Silver Spring) 2014;22(S2):S1‐S410.
- 6. Neeland IJ, Turer AT, Ayers CR, et al. Body fat distribution and incident cardiovascular disease in obese adults. J Am Coll Cardiol 2015;65:2150‐2151.
- 7. Lee JJ, Pedley A, Hoffmann U, Massaro JM, Fox CS. Association of changes in abdominal fat quantity and quality with incident cardiovascular disease risk factors. J Am Coll Cardiol 2016;68:1509‐1521.
- 8. Therkelsen KE, Pedley A, Speliotes EK, et al. Intramuscular fat and associations with metabolic risk factors in the Framingham Heart Study. Arterioscler Thromb Vasc Biol 2013;33:863‐870.
- 9. Linge J, West J, Borga M, et al. Body composition profiling in the UK Biobank Imaging Study. Obesity (Silver Spring) 2018;26:1785‐1795.
- 10. West J, Dahlqvist Leinhard O, Romu T, et al. Feasibility of MR‐based body composition analysis in large scale population studies. PLoS One 2016;11:e0163332. doi: 10.1371/journal.pone.0163332
- 11. Borga M, Thomas EL, Romu T, et al. Validation of a fast method for quantification of intra‐abdominal and subcutaneous adipose tissue for large scale human studies. NMR Biomed 2015;28:1747‐1753.
- 12. Leinhard OD, Johansson A, Rydell J, et al. Quantitative abdominal fat estimation using MRI pattern recognition. In: Proceedings of the 19th International Conference on Pattern Recognition (ICPR); December 8‐11, 2008; Tampa, FL. doi: 10.1109/ICPR.2008.4761764
- 13. Karlsson A, Rosander J, Romu T, et al. Automatic and quantitative assessment of regional muscle volume by multi‐atlas segmentation using whole‐body water‐fat MRI. J Magn Reson Imaging 2015;41:1558‐1569.
- 14. West J, Romu T, Thorell S, et al. Precision of MRI‐based body composition measurements of postmenopausal women. PLoS One 2018;13:e0192495. doi: 10.1371/journal.pone.0192495
- 15. Altman NS. An introduction to kernel and nearest‐neighbor nonparametric regression. Am Stat 1992;46:175‐185.
- 16. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. 2nd ed. Stanford, CA: Springer; 2008.
- 17. Heymsfield SB, Gallagher D, Mayer L, Beetsch J, Pietrobelli A. Scaling of human body composition to stature: new insights into body mass index. Am J Clin Nutr 2007;86:82‐91.
- 18. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Boca Raton, FL: Chapman and Hall/CRC; 1993.
- 19. Davison AC, Hinkley DV. Bootstrap Methods and their Applications. Cambridge, UK: Cambridge University Press; 1997.
- 20. McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. London, UK: Chapman & Hall/CRC Monographs on Statistics & Applied Probability; 1989.
- 21. Loader C. Local Regression and Likelihood. New York, NY: Springer; 1999.
- 22. Sanyal AJ, Brunt ME, Kleiner DE, et al. Endpoints and clinical trial design for nonalcoholic steatohepatitis. Hepatology 2011;54:344‐353.
- 23. Schleinitz D, Böttcher Y, Blüher M, Kovacs P. The genetics of fat distribution. Diabetologia 2014;57:1276‐1286.
- 24. Ji Y, Yiorkas AM, Frau F, et al. Genome‐wide and abdominal MRI data provide evidence that a genetically determined favourable adiposity phenotype is characterized by lower ectopic liver fat and lower risk of type 2 diabetes, and hypertension. Diabetes 2019;68:207‐219.
- 25. Müller JM, Braun W, Pourhassan M, Geisler C, Bosy‐Westphal A. Application of standards and models in body composition analysis. Proc Nutr Soc 2016;75:181‐187.
- 26. Ekstedt M, Nasr P, Kechagias S. Natural history of NAFLD/NASH. Curr Hepatol Rep 2017;16:391‐397.