Comprehensive Psychoneuroendocrinology. 2020 Dec 17;5:100025. doi: 10.1016/j.cpnec.2020.100025

Allostatic load scoring using item response theory

Shelley H Liu, Robert-Paul Juster, Kristen Dams-O’Connor, Julie Spicer
PMCID: PMC9216382  PMID: 35754455

Abstract

Allostatic load is commonly operationalized using a sum-score of high-risk biomarkers. However, this method implies that biomarkers contribute equally to allostatic load, as each is given equal weight. Our goal in this methodological paper is to evaluate this assumption and, complementarily, to identify the biomarkers that are most and least informative for developing an allostatic load index. Item response theory models provide an alternative approach to calculating the allostatic load score, by treating individual biomarkers (i.e., “items”) as indicators of a latent allostatic load construct. Item response theory scores account for the data-driven discriminating power of each biomarker and for an individual’s pattern of biomarker responses. To demonstrate feasibility of this approach, we used data from the 2015–2016 National Health and Nutrition Examination Survey (NHANES; N = 3751), with twelve allostatic load biomarkers representing immune response, metabolic function and cardiovascular health. Item response theory models revealed that body-mass-index and C-reactive protein were the most informative biomarkers for allostatic load. Both higher allostatic load sum-score and allostatic load item response theory score were associated with lower socio-economic status (p = 0.008; p<0.001, respectively). Further, both formulations of allostatic load were positively associated with a nine-item depression screener (p<0.001 for both), but only the item response theory score was also positively associated with the impact of depressive symptoms on daily life (p = 0.045). Item response theory scores may be more finely tuned to tease out effects, compared to sum-scores, and also provide more flexibility when there are missing biomarker measurements. Supplemental R code for our approach is included.

Keywords: Allostatic load, Item response theory, National health and nutrition examination survey, Biomarkers, Psychometrics, Depression

Highlights

  • Methodological paper to introduce item response theory for calculating allostatic load.

  • Biomarker data from NHANES 2015–2016 representative of United States adults.

  • Body-mass-index and C-reactive protein most informative for allostatic load.

  • Item response theory captures more variability in allostatic load compared to sum-scores.

  • Future work - item response theory can standardize allostatic load across datasets.

1. Introduction

Allostatic load is a latent construct defined as multi-systemic physiological dysregulation. Allostatic load can be indexed to quantify the ‘wear and tear’ on the body, by quantifying multiple biomarkers representing hypothalamic-pituitary-adrenal (HPA) axis, immune/inflammation, cardiovascular, lipid and glucose functioning, as well as emergent biomarkers. Central to advances in the field of psychoneuroendocrinology, allostatic load is associated with both psychosocial exposures and health outcomes including socioeconomic status [1], race/ethnicity [1], sexual orientation [2,3], workforce burnout [4,5], aging and mortality [6], perinatal outcomes [7] and psychiatric emergencies [8].

There has been much discussion regarding the measurement of allostatic load and the relative importance and weight of individual biomarkers. In particular, there is continuing debate on the best ways to operationalize and measure allostatic load, and what can be learned from the multiple biomarkers that comprise it. While allostatic load cannot be directly observed because it is a latent and unobservable trait, researchers typically estimate it by summarizing data from a number of appropriate clinical biomarkers. Allostatic load is commonly operationalized using a sum-score of high-risk biomarkers. This traditional count-based approach, such as that used in analyzing the MacArthur Studies of Successful Aging [9], relies on dichotomizing each biomarker as high/low risk using sample-dependent cutoffs or clinical cutoffs, and then summing the number of high-risk biomarkers to arrive at an allostatic load summary score. However, this approach assumes that each biomarker included in the sum score makes an equal contribution to overall allostatic load, by treating all of the biomarkers as though they are interchangeable. It also assumes that the simple sum score is capable of measuring allostatic load with equal precision across the full range of the underlying latent construct.

In addition to the count-based approach applied in the MacArthur studies and other international samples [10–13], alternative approaches have included a two-tailed 10th/90th percentile approach [14], a standard deviation cut-off approach [15], z statistic weights generated from bootstrapping [16], factor-analytic approaches [17], and the more recent “scaling” of multi-systemic dysregulations [18]. To date, there is no consensus on the most appropriate allostatic load formulation to use [4,19].

Our goal here is to use advanced psychometric methods to evaluate alternative methods of calculating an allostatic load score and, complementarily, to identify the individual biomarkers that are most and least informative for developing an allostatic load index. Further, we determine whether individual biomarkers provide information along the entire allostatic load continuum, or only for a portion of the continuum. While there is no universal agreement on the specific biomarkers to include in an allostatic load calculation, researchers generally agree that cardiovascular, metabolic and immune biomarkers should be included. In this paper, we demonstrate an alternative approach to calculating allostatic load scores with dichotomous biomarker data, using data from the National Health and Nutrition Examination Survey (NHANES) for illustration. Note that NHANES does not provide the neuroendocrine biomarkers often used in allostatic load studies. Many studies (>20) have used NHANES to study allostatic load [20], finding associations with outcomes such as sleep disorders [21], all-cause mortality [22] and cognitive function [23], demonstrating that the biomarkers available in NHANES are a valid and accurate measure of allostatic load.

Item response theory (IRT) is a set of psychometric models used for measurement and the development of scales [24]. Most commonly used in the educational testing literature, it has now been used in biomedical research in the development and scoring of patient-reported outcomes [25,26]. To our knowledge, it has not yet been applied to allostatic load biomarker data, but IRT shares similarities with factor-analytic approaches that have previously been used to investigate allostatic load scoring [17]. IRT models explain the relationship between a latent, unobservable trait (e.g., allostatic load burden) and observable characteristics or items (e.g., clinical biomarkers). Using IRT, the measurement properties of the clinical biomarkers, the set of individuals measured, and the allostatic load burden are linked together.

The allostatic load burden and the “items” (biomarkers) used to measure it are assumed to span an unobservable continuum. Thus, we can use IRT to establish each individual’s position on that continuum (i.e., quantify each individual’s allostatic load burden). Unlike the sum-score approach, biomarkers are differentially weighted in a data-driven manner, depending on how much information they provide to the overall allostatic load burden scale. Each biomarker has an estimated difficulty parameter (how likely a participant with a certain latent allostatic load level is to score high-risk on that biomarker), and an estimated discrimination parameter (how informative scoring high-risk on a certain biomarker is with respect to gauging the participant’s latent allostatic load level; analogous to factor loadings). IRT-based scores differentially weight the contributions of the individual biomarkers based on their difficulty and discrimination. If we visualize the latent allostatic load burden as a ruler on which the average participant scores 0 and each 1-unit increment represents one standard deviation, we can interpret the IRT allostatic load burden score as a z-score.
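To make the roles of the two item parameters concrete, the following minimal Python sketch evaluates the standard two-parameter logistic response function. It is an illustration only and is not the authors' supplementary R code; the function name `icc_2pl` and the parameter values are hypothetical.

```python
import math

def icc_2pl(theta, discrimination, difficulty):
    """Probability of scoring high-risk on one biomarker under a 2PL model.

    theta          -- latent allostatic load burden (z-score scale)
    discrimination -- slope of the curve (how sharply the item separates people)
    difficulty     -- burden level at which the probability equals 0.5
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical parameters (not estimates from the paper): two biomarkers with
# the same difficulty but very different discrimination.
items = {
    "high_discrimination": (2.0, 1.0),
    "low_discrimination": (0.3, 1.0),
}

for name, (a, b) in items.items():
    curve = [round(icc_2pl(t, a, b), 3) for t in (-1.0, 0.0, 1.0, 2.0, 3.0)]
    print(name, curve)
```

Both hypothetical curves cross 0.5 at the difficulty value, but the high-discrimination curve rises much more steeply around that point, which is what makes such an item informative about the latent burden.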

Identification of the parameters needed to calculate an IRT-based allostatic load score requires access to a large and representative calibration sample. Accordingly, we use a nationally representative sample of US adults (n = 3751) and use the survey-weighted 25th or 75th percentile of each biomarker to define nationally representative high-risk cutoffs. In this paper, we dichotomize biomarkers into high/low risk and compare IRT methods versus sum-score methods for calculating allostatic load scores.

2. Methods

2.1. Analytic sample

We used data from the 2015–2016 National Health and Nutrition Examination Survey (NHANES), which is provided by the National Center for Health Statistics (NCHS) in the Centers for Disease Control and Prevention (CDC). NHANES is a recurring cross-sectional survey of the non-institutionalized civilian US population living in the 50 states and the District of Columbia, with details available elsewhere [27]. The study sample consisted of adults aged 20 years and older but less than 60 years. We excluded those with a positive urine pregnancy test, yielding a final sample size of n = 3751.

2.2. Allostatic load

To assess allostatic load, we included biomarkers that were most commonly used in calculations of allostatic load from NHANES data [20]. Twelve commonly used biomarkers [28] were included to represent immune response (high sensitivity C-reactive protein (CRP), white blood cell count), metabolic function (glycohemoglobin, serum albumin, serum creatinine, total cholesterol, high density lipoprotein (HDL), serum triglycerides, body mass index (BMI)), and cardiovascular health (average of three resting systolic blood pressure measurements, average of three resting diastolic blood pressure measurements, pulse rate). For each biomarker, we found a high-risk cutoff, which was defined as the survey-weighted 75th percentile for all biomarkers except HDL and serum albumin, for which the high-risk cutoff was defined as the survey-weighted 25th percentile. For each of the twelve biomarkers, an individual received a score of 1 if their biomarker level was more extreme than the high-risk cutoff, and 0 otherwise. The allostatic load sum-score was calculated by taking the sum of all twelve biomarker scores. Thus, the allostatic load sum-score can range from 0 to 12.
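The dichotomization and sum-score steps described above can be sketched as follows. This is a simplified Python illustration with hypothetical data, not the authors' implementation: the weighted percentile uses a basic cumulative-weight definition, which may differ from the estimator in the R “survey” package, and returning no sum-score when any biomarker is missing is an assumption consistent with the complete-data requirement noted later in the paper.

```python
def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative share of total weight is at least q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running / total >= q:
            return value
    return pairs[-1][0]

def high_risk_flag(value, cutoff, low_is_risky=False):
    """1 if the value is more extreme than the cutoff, 0 otherwise, None if missing."""
    if value is None:
        return None
    if low_is_risky:  # e.g. HDL and serum albumin, which use the 25th percentile
        return int(value < cutoff)
    return int(value > cutoff)

def sum_score(flags):
    """Count of high-risk biomarkers; None when any biomarker is missing."""
    if any(flag is None for flag in flags):
        return None
    return sum(flags)

# Hypothetical CRP values (mg/L) and survey weights for five participants.
crp = [0.4, 1.1, 1.9, 4.8, 9.5]
weights = [3000.0, 1500.0, 2500.0, 2000.0, 1000.0]
crp_cutoff = weighted_percentile(crp, weights, 0.75)
crp_flags = [high_risk_flag(v, crp_cutoff) for v in crp]
print(crp_cutoff, crp_flags)
```

In a full analysis the same cutoff-and-flag step would be repeated for each of the twelve biomarkers before summing the flags per participant.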

2.3. Statistical analysis

Data from the NHANES cycle was extracted using the “nhanesA: NHANES Data Retrieval” R package [29]. We linked demographic data, laboratory data and physical exam data using a unique survey participant identifier. Our analyses accounted for the NHANES complex survey design, in order for findings to be considered nationally representative of the US population. We accounted for sampling strata, cluster and weights using the “survey: Analysis of complex survey samples” R package [30].

We reported survey-weighted frequencies for categorical variables, and the median and interquartile range (IQR; the interval from the 25th to the 75th percentile) for continuous variables. We investigated the correlation between the allostatic load sum-score and individual biomarkers using Pearson correlation. We then fitted two-parameter logistic IRT models to the twelve dichotomized biomarkers using the R package “ltm: Latent trait models under IRT” [31], in order to estimate the difficulty and discrimination parameters of the biomarkers, and to estimate an allostatic load burden score (IRT score) using expected a posteriori (EAP) scores. We then plotted the item characteristic curves, item information curves, and the test information curve. Lastly, we plotted the estimated allostatic load burden score against the sum-score, in order to visualize the correlation and to compare and contrast those scores.
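For readers unfamiliar with this kind of scoring, the sketch below shows one common way to compute an expected a posteriori (EAP) score for a single participant once item parameters are known. It assumes a standard normal prior and a simple evenly spaced grid, and it skips missing biomarkers in the likelihood. It is a Python illustration under those assumptions and does not reproduce the internals of the “ltm” package; the item parameters are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a high-risk response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_score(responses, item_params, grid_points=81, lo=-4.0, hi=4.0):
    """Posterior mean and SD of theta under a standard normal prior.

    responses   -- list of 1 (high-risk), 0 (low-risk) or None (not measured)
    item_params -- list of (discrimination, difficulty) pairs, same order
    """
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    weights = []
    for theta in grid:
        log_post = -0.5 * theta * theta  # log N(0, 1) density up to a constant
        for y, (a, b) in zip(responses, item_params):
            if y is None:  # a missing biomarker simply contributes nothing
                continue
            p = p_2pl(theta, a, b)
            log_post += math.log(p if y == 1 else 1.0 - p)
        weights.append(math.exp(log_post))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

# Hypothetical (discrimination, difficulty) pairs for three biomarkers.
params = [(2.0, 1.0), (1.2, 0.8), (0.3, 1.5)]
print(eap_score([1, 1, 0], params))        # complete data
print(eap_score([1, None, None], params))  # only the first biomarker measured
```

Because unmeasured biomarkers are skipped rather than imputed, a score is available for every participant with at least one measured biomarker, and the posterior standard deviation indicates how precise that score is.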

To validate our formulations of allostatic load, we assessed associations between the sum-score and IRT score formulations of allostatic load with socio-economic status (SES), as measured by family income-to-poverty ratio, since allostatic load is conceptualized as physiological weathering due to stressful circumstances such as low SES. We then assessed associations of sum-score and IRT score formulations of allostatic load with depression, per the Patient Health Questionnaire (PHQ9), a nine-item depression screener that assesses the frequency of depression symptoms in the past two weeks by self-report [32,33]. Negative binomial regression models were used because there are excess zeroes in the PHQ9 scores. We adjusted for covariates of age, sex, race/ethnicity and family income-to-poverty ratio. We also assessed associations of allostatic load with impacts of depressive symptoms on daily life, as measured by the question, “How difficult have these problems [PHQ9] made it for you to do your work, take care of things at home, or get along with people?” We coded the response as binary to focus on those whose depressive symptoms had substantial impacts on their daily life (not difficult at all or somewhat difficult vs. very or extremely difficult). We used logistic regression adjusted for age, sex, race/ethnicity and family income-to-poverty ratio.

We provide a tutorial and reproducible code to implement IRT in the Supplementary Materials section VI.

3. Results

3.1. Clinical and socio-demographic characteristics

Table 1 contains the survey-weighted socio-demographic and clinical covariates. The median age of the sample was 40 years [interquartile range (IQR): (29, 50)]. The sample contained approximately equal proportions of men (49.9%) and women (50.1%). Median family income-to-poverty ratio was 3.0 [IQR: (1.5, 5.0)]. The sample consisted of 59.9% Non-Hispanic White, 10.4% Mexican American, 7.4% Other Hispanic, 12.3% Non-Hispanic Black, 6.3% Non-Hispanic Asian and 3.7% Other Race/Multi-racial participants.

Table 1.

Survey-weighted sociodemographic and clinical covariates. Interquartile range denotes the interval covering the 25th to 75th percentile. Weighted frequencies and summary statistics were calculated using the R “survey” package which accounted for NHANES complex survey design.

Covariate | Summary measure
Categorical variables | Weighted frequency (%)
Sex |
 Male | 49.9
 Female | 50.1
Race/Ethnicity |
 Mexican American | 10.4
 Other Hispanic | 7.4
 Non-Hispanic White | 59.9
 Non-Hispanic Black | 12.3
 Non-Hispanic Asian | 6.3
 Other Race/Multi-Racial | 3.7
Continuous variables | Median (interquartile range)
Age (years) | 40 (29–50)
Family income-to-poverty ratio | 3.0 (1.5–5.0)
White blood cell count (1000 cells/µL) | 7.1 (5.9–8.7)
C-reactive protein (mg/L) | 1.7 (0.6–4.3)
Body mass index (kg/m²) | 28.2 (24.2–33.0)
Serum triglycerides (mg/dL) | 118 (77–187)
Serum albumin (g/dL) | 4.4 (4.2–4.6)
Serum creatinine (mg/dL) | 0.82 (0.69–0.95)
Systolic blood pressure (mmHg) | 118 (110–127)
Diastolic blood pressure (mmHg) | 72 (65–78)
Pulse rate (beats per min) | 72 (66–80)
High density lipoprotein (mg/dL) | 51 (41–64)
Total cholesterol (mg/dL) | 188 (164–216)
Glycohemoglobin (%) | 5.4 (5.1–5.7)
Urinary creatinine (mg/dL) | 113 (65–184)
Allostatic load sum-score | 3 (1–4)

3.2. Allostatic load

Because we used the survey-weighted 75th percentile (or 25th percentile) cutoffs to define high-risk for each biomarker, the proportion of the sample belonging to the high-risk group was not always 25%, as would be expected if we had used unweighted cutoffs. The allostatic load sum-score ranged from 0 to 12, with a median of 3 [IQR: (1, 4)]. Supplementary Fig. 1 depicts the distribution of the allostatic load sum-score in the sample. The sum-score was most strongly correlated with CRP (r = 0.523), BMI (r = 0.521) and glycohemoglobin (r = 0.506) and was least correlated with creatinine (r = 0.221). Several pairs of biomarkers were also moderately correlated with each other (see Supplementary Fig. 2, which provides a correlation plot of the twelve biomarkers plus the allostatic load sum-score). Specifically, systolic and diastolic blood pressure had a correlation of 0.432, BMI and CRP had a correlation of 0.403, and triglycerides and HDL had a correlation of 0.387.

3.3. Item response theory

We fit a two-parameter logistic (2PL) model to the twelve dichotomized biomarkers, treating each biomarker as an “item”. We evaluated item fit statistics and found that all items fit the 2PL model. The item characteristic curves (ICCs) are presented in Fig. 1. The steeper the ICC, the more strongly the biomarker is related to allostatic load. The discrimination of the biomarkers varied: CRP and BMI provided the most discrimination, while serum creatinine provided the least. The ICC for serum creatinine is mostly flat, meaning that it does not provide much information at any range of allostatic load burden, and thus is not an informative biomarker for the IRT score in this sample. Fig. 2 contains the item information curves, which similarly demonstrate that BMI and CRP provide the most information about the allostatic load burden, and are most informative for slightly higher than average allostatic load burden levels. Supplementary Fig. 3 presents the test information curve, which shows that the test provides the most information for allostatic load burden that is 1 standard deviation above the average burden for the population, but provides less information about very low or very high allostatic load burden levels. The distribution of the IRT score is presented in Supplementary Fig. 4.
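The link between a steep ICC and high item information can be made explicit with the standard 2PL information function, I(θ) = a²·P(θ)·(1 − P(θ)), where the test information is the sum over items. The Python sketch below illustrates this relationship; the parameter values are hypothetical and are not the estimates reported in this paper.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a high-risk response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of one 2PL item at burden level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def total_information(theta, item_params):
    """Test information: the sum of the item information functions."""
    return sum(item_information(theta, a, b) for a, b in item_params)

# Hypothetical (discrimination, difficulty) pairs: two steep items and one flat item.
params = [(2.0, 1.0), (1.8, 0.9), (0.3, 1.5)]

grid = [i / 10 for i in range(-30, 31)]
peak_theta = max(grid, key=lambda t: total_information(t, params))
print("test information peaks near theta =", peak_theta)
print("peak information of the flat item:", item_information(1.5, 0.3, 1.5))
```

Each item's information peaks at its difficulty with height a²/4, so a nearly flat item contributes little anywhere on the scale, and the test information peaks where the difficulties of the most discriminating items cluster.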

Fig. 1.

Item characteristic curves for twelve allostatic load biomarkers in the NHANES 2015–2016 study, using a 2 parameter logistic model.

Fig. 2.

Item information curves for twelve allostatic load biomarkers in the NHANES 2015–2016 study, using a 2 parameter logistic model.

3.4. Relationship between allostatic load IRT (burden) score and allostatic load sum-score

Fig. 3 plots the IRT score against the sum-score formulation of allostatic load. Although there is a general monotonic relationship (positive correlation) between the sum-score and the IRT score, we found that a high sum-score did not always imply a high IRT score, since the IRT score also depended on the biomarker characteristics (e.g., discrimination) and the response pattern (participants may be high-risk on different sets of biomarkers).
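A small worked example may help show why two participants with the same sum-score can receive different IRT scores. The Python sketch below scores two hypothetical response patterns, each with exactly one high-risk biomarker, using an expected a posteriori estimate with a standard normal prior on a simple grid. The item parameters are hypothetical, and the code is an illustration rather than the authors' R implementation.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a high-risk response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap(responses, item_params):
    """Posterior mean of theta under a N(0, 1) prior, evaluated on a grid."""
    grid = [i / 10 for i in range(-40, 41)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # prior density up to a constant
        for y, (a, b) in zip(responses, item_params):
            p = p_2pl(theta, a, b)
            w *= p if y == 1 else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)

# Hypothetical items with equal difficulty but different discrimination.
params = [(2.0, 1.0), (0.3, 1.0)]

pattern_a = [1, 0]  # high-risk only on the highly discriminating biomarker
pattern_b = [0, 1]  # high-risk only on the weakly discriminating biomarker

score_a = eap(pattern_a, params)
score_b = eap(pattern_b, params)
print("sum-scores:", sum(pattern_a), sum(pattern_b))
print("IRT scores:", round(score_a, 3), round(score_b, 3))
```

Both patterns have a sum-score of 1, yet the pattern that is high-risk on the more discriminating biomarker receives the higher IRT score, which is the kind of spread within a single sum-score value that Fig. 3 displays.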

Fig. 3.

Plot of the allostatic load sum-scores versus the estimated allostatic load IRT scores, for the NHANES 2015–2016 study.

3.5. Association between socio-economic status and allostatic load sum-score and IRT score

To verify that our formulations of allostatic load reflect its conceptualization as physiological weathering due to stressful circumstances such as low SES, we assessed associations of the sum-score and IRT score with SES (family income-to-poverty ratio) (Supplementary Table 1). SES was significantly negatively associated with both the sum-score and the IRT score (p = 0.008; p<0.001, respectively).

3.6. Associations between allostatic load sum-score and IRT score with depression

We then assessed adjusted associations of the sum-score and IRT score with a depression screener, the PHQ9 (Table 2). Both the sum-score and IRT score were significantly positively associated with PHQ9 (p<0.001 for both).

Table 2.

Adjusted associations of allostatic load IRT and sum-scores with depression screener (Patient Health Questionnaire, PHQ9) in the NHANES 2015–2016 study. Negative binomial regression was used to account for excess zeros in PHQ9 scores. Models were adjusted for sex, age, race/ethnicity and SES (family income-to-poverty ratio). The models have different sample sizes, as IRT does not require complete data on all allostatic load biomarkers in order to calculate scores, unlike the sum-score approach.

Predictors | IRT score model: Incidence Rate Ratio | 95% CI | p | Sum-score model: Incidence Rate Ratio | 95% CI | p
(Intercept) | 4.41 | 3.60–5.41 | <0.001 | 3.69 | 3.00–4.55 | <0.001
AL IRT Score | 1.19 | 1.12–1.26 | <0.001 | | |
AL Sum Score | | | | 1.06 | 1.03–1.08 | <0.001
Sex | | | | | |
 Male | Reference | | | Reference | |
 Female | 1.27 | 1.15–1.39 | <0.001 | 1.3 | 1.18–1.44 | <0.001
Age (years) | 1 | 1.00–1.01 | 0.493 | 1 | 1.00–1.01 | 0.618
Race/Ethnicity | | | | | |
 Non-Hispanic White | Reference | | | Reference | |
 Mexican American | 0.69 | 0.60–0.79 | <0.001 | 0.73 | 0.63–0.84 | <0.001
 Other Hispanic | 0.94 | 0.80–1.09 | 0.409 | 0.99 | 0.84–1.16 | 0.88
 Non-Hispanic Black | 0.8 | 0.70–0.90 | <0.001 | 0.82 | 0.72–0.95 | 0.006
 Non-Hispanic Asian | 0.74 | 0.63–0.86 | <0.001 | 0.78 | 0.66–0.93 | 0.004
 Other Race, including Multi-Racial | 1.08 | 0.86–1.37 | 0.521 | 1.12 | 0.89–1.44 | 0.348
Family income-to-poverty ratio | 0.86 | 0.84–0.89 | <0.001 | 0.86 | 0.83–0.89 | <0.001
Observations | 2913 | | | 2581 | |

Lastly, we assessed adjusted associations of sum-score and IRT score with impacts of depressive symptoms on daily life (Table 3). Notably, only the IRT score was significantly positively associated with this outcome (p = 0.045), while the sum-score was not associated (p = 0.20).

Table 3.

Adjusted associations of allostatic load IRT and sum-scores with impact of depressive symptoms on daily life in the NHANES 2015–2016 study. Impact of depressive symptoms on daily life was measured by the question, “How difficult have these problems [PHQ9] made it for you to do your work, take care of things at home, or get along with people?” The response was coded as binary (not difficult at all or somewhat difficult vs. very or extremely difficult). We used logistic regression adjusted for age, sex, race/ethnicity and family income-to-poverty ratio. The models have different sample sizes, as IRT does not require complete data on all allostatic load biomarkers in order to calculate scores, unlike the sum-score approach.

Predictors | IRT score model: Odds Ratio | 95% CI | p | Sum-score model: Odds Ratio | 95% CI | p
(Intercept) | 0.08 | 0.03–0.21 | <0.001 | 0.05 | 0.02–0.15 | <0.001
AL IRT Score | 1.33 | 1.01–1.75 | 0.045 | | |
AL Sum Score | | | | 1.07 | 0.96–1.19 | 0.204
Sex | | | | | |
 Male | Reference | | | Reference | |
 Female | 1.28 | 0.83–1.99 | 0.267 | 1.16 | 0.72–1.86 | 0.544
Age (years) | 1.02 | 1.00–1.04 | 0.041 | 1.02 | 1.00–1.04 | 0.047
Race/Ethnicity | | | | | |
 Non-Hispanic White | Reference | | | Reference | |
 Mexican American | 0.16 | 0.06–0.37 | <0.001 | 0.18 | 0.06–0.43 | <0.001
 Other Hispanic | 0.68 | 0.35–1.26 | 0.24 | 0.84 | 0.41–1.60 | 0.606
 Non-Hispanic Black | 0.46 | 0.25–0.81 | 0.008 | 0.56 | 0.29–1.04 | 0.075
 Non-Hispanic Asian | 0.51 | 0.20–1.11 | 0.114 | 0.56 | 0.21–1.28 | 0.202
 Other Race, including Multi-Racial | 0.68 | 0.23–1.66 | 0.438 | 0.72 | 0.21–1.90 | 0.546
Family income-to-poverty ratio | 0.58 | 0.48–0.68 | <0.001 | 0.6 | 0.50–0.72 | <0.001
Observations | 2066 | | | 1849 | |

4. Discussion

Our study is the first to demonstrate the use of item response theory to understand how different biomarkers provide information about latent allostatic load burden. We demonstrate an alternative method to calculate allostatic load, beyond the traditional sum-score approach, which can account for the discriminating abilities of individual biomarkers. Our findings suggest that BMI and CRP (or related immune measures) should be included in calculations of allostatic load. Additionally, CRP and BMI are often positively correlated [34–36], as was found in the current study. Further, our findings provide evidence that an alternative way of calculating allostatic load may provide more variability in the allostatic load scores than the traditional sum-score approach. This can help tease out additional effects not seen using sum-scores if allostatic load is used as a predictor, mediator or outcome.

In order to verify these formulations of allostatic load, we first compared associations with SES, and found that both higher sum-score and IRT score formulations of allostatic load were significantly related to lower SES. Further, in line with existing research [37,38], both formulations of allostatic load were significantly positively associated with depressive symptoms, but only the IRT score was also significantly associated with the impact of depressive symptoms on daily life. This suggests that the IRT score may be more finely tuned to tease out effects. It may also reflect the fact that, by using the IRT score, we are able to include more participants in the analysis. Unlike the sum-score, the IRT score calculation does not require that every participant have every biomarker measured. Thus, we are better able to make use of the incomplete data common to the field of psychoneuroendocrinology.

In this analysis, we build upon previous work using factor analytic approaches to explore the factor structure of allostatic load [17]. Previous factor analytic approaches have largely focused on determining the number of factors that comprise allostatic load and on confirming uni-dimensionality or multi-dimensionality, and to a lesser extent on scoring allostatic load. In this paper, we use an IRT model to score allostatic load and determine which biomarkers provide more information. We treat each biomarker as dichotomous (high vs. low risk) in order to make comparisons with the traditional count-based calculation of allostatic load, which involves a sum-score of high-risk biomarkers. Confirmatory factor analysis using dichotomous variables is asymptotically equivalent to the two-parameter logistic model, and both models can be considered as item factor analysis (Wirth and Edwards, 2007, Psychological Methods). However, the two methods differ in focus: IRT focuses on scoring, which is of importance here and in the broader discussion regarding allostatic load measurement.

It should be recognized that other investigations of allostatic load calculations have been conducted and not all allostatic load studies treat each biomarker independently. For example, across 23 biomarkers from the Midlife in the United States (MIDUS) study, a sum can be calculated such that each physiological system is represented with equal weight, though the number of biomarkers per system may not be equal [1]. Further, recent work on factor analysis has demonstrated that allostatic load biomarkers load onto a general allostatic load component, and within their respective physiological systems, indicating that there is common and unique variance among systems [17]. It is important to note, however, that the current analysis is limited by the number of biomarkers provided by NHANES sampling. As such, we encourage others to replicate our approach using other databases (e.g., MIDUS) with additional biomarkers.

Due to mathematical properties of the sum-score, the greater the number of biomarkers measured, the more stable the measure. However, measuring a large set of biomarkers is often infeasible, due to increased costs and participant burden. A key advantage of IRT methods is that the allostatic load scale needs to be calibrated only once. The resulting item parameter estimates can then be used in future analyses to calculate an underlying allostatic load measure for a participant, even if only a subset of the “items” (biomarkers) is measured, which can advance reproducible research on allostatic load. However, the accuracy of a score calculated with just a small number of biomarkers may be poor. Further, this approach does not account for lab or instrument differences in biomarker measures. In addition, important sex, age, and race/ethnic variations in biomarkers may need to be considered as we move towards applying population norms when calculating allostatic load.

In future work, IRT may also provide a way to harmonize allostatic load scores across cohorts. Cohorts may measure slightly different sets of allostatic load biomarkers, with a common set of overlapping ones. Using IRT, we can standardize allostatic load scales across studies, using the overlapping biomarkers as anchors, so that the allostatic load score can be compared, even if the studies did not measure exactly the same set of biomarkers. IRT has been used in data harmonization, such as harmonizing measures of cognitive aging across international surveys [39], and harmonizing measures of general health functioning [40].

In this paper, we follow theoretical and methodological work that represents allostatic load as a uni-dimensional construct. Commonly used methods of calculating allostatic load scores, such as the sum-score or average biomarker z-scores, implicitly assume that allostatic load is uni-dimensional [6,20,41]. In the interests of refining alternative measurement, our goal here was to evaluate this common sum-score approach. However, further work is needed to assess whether a multi-dimensional item response theory model better fits the data. There is potential to fit a three-dimensional IRT model (for immune, metabolic and cardiovascular physiological systems). Recent work [42] suggests that for dichotomous items, the estimated theta (allostatic load burden) scores are unbiased by violations of uni-dimensionality and that the item parameters are robust against such violations, suggesting that in practice our approach is valid even when uni-dimensionality assumptions are violated.

As stated above, we were limited by the NHANES biomarkers as we did not have the recommended three indicators for each factor (immune only has two biomarkers). Also, we did not represent neuroendocrine parameters like the HPA-axis that are central to allostatic load theory [43]. Interpretation of a multi-dimensional IRT (e.g., neuroendocrine, immune, metabolic, cardiovascular) is more complex analytically. However, multi-dimensional IRT may be helpful when we have a larger set of allostatic load biomarkers, with imbalance in the number of biomarkers per physiological system. Using multi-dimensional IRT would allow us to calculate a subscore for each physiological system, and an overall allostatic load score.

In future work, we aim to expand our IRT approach to categorical (ordinal) data, using models such as graded response models. This allows for quantiles of each allostatic load biomarker rather than a binary measure, which will address the fact that there may not be a single threshold, but instead a gradation of risk, and this could be more reflective of subclinical risk.

4.1. Limitations

Our study had limitations. NHANES is a cross-sectional study, meaning we were unable to assess temporality of the allostatic load time course. We did not use more than one cycle of NHANES because one allostatic load biomarker, high sensitivity CRP, was only measured in this cycle, and we felt it was important to include it because we only had two immune biomarkers. NHANES is also limited in the number of allostatic load biomarkers collected. While we included the ones that are most commonly used in NHANES analyses of allostatic load [3,44], it is possible that other cohort studies may have additional measures not studied here, and the inclusion of those biomarkers in an IRT model may cause the discrimination power of the biomarkers to change. However, the use of NHANES data is a strength because we were able to calculate survey-weighted cutoffs for high-risk for each biomarker. This allows us to use cutoffs that are generalizable to the United States population, unlike other cohort studies of allostatic load in which sample-based cutoffs are used and thus findings are dependent on the sample characteristics. In future work, using clinical cutoffs as previously proposed [4,45] can also be explored; however, this can be limited by differences in assay, machines and laboratory standards.

In future work, we will also evaluate whether it is conceptually valid to use the same scoring for all participants, or if different biomarker cutoffs should be used for different groups, to reflect physiological differences due to age, sex, race/ethnicity and comorbidity status. This may also be addressed by evaluating differential item functioning (DIF) of each biomarker [46,47]. If a biomarker exhibits DIF, this implies that given the same level of allostatic load burden, one group is more likely to score high-risk for that biomarker than another group, which suggests that different cutoffs may need to be used for different groups. This will help us define how to make an allostatic load index that incorporates a balance of informative biomarkers across all race/ethnicity groups. More work is needed to study alternative methods of calculating allostatic load scores that can account for the discriminating abilities of individual biomarkers, to explore whether different cutoffs are needed for different demographic groups, and to assess how these indicators function for different race/ethnicity groups.

We did not address medications in the calculation of allostatic load. While prescription medication information is available in NHANES via self-report [48], we do not know whether all participants elected to disclose their medication lists or whether there is nonresponse bias. Further, many medications, including anti-hypertensives and statins, can affect biomarker levels; thus, it is difficult to ascertain which medications we should adjust for. As the focus of this paper is to demonstrate feasibility of the IRT method, we did not adjust for medications here or use them as exclusion criteria. Future work is needed to assess this IRT method in cohorts with a broader range of biomarkers and physician-verified medication lists.

4.2. Conclusion

In this methodological paper, we have demonstrated that IRT can provide insight into the allostatic load construct beyond that provided by the standard sum-score metric. An IRT-based approach captures more variability in the allostatic load construct, as the IRT score accounts for the pattern of item responses and the discriminating power of each biomarker. Because it appears to yield more variability in the allostatic load index than the traditional sum-score approach, the IRT approach may be especially helpful in studies that use allostatic load as a predictor, mediator or outcome and seek to delineate effects not seen using sum-scores. Lastly, we have included a tutorial and R code in the Supplementary Materials so that researchers can calculate IRT scores in their own datasets.
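
The contrast between the two scores can be sketched in a few lines of Python. The example assumes a 2PL model with made-up item parameters and scores individuals by the expected a posteriori (EAP) estimate under a standard normal prior, evaluated on a grid. The names `eap_score` and `items` and the parameter values are hypothetical. The supplementary R code remains the authors' actual method, and its scoring rule may differ from EAP.

```python
import math


def p_high_risk(theta, a, b):
    """2PL probability of a high-risk flag at latent level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_score(responses, items, n_grid=161):
    """Grid-based EAP estimate of theta under a N(0, 1) prior.

    responses: list of 1 (high-risk), 0 (not high-risk) or None (missing).
    items: list of (a, b) tuples, one per biomarker.
    """
    grid = [-4.0 + 8.0 * i / (n_grid - 1) for i in range(n_grid)]
    numerator = 0.0
    denominator = 0.0
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for y, (a, b) in zip(responses, items):
            if y is None:
                continue  # missing biomarkers simply drop out of the likelihood
            p = p_high_risk(theta, a, b)
            weight *= p if y == 1 else 1.0 - p
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


# Hypothetical (discrimination, location) parameters for three biomarkers.
items = [(2.0, 0.5), (0.5, 0.5), (1.0, 1.0)]

pattern_a = [1, 0, 0]   # high-risk only on the most discriminating biomarker
pattern_b = [0, 1, 0]   # high-risk only on the least discriminating biomarker

score_a = eap_score(pattern_a, items)
score_b = eap_score(pattern_b, items)
score_missing = eap_score([1, None, 0], items)  # second biomarker not measured

print(sum(pattern_a), sum(pattern_b))  # identical sum-scores
print(score_a > score_b)               # but different IRT scores
```

Both patterns have a sum-score of 1, yet the pattern flagged on the more discriminating biomarker receives the higher IRT score. The last call shows that a score can still be computed when one biomarker is missing.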

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Drs. Liu and Spicer are supported by the National Institute of Environmental Health Sciences (P30ES023515). Dr. Spicer is also supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (R00 HD07966802). Dr. Juster is supported by Fonds de recherche Québec – Santé and holds a Canadian Institutes of Health Research Sex and Gender Science Chair.

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.cpnec.2020.100025.

Appendix A. Supplementary data

The following is the Supplementary data to this article:

Multimedia component 1
mmc1.pdf (759.5KB, pdf)

References

  • 1. Gruenewald T.L., Karlamangla A.S., Hu P., Stein-Merkin S., Crandall C., Koretz B., Seeman T.E. History of socioeconomic disadvantage and allostatic load in later life. Soc. Sci. Med. 2012;74:75–83. doi: 10.1016/j.socscimed.2011.09.037.
  • 2. Juster R.P., Smith N.G., Ouellet E., Sindi S., Lupien S.J. Sexual orientation and disclosure in relation to psychiatric symptoms, diurnal cortisol, and allostatic load. Psychosom. Med. 2013;75:103–116. doi: 10.1097/PSY.0b013e3182826881.
  • 3. Mays V.M., Juster R.P., Williamson T.J., Seeman T.E., Cochran S.D. Chronic physiologic effects of stress among lesbian, gay, and bisexual adults: results from the National Health and Nutrition Examination Survey. Psychosom. Med. 2018;80:551–563. doi: 10.1097/PSY.0000000000000600.
  • 4. Juster R.P., Sindi S., Marin M.F., Perna A., Hashemi A., Pruessner J.C., Lupien S.J. A clinical allostatic load index is associated with burnout symptoms and hypocortisolemic profiles in healthy workers. Psychoneuroendocrino. 2011;36:797–805. doi: 10.1016/j.psyneuen.2010.11.001.
  • 5. Schnorpfeil P., Noll A., Schulze R., Ehlert U., Frey K., Fischer J.E. Allostatic load and work conditions. Soc. Sci. Med. 2003;57:647–656. doi: 10.1016/s0277-9536(02)00407-0.
  • 6. Seeman T.E., McEwen B.S., Rowe J.W., Singer B.H. Allostatic load as a marker of cumulative biological risk: MacArthur studies of successful aging. P Natl Acad Sci USA. 2001;98:4770–4775. doi: 10.1073/pnas.081072698.
  • 7. Wallace M.E., Harville E.W. Allostatic load and birth outcomes among white and Black women in New Orleans. Matern. Child Health J. 2013;17:1025–1029. doi: 10.1007/s10995-012-1083-y.
  • 8. Juster R.P., Sasseville M., Giguere C.E., Lupien S.J., Consortium S. Elevated allostatic load in individuals presenting at psychiatric emergency services. J. Psychosom. Res. 2018;115:101–109. doi: 10.1016/j.jpsychores.2018.10.012.
  • 9. Seeman T.E., Singer B.H., Rowe J., Horwitz R.I., McEwen B. Price of adaptation - allostatic load and its health consequences. Arch. Intern. Med. 1997;157:2259–2268.
  • 10. Lindfors P., Lundberg O., Lundberg U. Allostatic load and clinical risk as related to sense of coherence in middle-aged women. Psychosom. Med. 2006;68:801–807. doi: 10.1097/01.psy.0000232267.56605.22.
  • 11. Maloney E.M., Boneva R., Nater U.M., Reeves W.C. Chronic fatigue syndrome and high allostatic load: results from a population-based case-control study in Georgia. Psychosom. Med. 2009;71:549–556. doi: 10.1097/PSY.0b013e3181a4fea8.
  • 12. Maselko J., Kubzansky L., Kawachi I., Seeman T., Berkman L. Religious service attendance and allostatic load among high-functioning elderly. Psychosom. Med. 2007;69:464–472. doi: 10.1097/PSY.0b013e31806c7c57.
  • 13. Seeman T.E., Singer B.H., Ryff C.D., Dienberg Love G., Levy-Storms L. Social relationships, gender, and allostatic load across two age cohorts. Psychosom. Med. 2002;64:395–406. doi: 10.1097/00006842-200205000-00004.
  • 14. Glei D.A., Goldman N., Chuang Y.L., Weinstein M. Do chronic stressors lead to physiological dysregulation? Testing the theory of allostatic load. Psychosom. Med. 2007;69:769–776. doi: 10.1097/PSY.0b013e318157cba6.
  • 15. Goodman E., McEwen B.S., Huang B., Dolan L.M., Adler N.E. Social inequalities in biomarkers of cardiovascular risk in adolescence. Psychosom. Med. 2005;67:9–15. doi: 10.1097/01.psy.0000149254.36133.1a.
  • 16. Karlamangla A.S., Singer B.H., Seeman T.E. Reduction in allostatic load in older adults is associated with lower all-cause mortality risk: MacArthur studies of successful aging. Psychosom. Med. 2006;68:500–507. doi: 10.1097/01.psy.0000221270.93985.82.
  • 17. Wiley J.F., Gruenewald T.L., Karlamangla A.S., Seeman T.E. Modeling multisystem physiological dysregulation. Psychosom. Med. 2016;78:290–301. doi: 10.1097/PSY.0000000000000288.
  • 18. Chen E., Miller G.E., Lachman M.E., Gruenewald T.L., Seeman T.E. Protective factors for adults from low-childhood socioeconomic circumstances: the benefits of shift-and-persist for allostatic load. Psychosom. Med. 2012;74:178–186. doi: 10.1097/PSY.0b013e31824206fd.
  • 19. Seplaki C.L., Goldman N., Glei D., Weinstein M. A comparative analysis of measurement approaches for physiological dysregulation in an older population. Exp. Gerontol. 2005;40:438–449. doi: 10.1016/j.exger.2005.03.002.
  • 20. Duong M.T., Bingham B.A., Aldana P.C., Chung S.T., Sumner A.E. Variation in the calculation of allostatic load score: 21 examples from NHANES. J Racial Ethn Health Disparities. 2017;4:455–461. doi: 10.1007/s40615-016-0246-8.
  • 21. Chen X., Redline S., Shields A.E., Williams D.R., Williams M.A. Associations of allostatic load with sleep apnea, insomnia, short sleep duration, and other sleep disturbances: findings from the National Health and Nutrition Examination Survey 2005 to 2008. Ann. Epidemiol. 2014;24:612–619. doi: 10.1016/j.annepidem.2014.05.014.
  • 22. Borrell L.N., Dallo F.J., Nguyen N. Racial/ethnic disparities in all-cause mortality in U.S. adults: the effect of allostatic load. Publ. Health Rep. 2010;125:810–816. doi: 10.1177/003335491012500608.
  • 23. Kobrosly R.W., Seplaki C.L., Jones C.M., van Wijngaarden E. Physiologic dysfunction scores and cognitive function test performance in U.S. adults. Psychosom. Med. 2012;74:81–88. doi: 10.1097/PSY.0b013e3182385b1e.
  • 24. Szanton S.L., Allen J.K., Seplaki C.L., Bandeen-Roche K., Fried L.P. Allostatic load and frailty in the women’s health and aging studies. Biol. Res. Nurs. 2009;10:248–256. doi: 10.1177/1099800408323452.
  • 25. Fries J.F., Bruce B., Cella D. The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes. Clin. Exp. Rheumatol. 2005;23:S53–S57.
  • 26. Nguyen T.H., Han H.R., Kim M.T., Chan K.S. An introduction to item response theory for patient-reported outcome measurement. Patient. 2014;7:23–35. doi: 10.1007/s40271-013-0041-0.
  • 27. Zipf G., Chiappa M., Porter K.S., Ostchega Y., Lewis B.G., Dostal J. National health and nutrition examination survey: plan and operations, 1999-2010. Vital Health Stat. 2013;56:1–37.
  • 28. Juster R.P., McEwen B., Lupien S.J. Allostatic load biomarkers of chronic stress and impact on health and cognition. Neurosci. Biobehav. Rev. 2010;35:2–16. doi: 10.1016/j.neubiorev.2009.10.002.
  • 29. Endres C.J. nhanesA: NHANES Data Retrieval. R package version 0.6.5. CRAN R-project; 2018.
  • 30. Lumley T. survey: Analysis of Complex Survey Samples. R package version 3.36. CRAN R-project; 2019.
  • 31. Rizopoulos D. ltm: Latent Trait Models under IRT. CRAN R-project; 2018.
  • 32. Kroenke K., Spitzer R.L. The PHQ-9: a new depression and diagnostic severity measure. Psychiatr. Ann. 2002;32:509–521.
  • 33. Kroenke K., Spitzer R.L., Williams J.B. The PHQ-9: validity of a brief depression severity measure. J. Gen. Intern. Med. 2001;16:606–613. doi: 10.1046/j.1525-1497.2001.016009606.x.
  • 34. Khaodhiar L., Ling P.R., Blackburn G.L., Bistrian B.R. Serum levels of interleukin-6 and C-reactive protein correlate with body mass index across the broad range of obesity. Jpen-Parenter Enter. 2004;28:410–415. doi: 10.1177/0148607104028006410.
  • 35. Pannacciulli N., Cantatore F.P., Minenna A., Bellacicco M., Giorgino R., De Pergola G. C-reactive protein is independently associated with total body fat, central fat, and insulin resistance in adult women. Int. J. Obes. 2001;25:1416–1420. doi: 10.1038/sj.ijo.0801719.
  • 36. Rawson E.S., Freedson P.S., Osganian S.K., Matthews C.E., Reed G., Ockene I.S. Body mass index, but not physical activity, is associated with C-reactive protein. Med. Sci. Sports Exerc. 2003;35:1160–1166. doi: 10.1249/01.MSS.0000074565.79230.AB.
  • 37. Kobrosly R.W., van Wijngaarden E., Seplaki C.L., Cory-Slechta D.A., Moynihan J. Depressive symptoms are associated with allostatic load among community-dwelling older adults. Physiol. Behav. 2014;123:223–230. doi: 10.1016/j.physbeh.2013.10.014.
  • 38. McEwen B. Mood disorders and allostatic load. Biol. Psychiatr. 2003;54:200–207. doi: 10.1016/s0006-3223(03)00177-x.
  • 39. Chan K.S., Gross A.L., Pezzin L.E., Brandt J., Kasper J.D. Harmonizing measures of cognitive performance across international surveys of aging using item response theory. J. Aging Health. 2015;27:1392–1414. doi: 10.1177/0898264315583054.
  • 40. Gibbons R.D., Perraillon M.C., Kim J.B. Item response theory approaches to harmonization and research synthesis. Health Serv. Outcome Res. Methodol. 2014;14:213–231. doi: 10.1007/s10742-014-0125-x.
  • 41. van der Linden W.J., Hambleton R.K. Springer; 1996. Handbook of Modern Item Response Theory.
  • 42. Crisan D.R., Tendeiro J.N., Meijer R.R. Investigating the practical consequences of model misfit in unidimensional IRT models. Appl. Psychol. Meas. 2017;41:439–455. doi: 10.1177/0146621617695522.
  • 43. McEwen B. Sex, stress and the hippocampus: allostasis, allostatic load and the aging process. Neurobiol. Aging. 2002;23:921–939. doi: 10.1016/s0197-4580(02)00027-1.
  • 44. Crimmins E.M., Johnston M., Hayward M., Seeman T. Age differences in allostatic load: an index of physiological dysregulation. Exp. Gerontol. 2003;38:731–734. doi: 10.1016/s0531-5565(03)00099-8.
  • 45. Seplaki C.L., Goldman N., Glei D., Weinstein M. A comparative analysis of measurement approaches for physiological dysregulation in an older population. Exp. Gerontol. 2005;40:438–449. doi: 10.1016/j.exger.2005.03.002.
  • 46. Edelen M.O., Thissen D., Teresi J.A., Kleinman M., Ocepek-Welikson K. Identification of differential item functioning using item response theory and the likelihood-based model comparison approach: application to the Mini-Mental State Examination. Med. Care. 2006;44:S134–S142. doi: 10.1097/01.mlr.0000245251.83359.8c.
  • 47. Zieky M. In: Differential Item Functioning. Holland P.W., Wainer H., editors. Erlbaum; Hillsdale, NJ: 1993. History and development of DIF.
  • 48. Centers for Disease Control and Prevention. 2020. National Health and Nutrition Examination Survey, Dietary Supplement and Prescription Medication Section.
