Abstract
The concentration index, including its normalized variants, is prominently used to assess socioeconomic inequalities in health and health care. Wagstaff's and Erreygers' normalizations or corrections of the standard concentration index are the most commonly suggested approaches when analyzing binary health variables, which are encountered in much health economics and health services research. In empirical applications of the corrected or normalized concentration indices, researchers interpret them similarly to the standard concentration index, which may be problematic as this ignores how these indices behave. This paper shows that the empirical bounds of the standard concentration index, including the corrected indices, depend not only on the sample size directly but also on the sampling weight. Notably, the paper highlights critical challenges for assessing and interpreting the popular Wagstaff's and Erreygers' corrected concentration indices with binary health variables. Specifically, it shows that it might be misleading, for example, to assess socioeconomic health inequalities using the magnitude of the “symmetric” Erreygers' corrected concentration index in the face of progressive improvements in the binary health variable. Also, Wagstaff's normalized concentration index may give a spurious “concentration” of the binary health variable among the rich or the poor in certain rare instances.
Keywords: binary health variable, concentration index, Erreygers' normalization, socioeconomic health inequality, Wagstaff's normalization
1. INTRODUCTION
The concentration index is commonly used to assess socioeconomic inequality in health and health care (Ataguba et al., 2011; Kakwani et al., 1997; van Doorslaer & Koolman, 2004; Wagstaff, 2005; Wagstaff et al., 1991). The theoretical values of the standard concentration index range between −1 (a case where the health variable (e.g., obesity) is concentrated on the most disadvantaged individual) and +1 (a case where it is concentrated on the most advantaged individual). The standard concentration index is positive when ill-health (or health) is more prevalent among wealthier groups and negative otherwise. Its magnitude conveys the relative degree of concentration among poorer or richer groups. For binary variables that are common in health economics research, Wagstaff's (2005) seminal paper showed that the range of values for the standard concentration index depends on the mean (i.e., the proportion) of the variable ($\mu$) and, with a large sample, will be between $\mu - 1$ and $1 - \mu$ instead of −1 and +1 for the lower and upper bounds, respectively. In general, Erreygers (2009, p.506) shows that the bounds are not unique to binary health variables as researchers can construct different upper and lower bounds for any “health variable with a finite upper value or a positive lower value.” A debate on how best to “adjust” the standard concentration index continues, mainly when, for example, a binary health variable is used (Erreygers, 2009; Wagstaff, 2005) or when finitely bounded health variables are used (Erreygers, 2009; Erreygers & Van Ourti, 2011).
This paper focuses on binary health variables often encountered in health economics research. It lays out crucial issues that researchers using the standard concentration index and Wagstaff's and Erreygers' normalized concentration indices may not fully appreciate, especially when interpreting or comparing results. It begins by showing that the standard concentration index's empirical bounds are determined by the sample size (something that Erreygers (2009) highlights) and by the sampling structure, especially the weight variable. It also highlights some implications of Wagstaff's and Erreygers' normalization for interpretation and policy.
2. EXAMINING THE LOWER AND UPPER BOUNDS FOR THE WAGSTAFF'S AND ERREYGERS' CORRECTED CONCENTRATION INDICES
Although the difference is trivial in many cases, especially when the sample is large, the empirical bounds of the standard concentration index are not the same as its theoretical bounds. When the sampling structure, especially the sample weights, is ignored, the standard concentration index's empirical bounds are known to be $-(1 - 1/n)$ and $1 - 1/n$, where $n$ is the sample size (Erreygers, 2009). Also, ties in the measure of socioeconomic status could affect the estimate of the concentration index (Chen & Roy, 2009). This note shows that the empirical bounds of the concentration index are a function of both the fractional rank (often used in empirical estimation; Kakwani et al., 1997) and the sample size.
Let us write the standard concentration index ($C$) as (Kakwani et al., 1997):
$$C = \frac{2}{n\mu}\sum_{i=1}^{n} h_i R_i - 1 \tag{1}$$
where $n$ is the sample size, $h_i$ is the value of individual $i$'s health variable (e.g., an indicator of ill-health) with $\mu$ as its mean and $R_i$ is individual $i$'s fractional rank in the distribution of standard of living or socioeconomic status. Assume that $h$ is a non-negative variable, with individuals arranged from poorest ($i = 1$) to richest ($i = n$) by the values of the socioeconomic variable, $y_1, y_2, \dots, y_l, \dots, y_n$. Theoretically, the standard concentration index takes on its highest value (i.e., +1) when the wealthiest individual (with $i = n$) reports a positive value of $h$ (i.e., $h_n > 0$) and the remaining individuals report zero values for $h$ ($h_i = 0$, where $i = 1, \dots, n - 1$). Similarly, the standard concentration index takes on its lowest value (i.e., −1) when the poorest individual (with $i = 1$) has a positive value for $h$ (i.e., $h_1 > 0$) and $h_i = 0$ for $i = 2, \dots, n$.
Assuming that the poorest $l$ individuals have positive values for $h$ but the wealthiest $n - l$ individuals have zero values for $h$, then the mean of the health variable, in this case, can be written as $\mu = \frac{1}{n}\sum_{i=1}^{l} h_i$.
The fractional rank ($R_i$) in Equation (1) can be written as (O’Donnell et al., 2008):
$$R_i = \sum_{j=0}^{i-1} w_j + \frac{w_i}{2} \tag{2}$$
where $w_j$ is the relative weight of individual $j$ and $w_0 = 0$.
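As an illustration of Equations (1) and (2), the sketch below computes the fractional rank and the standard concentration index for individuals already sorted from poorest to richest. The function names and the toy data are hypothetical, and the weighted form $C = \frac{2}{\mu}\sum_i w_i h_i R_i - 1$ (with relative weights summing to one) is an assumption made here so that sampling weights can be handled; with equal weights it reduces to Equation (1).

```python
import numpy as np

def fractional_rank(weights):
    """Fractional rank R_i = sum_{j<i} w_j + w_i / 2 for SES-sorted individuals."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                 # relative weights that sum to one
    return np.cumsum(w) - w / 2.0   # cumulative weight up to i, minus half of w_i

def concentration_index(h, weights=None):
    """Standard concentration index for h sorted from poorest to richest."""
    h = np.asarray(h, dtype=float)
    if weights is None:
        weights = np.ones_like(h)   # equal weighting, w_i = 1/n
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    rank = fractional_rank(w)
    mu = np.sum(w * h)              # (weighted) mean of the health variable
    return 2.0 * np.sum(w * h * rank) / mu - 1.0

h = [0, 1, 0, 1, 1]                 # hypothetical binary outcome, poorest first
print(fractional_rank(np.ones(5)))  # approximately 0.1, 0.3, 0.5, 0.7, 0.9
print(concentration_index(h))       # positive: outcome more common among the rich
```

With five equally weighted individuals, concentrating the outcome on the richest person alone gives $1 - 1/5 = 0.8$ rather than the theoretical +1.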
Using the fractional rank, Equation (1) can be expanded as:
| (3) |
which simplifies to
| (4) |
where with .
So, for any health variable, including binary variables, using Equation (4), the empirical bounds of the standard concentration index become:
$$w_1 - 1 \le C \le 1 - w_n \tag{5}$$
where $w_1$ and $w_n$ represent the relative weights for individuals with $i = 1$ and $i = n$, respectively. Note that $w_1 \to 0$ and $w_n \to 0$ as $n \to \infty$. Therefore, with large samples, the lower and upper bounds of the standard concentration index will approach −1 and +1, respectively.
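The role of the weights of the poorest and richest individuals can be checked numerically. The sketch below is a minimal illustration, assuming the weighted form $C = \frac{2}{\mu}\sum_i w_i h_i R_i - 1$ with relative weights summing to one; under that assumption the extreme cases work out to $w_1 - 1$ and $1 - w_n$. The weights and the function name are hypothetical.

```python
import numpy as np

def weighted_ci(h, weights):
    """Concentration index with relative weights; individuals sorted poorest first."""
    h = np.asarray(h, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # relative weights
    rank = np.cumsum(w) - w / 2.0        # fractional rank
    mu = np.sum(w * h)
    return 2.0 * np.sum(w * h * rank) / mu - 1.0

raw_weights = np.array([3.0, 1.0, 1.0, 1.0, 2.0])   # hypothetical sampling weights
rel = raw_weights / raw_weights.sum()               # w_1 = 0.375, w_n = 0.25

lower = weighted_ci([1, 0, 0, 0, 0], raw_weights)   # only the poorest has h > 0
upper = weighted_ci([0, 0, 0, 0, 1], raw_weights)   # only the richest has h > 0

print(lower, rel[0] - 1.0)    # both equal w_1 - 1
print(upper, 1.0 - rel[-1])   # both equal 1 - w_n
```

In this example the bounds are asymmetric (−0.625 and 0.75) because the poorest and richest individuals carry different weights, which equal weighting would hide.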
Using the standard concentration index in Equation (4), Wagstaff's (2005) ($W$) and Erreygers' (2009) ($E$) corrected concentration indices for a binary health variable can be written as follows:
$$W = \frac{C}{1 - \mu} \tag{6}$$
$$E = 4\mu C \tag{7}$$
Applying Wagstaff's and Erreygers' corrections to the lower and upper bounds of the standard concentration index shown in Equation (5) yields the lower and upper bounds of $W$ and $E$, illustrated in Figure 1 as the sample size increases, assuming equal weighting.
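FIGURE 1.
![Figure 1](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9adc/9324972/a775e87c6be2/HEC-31-1506-g002.jpg)
The upper and lower bounds of $W$ and $E$ illustrated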
As expected, the upper and lower bounds of $E$ remain sensitive to the sample size, especially for smaller samples. However, the lower and upper bounds of $W$ are insensitive to sample size. This is expected given the normalization scheme implemented in the Wagstaff correction.
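One way to see this sensitivity is to search, for a given sample size, over every pattern in which only the richest individuals have the outcome and record the largest value each index attains. The sketch below assumes equal weighting, a binary variable, and the forms $W = C/(1-\mu)$ and $E = 4\mu C$; it is an illustration under those assumptions, not a reproduction of Figure 1, and the function names are hypothetical.

```python
import numpy as np

def concentration_index(h):
    """Standard concentration index with equal weights (rank = (i - 0.5) / n)."""
    h = np.asarray(h, dtype=float)
    n = h.size
    rank = (np.arange(1, n + 1) - 0.5) / n
    return 2.0 * np.sum(h * rank) / (n * h.mean()) - 1.0

def upper_bounds(n):
    """Largest W and E over all 'richest k have h = 1' patterns, 1 <= k < n."""
    w_vals, e_vals = [], []
    for k in range(1, n):
        h = np.zeros(n)
        h[n - k:] = 1.0                  # only the richest k individuals have h = 1
        c, mu = concentration_index(h), h.mean()
        w_vals.append(c / (1.0 - mu))    # Wagstaff's normalization
        e_vals.append(4.0 * mu * c)      # Erreygers' normalization
    return max(w_vals), max(e_vals)

for n in (3, 5, 11, 101, 1001):
    w_max, e_max = upper_bounds(n)
    print(f"n={n:5d}  max W={w_max:.4f}  max E={e_max:.4f}")
```

In this sketch the largest $W$ equals 1 for every $n$, while the largest $E$ falls short of 1 when $n$ is small and odd (for example, 8/9 when $n = 3$) and approaches 1 as $n$ grows. By symmetry, the same holds for the lower bounds with the signs reversed.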
3. SHOULD WAGSTAFF'S AND ERREYGERS' NORMALIZED CONCENTRATION INDICES BE INTERPRETED AS THE STANDARD CONCENTRATION INDEX?
In many empirical applications of the concentration index, researchers choose between Erreygers' or Wagstaff's corrected concentration index. Recently, the user-written Stata command conindex (O’Donnell et al., 2016) has facilitated the computation and comparison of both indices. Apart from these popular normalized indices ($W$, $E$), other indices proposed but not discussed in this paper include the symmetric “concentration” index (Erreygers et al., 2012) and the generalized concentration index (Clarke et al., 2002; Wagstaff et al., 1991).
Let us examine these normalized indices (i.e., $W$, $E$) in turn for possible challenges for interpretation and policy.
3.1. The Wagstaff's corrected concentration index
In arriving at the normalization, Wagstaff (2005) shows how the mean of a binary health variable may affect the standard concentration index.
Consider a binary health variable $h$, where $h_1, h_2, \dots, h_n$ are the corresponding values of the health variable for individuals sorted by a measure of socioeconomic status (SES), $y_1, y_2, \dots, y_n$. The standard concentration index simplifies to $\mu - 1$ when $h_i = 1$ for $i \le l$ and $h_i = 0$ for $i > l$. Similarly, the standard concentration index simplifies to $1 - \mu$ when $h_i = 0$ for $i \le l$ and $h_i = 1$ for $i > l$ (summarized based on Wagstaff, 2005).
Therefore, for a binary health variable, $h$, when $h_i = 1$ for $i \le l$, and $h_i = 0$ for $i > l$, Wagstaff's corrected concentration index ($W$) will always simplify as:
$$W = \frac{\mu - 1}{1 - \mu} = -1 \tag{8}$$
Similarly, when $h_i = 0$ for $i \le l$ and $h_i = 1$ for $i > l$, $W = +1$. In response to the challenge raised by Erreygers (2009, p.508), Wagstaff (2009) later noted that it is “reasonable” to expect this result in these cases.
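The simplification can be checked by hand. The following is a sketch of the derivation under equal weighting, an assumption made here so that the fractional rank is $R_i = (2i - 1)/(2n)$, with the poorest $l$ individuals having $h_i = 1$ and everyone else $h_i = 0$ (the symbol $l$ is used for illustration):

```latex
\begin{align*}
\mu &= \frac{l}{n}, \qquad
\sum_{i=1}^{l} R_i = \sum_{i=1}^{l} \frac{2i - 1}{2n} = \frac{l^2}{2n}, \\
C &= \frac{2}{n\mu} \sum_{i=1}^{l} R_i - 1
   = \frac{2}{n\mu} \cdot \frac{l^2}{2n} - 1
   = \frac{l^2}{n^2 \mu} - 1
   = \mu - 1, \\
W &= \frac{C}{1 - \mu} = \frac{\mu - 1}{1 - \mu} = -1
   \quad \text{for any } 0 < \mu < 1 .
\end{align*}
```

The result does not depend on $l$, which is why $W$ stays at −1 however many of the poorest individuals have the outcome, provided at least one person does not.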
3.2. The Erreygers' corrected concentration index
Now, let us turn to Erreygers' normalized index. In many empirical applications, researchers interpret $E$ based on the traditional underpinning of the standard concentration index. For example, an index value estimated at 0.85 may be interpreted to mean a higher concentration of a health variable among wealthier groups compared to an index value estimated at 0.12. Unfortunately, $E$, by design, was not meant to be interpreted in that manner.
Now, suppose that $n$ is sufficiently large with individuals sorted by a measure of SES such that $y_1 \le y_2 \le \dots \le y_n$, and a binary health variable, $h$, exists such that $h_i = 1$ for $i \le l$ and $h_i = 0$ for $i > l$. As $l$ increases from 1 to $n$, the empirical value of $E$ decreases from about zero (i.e., when $\mu \approx 0$), attaining its lowest value ($-1$) at the 50th SES percentile, then increases beyond this point, reaching zero at about the 100th SES percentile (i.e., when $\mu = 1$; see Figure 2). Similarly, although not shown, as $l$ decreases from $n$ to 1, when $h_i = 1$ for $i \ge l$ and $h_i = 0$ for $i < l$, the values of $E$ will increase steadily from about zero when $\mu \approx 0$, reaching its highest value ($+1$) at the 50th percentile and declining afterward to zero when $\mu = 1$. This behavior is by design to satisfy the transfer property. On this, Erreygers (2009, p.511) writes that “the most extreme pro-poor inequality is obtained when both the rich and the poor constitute exactly half of the population and every member of the poor half has the maximum health level $b_h$ and every member of the rich half the minimum health level $a_h$.” In our case with a binary health variable, the lowest ($a_h$) and highest ($b_h$) values of the health variable are 0 and 1, respectively.
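The path of the Erreygers index as the outcome spreads upward from the poorest individual can be traced numerically. The sketch below assumes equal weighting, a binary variable, and $E = 4\mu C$; the sample size of 1,000 and the variable names are hypothetical choices for illustration.

```python
import numpy as np

n = 1000
rank = (np.arange(1, n + 1) - 0.5) / n   # fractional rank with equal weights
k = np.arange(1, n + 1)                  # number of poorest individuals with h = 1
mu = k / n                               # mean of h when the poorest k have h = 1

# For each k, sum of h_i * R_i is the cumulative sum of the first k ranks.
c = 2.0 * np.cumsum(rank) / (n * mu) - 1.0   # standard concentration index
e = 4.0 * mu * c                             # Erreygers' corrected index

k_min = int(k[np.argmin(e)])
print("E when only the poorest person has h = 1:", e[0])
print("k at which E is lowest:", k_min, "with E =", e.min())
print("E when everyone has h = 1:", e[-1])
```

In this sketch the index starts just below zero, reaches its minimum of −1 when exactly half of the sample has the outcome, and returns to zero when everyone does, while the standard index rises steadily from about −1 to 0 over the same path.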
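FIGURE 2.
![Figure 2](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9adc/9324972/91ecd4a04e42/HEC-31-1506-g001.jpg)
Illustrating the behavior of $C$ and $E$ for an arbitrary binary health variable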
4. IMPLICATIONS
What do these illustrations for Wagstaff's and Erreygers' normalizations mean for interpreting socioeconomic inequalities in health and for policy? Traditionally, the magnitude of the standard concentration index conveys the relative concentration of a variable in relation to the SES distribution. So, for the standard concentration index, when $C < 0$ for a specific health variable, for example, this variable is relatively more concentrated among poorer than wealthier groups. If the value becomes more negative, the health variable's concentration among poorer groups has increased, ceteris paribus. Similarly, a positive concentration index that increases implies that the variable of interest is increasingly concentrated among wealthier groups. As shown in Figure 2, unlike the case of the standard $C$, the “symmetric” nature of $E$ means that we cannot interpret its values in the same manner as those of the standard $C$. For a binary health variable, say full immunization for children in a country, and as shown in Figure 2, the value of $E$ is the same if $h_i = 1$ for the bottom $x\%$ or the bottom $(100 - x)\%$ of the population. For instance, $E = -0.64$ when $h_i = 1$ for the bottom 20% or the bottom 80% of the population. Applying the interpretation of the standard concentration index, this result would indicate that the level of socioeconomic inequality in full immunization coverage is the same in both cases, irrespective of whether children in the bottom $x\%$ or the bottom $(100 - x)\%$ of the population are fully immunized. For a policy that aims to reduce socioeconomic inequality in full immunization coverage, these two situations are not necessarily equivalent. Also, the “worst” socioeconomic inequality estimate ($E = -1$) corresponds to the case where only children in the bottom 50% of the population are fully immunized.
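The 20% versus 80% comparison can be reproduced with a small hypothetical population. The sketch below assumes 100 equally weighted children sorted from poorest to richest, a binary immunization indicator, and $E = 4\mu C$; the function name is illustrative.

```python
import numpy as np

def c_and_e(h):
    """Return (C, E) for a binary h sorted poorest first, with equal weights."""
    h = np.asarray(h, dtype=float)
    n = h.size
    rank = (np.arange(1, n + 1) - 0.5) / n
    mu = h.mean()
    c = 2.0 * np.sum(h * rank) / (n * mu) - 1.0
    return c, 4.0 * mu * c

n = 100
bottom20 = np.zeros(n)
bottom20[:20] = 1        # only the poorest 20% are fully immunized
bottom80 = np.zeros(n)
bottom80[:80] = 1        # only the poorest 80% are fully immunized

c20, e20 = c_and_e(bottom20)
c80, e80 = c_and_e(bottom80)
print(f"bottom 20%: C = {c20:.2f}, E = {e20:.2f}")
print(f"bottom 80%: C = {c80:.2f}, E = {e80:.2f}")
```

Under these assumptions the standard index separates the two cases ($C = -0.8$ versus $C = -0.2$), whereas $E = -0.64$ in both.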
As also shown in Figure 2, a progressive policy to gradually increase the number of fully immunized children, beginning with the most impoverished child, “worsens” socioeconomic inequalities in full immunization coverage based on $E$ until the bottom 50% of children are fully immunized, but continues to “improve” socioeconomic inequalities based on the standard concentration index. Clearly, one should expect that a progressive policy targeting more impoverished children through full immunization, in this case, and as shown in Figure 2, should “improve” rather than “worsen” socioeconomic inequalities.
On the other hand, $W$ remains the same (i.e., $W = -1$) whether only the bottom $x\%$ or the bottom $z\%$ of children are fully immunized, where $0 < x < 100$ and $0 < z < 100$. The case is similar (i.e., $W = +1$) if only children in the top $x\%$ of the population are fully immunized. This means that using $W$, socioeconomic inequality in full immunization coverage for children in this hypothetical country remains high (i.e., $W = -1$ or $W = +1$) if all children in the bottom $x\%$ or the top $x\%$ of the population are fully immunized (see Figure 1). Also, if only the bottom 30% of children are not fully immunized and a progressive policy raises coverage so that only the bottom 10% remain not fully immunized, socioeconomic inequalities will not “improve” according to $W$. While these extreme cases for $W$ may rarely occur in many applications, they show that a progressive policy to increase immunization coverage among the poor would have no impact on inequality using $W$, even though the same change appears as inequality “improving” using the standard concentration index (see Figure 2), for instance.
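The 30% to 10% scenario can be worked through numerically. The sketch below assumes 100 equally weighted children sorted from poorest to richest, a binary immunization indicator, and the forms $W = C/(1-\mu)$ and $E = 4\mu C$; the function name and the dictionary keys are hypothetical.

```python
import numpy as np

def indices(h):
    """Standard, Wagstaff and Erreygers indices for a binary h, equal weights."""
    h = np.asarray(h, dtype=float)
    n = h.size
    rank = (np.arange(1, n + 1) - 0.5) / n
    mu = h.mean()
    c = 2.0 * np.sum(h * rank) / (n * mu) - 1.0
    return {"mu": mu, "C": c, "W": c / (1.0 - mu), "E": 4.0 * mu * c}

n = 100
before = np.ones(n)
before[:30] = 0          # the poorest 30% are not fully immunized
after = np.ones(n)
after[:10] = 0           # after the policy, only the poorest 10% are not

res_before, res_after = indices(before), indices(after)
for label, res in (("before", res_before), ("after", res_after)):
    print(label, {key: round(float(val), 2) for key, val in res.items()})
```

Under these assumptions the standard index falls from 0.3 to 0.1 and $E$ from 0.84 to 0.36, while $W$ stays at +1 in both situations.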
Then, what is the appropriate index or normalization for empirical health economics analysis when using a binary health variable? Unfortunately, the answer to this question is not straightforward. However, it is essential to note that the need for normalizing the standard concentration index when using a binary health variable arose because the bounds of the standard concentration index depend on the mean (or proportion in this case) of the variable (Wagstaff, 2005). Erreygers (2009) shows that this “issue” with the bounds is not unique to binary health variables. As noted previously, it could be the case for many variables encountered in health economics research if a positive lower bound or a finite upper bound exists. So, if normalization is essential, should the standard concentration index not be normalized for most applications and not just for binary health variables? In fact, this poses a significant challenge, including when decomposing the normalized concentration index for a binary variable, for example, using the method introduced by Wagstaff et al. (2003). Thus, as shown in this paper, and for binary health variables, because $W$ and $E$ cannot always be interpreted in the same way as the standard concentration index, the intended use of the index should guide what index or normalization scheme researchers implement.
5. CONCLUSION
The concentration index remains popular for assessing socioeconomic inequalities in health (Wagstaff et al., 1991). Many applications of this index in the health economics literature use a binary health variable. With binary health variables, the normalization of the standard concentration index was initially proposed to, among other things, ensure that the value of the index lies in the −1 to +1 range. This paper begins by showing that the empirical bounds of the concentration index, including Erreygers' corrected index, depend on the sample size and the relative weight variable (which is also related to the population). Importantly, it demonstrates that Wagstaff's and Erreygers' normalizations for a binary health variable may produce results that are “counter-intuitive” for policy. $W$, for instance, is perpetually fixed at −1 or +1 if only the bottom $x\%$ or the top $x\%$ of the population, respectively, records a value of one for the variable of interest. $E$, on the other hand, provides the same value if only individuals in the bottom $x\%$ or the bottom $(100 - x)\%$ of the population have the value of one for the health variable. By this finding, progressive achievements, when moving from the poor to the rich, will initially “worsen” socioeconomic inequality using $E$, with improvements only occurring when the value of the health variable is one for more than 50% of the population. Thus, the estimated values for $E$ cannot be interpreted in the same manner as the values of the standard concentration index. The “symmetric” nature of $E$ will make it challenging to compare socioeconomic inequalities between any two distributions, irrespective of the mean of the binary health variable. Unfortunately, researchers implementing $W$ and $E$ generally ignore these nuances and interpret these indices in much the same way as the standard concentration index.
While there is still room to continuously improve indices for assessing socioeconomic inequalities in health, the purpose for empirical assessment should guide the choice of an appropriate index.
CONFLICT OF INTEREST
The author declares that there are no conflicts of interest.
ETHICAL STATEMENT
The study did not use any existing datasets; there are no ethical issues.
ACKNOWLEDGEMENT
The research emanated from the Global Network for Health Equity (GNHE), funded by the IDRC (Grant number 106439).
Ataguba, J. E. (2022). A short note revisiting the concentration index: Does the normalization of the concentration index matter? Health Economics, 31(7), 1506–1512. 10.1002/hec.4515
DATA AVAILABILITY STATEMENT
Data sharing does not apply to this article as only hypothetical data were generated and used for the study. The codes to generate the hypothetical dataset are available from the author upon request.
REFERENCES
- Ataguba, J. E., Akazili, J., & McIntyre, D. (2011). Socioeconomic-related health inequality in South Africa: Evidence from general household surveys. International Journal for Equity in Health, 10, 48.
- Chen, Z., & Roy, K. (2009). Calculating concentration index with repetitive values of indicators of economic welfare. Journal of Health Economics, 28, 169–175.
- Clarke, P. M., Gerdtham, U.-G., Johannesson, M., Bingefors, K., & Smith, L. (2002). On the measurement of relative and absolute income-related health inequality. Social Science & Medicine, 55, 1923–1928.
- Erreygers, G. (2009). Correcting the concentration index. Journal of Health Economics, 28, 504–515.
- Erreygers, G., Clarke, P., & Van Ourti, T. (2012). “Mirror, mirror, on the wall, who in this land is fairest of all?”—distributional sensitivity in the measurement of socioeconomic inequality of health. Journal of Health Economics, 31, 257–270.
- Erreygers, G., & Van Ourti, T. (2011). Measuring socioeconomic inequality in health, health care and health financing by means of rank-dependent indices: A recipe for good practice. Journal of Health Economics, 30, 685–694.
- Kakwani, N., Wagstaff, A., & van Doorslaer, E. (1997). Socioeconomic inequalities in health: Measurement, computation, and statistical inference. Journal of Econometrics, 77, 87–103.
- O’Donnell, O., O’Neill, S., Van Ourti, T., & Walsh, B. (2016). Conindex: Estimation of concentration indices. The Stata Journal, 16, 112–138.
- O’Donnell, O., Van Doorslaer, E., Wagstaff, A., & Lindelow, M. (2008). Analyzing health equity using household survey data: A guide to techniques and their implementation. World Bank.
- van Doorslaer, E., & Koolman, X. (2004). Explaining the differences in income-related health inequalities across European countries. Health Economics, 13, 609–628.
- Wagstaff, A. (2005). The bounds of the concentration index when the variable of interest is binary, with an application to immunization inequality. Health Economics, 14, 429–432.
- Wagstaff, A. (2009). Correcting the concentration index: A comment. Journal of Health Economics, 28, 516–520.
- Wagstaff, A., Paci, P., & van Doorslaer, E. (1991). On the measurement of inequalities in health. Social Science & Medicine, 33, 545–557.
- Wagstaff, A., van Doorslaer, E., & Watanabe, N. (2003). On decomposing the causes of health sector inequalities with an application to malnutrition inequalities in Vietnam. Journal of Econometrics, 112, 207–223.