Bayesian semiparametric analysis on the relationship between BMI and income for rural and urban workers in China

Lijuan Feng; Murat Munkin

2021 Jun 5;49(12):3215–3235. doi: 10.1080/02664763.2021.1935803


PMCID: PMC9415482 PMID: 36035612

ABSTRACT

This study examines the nonlinear relationship between BMI and earnings for workers in China using Bayesian semiparametric methods. Markov chain Monte Carlo (MCMC) methods are used to obtain the posterior distribution. We stratify the whole sample into four subsamples based on gender and type of residence area. Using longitudinal data from the China Health and Nutrition Survey (CHNS) from 1989 to 2011, we find a nonlinear relationship for each group of workers, especially for rural females. For females in both rural and urban areas, being overweight or obese is associated with lower earnings. However, for males in both areas, earnings are not penalized for extra weight.

KEYWORDS: BMI, income, wage, Bayesian semiparametric method, nonlinearity, MCMC estimation

1. Introduction

Body weight, as a factor that affects people's earnings, has drawn researchers' attention over the past two decades. The literature has shown that body mass index (BMI), defined as weight in kilograms divided by the square of height in meters, is associated with earnings. The effects generally differ between developed countries [2,3,7,21,27] and developing countries [13,28,32]. Results for developed countries usually show a wage penalty for being obese, mainly for females, while results for developing countries tend to show a positive association between body weight and earnings. Heterogeneous effects by gender and occupation have been reported in studies of Chinese workers [15,30].

At least three possibilities exist that explain the relationship between BMI and earnings. First, body weight affects earnings. For example, being obese may lower earnings by lowering productivity or due to workplace discrimination coming from either employers [10,20,22] or customers [3]. Second, earnings affect body weight. People with lower income in developed countries tend to consume cheaper food rich in fat which could lead to obesity. Third, some unobserved factors can affect both body weight and earnings [4]. For instance, a lack of motivation for long-term investments in health and human capital can result in both obesity and low wages.

In this paper, we apply Bayesian semiparametric methods to study the association between BMI and earnings for workers in China using data from the China Health and Nutrition Survey (CHNS). Bayesian methods have previously been used to estimate the causal effect of BMI on wages nonparametrically using British data [16]. In studying the causal effect of BMI on income, an endogeneity problem arises: the second and third mechanisms, referred to as reverse causality and omitted variable bias respectively, result in biased estimators. Kline and Tobias (2008) used an instrumental variable (IV) approach to identify the causal effect, with parents' BMIs serving as IVs for the individual's BMI. Parents' BMIs were measured 20 years earlier, when the individuals were 10 years old, to control for the effect of ‘shared family environment’. Because of data limitations, reasonable instrumental variables are not available in the current study. Therefore, the objective of the study is not to establish a causal effect of BMI on earnings, but rather to estimate the association between these two variables.

This study contributes to the literature by being the first to apply Bayesian semiparametric methods to estimate the relationship between BMI and earnings for Chinese workers. Compared with classical frequentist methods, Bayesian methods have several advantages and are better suited to this study. First, Bayesian estimations for nonlinearity converge well at points near the boundary. Second, there is no restriction on sample size. Since Bayesian estimations do not rely on large samples to get consistent estimators, we can study more homogeneous groups of workers to get better results for each group. Third, Bayesian methods have high efficiency in terms of computer time and convergence.

Results from previous studies indicate the existence of nonlinear relationships [7,16,30]. BMI was found to affect earnings differently at different BMI levels. It seems reasonable that for underweight people the effect should be positive, because an increase in body weight represents better nutrition status. For overweight and obese people, however, the effect might become negative due to decreases in productivity or discrimination from employers. Therefore, unlike some previous studies that use a linear or piecewise linear relationship, we do not assume any given functional form for the effect of BMI. In a multivariate regression model, fully nonparametric estimation for every independent variable is computationally demanding and the ‘curse of dimensionality’ problem arises [11,23]. When there is only one focal independent variable, semiparametric methods provide satisfactory estimation. Classical semiparametric methods do not estimate well at boundary points [12], while Bayesian estimations do not have this problem.

Some studies consider nonlinearity using categorical variables [1,3,7] to allow for a different slope for each BMI category, i.e. underweight, normal weight, overweight, or obese. For white people, cutoff values of 25 and 30 for overweight and obese are commonly used. Medical studies suggest that lower BMI cutoff values should be applied to Asian populations [8,14], but there is no consensus on the cutoff values. In addition, since the suggested cutoff values are determined in the medical literature, there is no evidence that they are relevant in the current study. By using a fully nonparametric method for BMI, we avoid the choice of cutoff values and obtain more accurate estimates.

This study also contributes to the literature by modeling nonlinear relationships for different groups of workers. In addition to uncovering the nonlinear relationship between BMI and income, we estimate different effects for males and females in urban and rural areas. The implementation of Chinese economic reforms in the late 1970s has resulted in rapid urbanization and economic development. Based on data from the China Health and Nutrition Survey (CHNS), average individual annual income increased rapidly from 1989 to 2011, reaching 26,000 Yuan, which was more than six times the initial level (adjusted for inflation) for urban citizens. Opening up to the Western market has increased the availability and diversity of food. A Western-style diet rich in energy and fat has affected young generations. The changing occupational structure, modes of transportation, and forms of entertainment have resulted in more physical inactivity. With the nutrition transition and the decrease in physical activity, body weight has increased over the past decades [5,24]. Overweight and obesity have become a concern [25].

Chinese society displays huge differences between rural and urban areas. Due to historical reasons and uneven development levels, big cities in China, such as Beijing and Shanghai, can be viewed as part of the developed world, while rural areas still belong to the developing world. Opening up to Western culture has changed views, and females now relate slenderness to beauty [9,33]. However, the traditional view of males still associates weight with power and affluence. By stratifying the sample into subsamples based on gender and area of residence, we control for observed heterogeneity and estimate the relationship for each group.

Overall, the results show evidence of nonlinearity, which varies across groups of people. We find that BMI and income are negatively related for overweight and obese females. For urban females in particular, the two variables are negatively related throughout: income decreases as BMI increases over the whole BMI distribution. These results are similar to those for white females in developed countries [7,16], suggesting that females in cities, with better education, higher income, and more white-collar jobs, are more affected by modern culture and view thinness as valuable. For males, on the other hand, the association between BMI and income is mainly positive: a higher BMI is associated with a higher income. This is consistent with evidence from other developing countries and reflects a cultural preference for men to be heavier [34].

The paper is organized as follows. In Section 2, we describe the estimation strategy with the Bayesian semiparametric model and test the performance of the model in simulation studies in Section 3. In Section 4, we introduce the data used for our empirical study and present results in Section 5. Finally, in Section 6, we discuss the results, implications, and future research directions.

2. Econometric framework

2.1. The model

To model BMI nonparametrically we follow Koop et al. [18] and Munkin and Trivedi (2008) and specify

Y_{i} = f (s_{i}) + X_{i} β + ϵ_{i} \qquad (1)

where $s_{i}$ and $Y_{i}$ are BMI and the logarithm of total income for individual i, $X_{i}$ is a vector of exogenous regressors that does not include an intercept, and $β$ is a conformable vector of parameters. The error term $ϵ_{i}$ is distributed $N (0, σ^{2})$. The function $f (\cdot)$ is unknown and is estimated nonparametrically. The BMI variable can potentially take as many different values as there are observations. To avoid having to estimate too many parameters, we try two roundings of the BMI variable: to 0.01 and to 0.001. For the urban female sample, this generates $k_{λ} = 1008$ and $k_{λ} = 1852$ distinct BMI values, respectively, out of the total N = 2522 observations. We sort the data by values of s so that $s_{1}$ is the smallest value of BMI and $s_{k_{λ}}$ is the largest. We assume that the function $f (s_{i})$ is smooth and differentiable and that its slope changes slowly with respect to $s_{i}$ [29]. More specifically, we assume that the first derivative of $f (s)$ is bounded by a constant C,

∣ f (s_{i}) - f (s_{i - 1}) ∣ \leq C ∣ s_{i} - s_{i - 1} ∣ .

A wide range of functions satisfy this condition.
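The rounding and sorting step can be illustrated with a short Python sketch. The BMI values are made up and the helper name `build_grid` is hypothetical; the sketch only shows how rounding reduces the number of distinct support points $k_{λ}$ and how each observation maps to one element of the sorted grid.

```python
def build_grid(bmi, decimals):
    """Round BMI, return the sorted distinct values and, for each
    observation, the index of its grid point."""
    rounded = [round(v, decimals) for v in bmi]
    grid = sorted(set(rounded))              # s_1 < s_2 < ... < s_k
    position = {s: j for j, s in enumerate(grid)}
    index = [position[r] for r in rounded]   # grid point used by each observation
    return grid, index

# Illustrative (made-up) BMI values, not CHNS data.
bmi = [22.7431, 19.5012, 22.7449, 25.1200, 19.5049]
grid2, index2 = build_grid(bmi, 2)           # rounding to 0.01
grid3, index3 = build_grid(bmi, 3)           # rounding to 0.001
print(len(grid2), len(grid3))
```

Coarser rounding merges nearby observations into one grid point, so fewer values of $f$ need to be estimated; finer rounding keeps more distinct points.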

2.2. The priors

Stacking (1) over i we can obtain

Y = Z λ + X β + ϵ

where

λ = \begin{bmatrix} f (s_{1}) \\ f (s_{2}) \\ \vdots \\ f (s_{k_{λ}}) \end{bmatrix}

and $Z$ is an $N \times k_{λ}$ selection matrix that picks the element of $λ$ corresponding to each observation i. Construct a $k_{λ} \times k_{λ}$ matrix $D$ such that $ϕ = D λ$ is a vector approximating the second derivative of the function $f (\cdot)$ through changes in adjacent slopes,

ϕ_{j} = \frac{λ_{j} - λ_{j - 1}}{s_{j} - s_{j - 1}} - \frac{λ_{j - 1} - λ_{j - 2}}{s_{j - 1} - s_{j - 2}}, j = 3, \dots, k_{λ},

and the first two elements of $ϕ$ are $ϕ_{1} = f (s_{1})$ and $ϕ_{2} = f (s_{2})$ serving as the initial condition.
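As a reader's aid, the differencing matrix $D$ can be written out explicitly. The following Python sketch is one way to build it from the definition of $ϕ_{j}$ above; the function names `build_D` and `matvec` and the grid values are hypothetical. Applied to a straight line, every $ϕ_{j}$ with $j \geq 3$ should be zero, since adjacent slopes are equal.

```python
def build_D(s):
    """Return the k x k matrix D with phi = D lam: rows 1-2 copy lam_1 and
    lam_2, and rows j >= 3 take the change in adjacent slopes."""
    k = len(s)
    D = [[0.0] * k for _ in range(k)]
    D[0][0] = 1.0
    D[1][1] = 1.0
    for j in range(2, k):                    # zero-based row j is phi_{j+1}
        h1 = s[j] - s[j - 1]
        h0 = s[j - 1] - s[j - 2]
        D[j][j] = 1.0 / h1
        D[j][j - 1] = -1.0 / h1 - 1.0 / h0
        D[j][j - 2] = 1.0 / h0
    return D

def matvec(D, lam):
    """Plain matrix-vector product."""
    return [sum(d * x for d, x in zip(row, lam)) for row in D]

s = [18.0, 19.5, 20.0, 23.0, 24.5]           # unevenly spaced grid (made up)
lam_linear = [2.0 + 0.3 * v for v in s]      # a straight line in s
phi = matvec(build_D(s), lam_linear)
```

Because $D$ is lower triangular with a nonzero diagonal, it is invertible, which is what allows the model to be rewritten in terms of $ϕ$.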

Then the model becomes

Y = Z D^{- 1} ϕ + X β + ϵ . \qquad (2)

We place proper but not informative priors on $(ϕ_{1}, ϕ_{2})$ as $N (0_{2}, I_{2})$ and an informative prior for the rest of the parameter vector

ϕ_{j} \sim N (0, η), j = 3, \dots, k_{λ},

where the smoothing parameter η follows an inverse gamma distribution

η \sim I G (a, b) .

The chosen priors imply that changes in $f (s_{i}) - f (s_{i - 1})$ should be small. The choice of D and hyperparameters a and b decides the smoothness of the estimated curve $f (s)$ , and it can be compared with the classical issue of optimal bandwidth selection [17]. If the prior of η is too tight, the estimated function will be simply linear. If it is too loose, the result will be too jumpy.

For other parameters, we also select proper priors for β,

β \sim N (0, 10 I_{k}) .

And for $σ^{2}$ ,

σ^{2} \sim I G (c, d),

where $I G (c, d)$ denotes an inverse gamma distribution.
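With these conjugate priors, a Gibbs sampler can cycle through four full conditional distributions. The derivation below is a sketch under stated assumptions rather than the authors' documented algorithm: it writes $W = Z D^{-1}$ and $V_{ϕ} = \mathrm{diag}(1, 1, η, \dots, η)$, treats $η$ and the 10 in the prior for $β$ as variances, and assumes the shape-scale form of the inverse gamma, with density proportional to $x^{-(a+1)} \exp(-b/x)$. Other parameterizations of the inverse gamma change the update formulas for $σ^{2}$ and $η$ accordingly.

```latex
\begin{aligned}
ϕ \mid β, σ^{2}, η, Y &\sim N(\bar{ϕ}, \bar{V}_{ϕ}),
  &\bar{V}_{ϕ} &= \left(V_{ϕ}^{-1} + σ^{-2} W' W\right)^{-1},
  &\bar{ϕ} &= σ^{-2}\, \bar{V}_{ϕ} W' (Y - X β), \\
β \mid ϕ, σ^{2}, Y &\sim N(\bar{β}, \bar{V}_{β}),
  &\bar{V}_{β} &= \left(\tfrac{1}{10} I_{k} + σ^{-2} X' X\right)^{-1},
  &\bar{β} &= σ^{-2}\, \bar{V}_{β} X' (Y - W ϕ), \\
σ^{2} \mid ϕ, β, Y &\sim IG\!\left(c + \tfrac{N}{2},\; d + \tfrac{1}{2} (Y - W ϕ - X β)' (Y - W ϕ - X β)\right), \\
η \mid ϕ &\sim IG\!\left(a + \tfrac{k_{λ} - 2}{2},\; b + \tfrac{1}{2} \sum_{j = 3}^{k_{λ}} ϕ_{j}^{2}\right).
\end{aligned}
```

The update for $η$ makes the role of the smoothing prior visible: small values of $ϕ_{j}$ for $j \geq 3$ pull $η$ toward zero, which in turn shrinks the changes in slope and produces a smoother curve.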

3. Simulation study

Before applying the method to observational data, we demonstrate the estimation approach proposed in Section 2 with a simulation study. The data sets were generated from the following equation:

y_{i} = f (s_{i}) + X_{i} β + ϵ_{i}, i = 1, 2, \dots, n

The functional form of $f (s_{i})$ is taken as follows:

f (s_{i}) = 0.15 s_{i} + \exp [- 10 (s_{i} - 1)^{2}]

where $s_{i}$ is drawn from a uniform distribution $U (- 3, 3)$ and n is the sample size. The main purpose is to estimate the function $f (s)$ in the presence of covariates. The covariate vector is defined as $X_{i} = [x_{1 i}, x_{2 i}]$ , where $x_{1 i} \sim N (- 1.4, 1)$ and $x_{2 i} \sim N (3, 4)$ . The error term is $ϵ_{i} \sim N (0, 0.01)$ . The true values of the unknown parameters are $β_{true} = [0.4, 0.8]^{'}$ .
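The data-generating process described above can be reproduced with a few lines of Python. This is a sketch for illustration only (the study itself used Matlab); it assumes that the second argument of each normal distribution is a variance, so that $N (3, 4)$ has standard deviation 2 and $N (0, 0.01)$ has standard deviation 0.1. The function names and the seed are hypothetical.

```python
import math
import random

def f_true(s):
    """Nonlinear component used in the simulation design."""
    return 0.15 * s + math.exp(-10.0 * (s - 1.0) ** 2)

def simulate(n, seed=0):
    """Draw n observations (y, s, x1, x2) from the simulation design."""
    rng = random.Random(seed)
    beta_true = (0.4, 0.8)
    rows = []
    for _ in range(n):
        s = rng.uniform(-3.0, 3.0)
        x1 = rng.gauss(-1.4, 1.0)            # N(-1.4, 1): sd = 1
        x2 = rng.gauss(3.0, 2.0)             # N(3, 4) read as variance 4: sd = 2
        eps = rng.gauss(0.0, 0.1)            # N(0, 0.01) read as variance: sd = 0.1
        y = f_true(s) + beta_true[0] * x1 + beta_true[1] * x2 + eps
        rows.append((y, s, x1, x2))
    return rows

data = simulate(200)
```

The exponential term creates a narrow bump centered at $s = 1$ on top of a gentle linear trend, which is the kind of local feature a quadratic specification cannot capture.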

Matlab is used in this study to carry out the MCMC algorithm with Gibbs sampler simulation. The Markov chains converge very fast to the stationary distribution. After 50 iterations, the posterior mean for the coefficient β converges to the true value. Therefore, after discarding the first 50 iterations, we collect 500 iterations to compute the posterior means and 95% credible intervals for the coefficients β, as shown in Figure 1. To assess the convergence properties of the MCMC chain, we calculated the inefficiency factors (IEF) and the effective sample sizes [18]. The IEFs for $σ^{2}$, $β_{1}$, and $β_{2}$ are 1.326, 1.066, and 1.000, respectively. The effective sample sizes for these three parameters are 377, 469, and 500, respectively. The IEFs are close to 1 and the effective sample sizes are close to the 500 retained draws, which indicates good convergence properties of the MCMC chain. The elements of ϕ for the nonlinear function show the same features for the IEF and the effective sample size.
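The reported effective sample sizes are consistent with dividing the 500 retained draws by the inefficiency factor. The Python sketch below shows that arithmetic together with one common estimator of the IEF, one plus twice the sum of sample autocorrelations up to a truncation lag. The truncation rule and function names are assumptions; the paper does not state which estimator was used.

```python
def inefficiency_factor(draws, max_lag):
    """IEF = 1 + 2 * sum of sample autocorrelations up to max_lag."""
    n = len(draws)
    mean = sum(draws) / n
    dev = [d - mean for d in draws]
    var = sum(d * d for d in dev) / n
    total = 0.0
    for lag in range(1, max_lag + 1):
        cov = sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n
        total += cov / var
    return 1.0 + 2.0 * total

def effective_sample_size(n_draws, ief):
    """Number of independent draws the chain is worth."""
    return n_draws / ief

for ief in (1.326, 1.066, 1.000):
    print(round(effective_sample_size(500, ief)))
```

An IEF of 1 means the draws behave like an independent sample, so none of the 500 iterations are lost to autocorrelation.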

Figure 1. Convergence, posterior mean, and 95% credible intervals of parameters in the simulation study. (a) $β_{1}$ = 0.401, 95% CI (0.401, 0.402). (b) $β_{2}$ = 0.804, 95% CI (0.804, 0.804).

As mentioned in Section 2, the estimated function is affected by the value of η, which is governed by the choice of the hyperparameters a and b. After some experimentation, we find that a = 3 and $b = 10^{5}$ provide smooth posteriors.

To test the effect of the sample size, we compare results for different sample sizes: 30, 50, 100, and 200. The results are shown in Figure 2. The dark solid line is the true curve; the other lines represent simulation results for the different sample sizes. The dotted line, the dashed line, the red solid line, and the dash-dot line represent results for sample sizes of 30, 50, 100, and 200, respectively. The simulation results approach the true curve as the sample size increases. We also tested sample sizes larger than 200, such as 300, 400, and 500, and the results did not improve further. We therefore use a sample size of 200 in the rest of the simulation study.

We also compare the current semiparametric model with a quadratic model:

y_{i} = γ_{0} + γ_{1} (s_{i})^{2} + X_{i} β + ϵ_{i}, i = 1, 2, \dots, n

The deviance information criterion (DIC) [31] is used to compare the two models:

DIC = p_{d} + \bar{D} (θ) .

The deviance is defined as $D (θ) = - 2 \log (p (y | θ)) + C$ , where y are the data, θ are the parameters of the model, and $p (y | θ)$ is the likelihood function. C is a constant that cancels out when the two models are compared. The effective number of parameters is $p_{d} = \bar{D} (θ) - D (\bar{θ})$ , where $\bar{D} (θ)$ is the posterior mean of the deviance and $D (\bar{θ})$ is the deviance evaluated at the posterior mean of the parameters. The results are shown in Table 1. Models with smaller DIC are preferred to models with larger DIC, so the semiparametric model used in this study is preferred to the quadratic model.

Table 1.

Model comparison with deviance information criterion (DIC).

	Semiparametric model	Quadratic model
$\bar{D} (θ)$	−349.05	50.17
$D (\bar{θ})$	−375.3	45.39
$p_{d}$	26.25	4.78
DIC	−322.8	54.95
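The entries of Table 1 can be checked directly from the two definitions, $p_{d} = \bar{D} (θ) - D (\bar{θ})$ and DIC $= p_{d} + \bar{D} (θ)$. The short Python sketch below does this with the reported deviances; the function name `dic_summary` is hypothetical.

```python
def dic_summary(mean_deviance, deviance_at_mean):
    """Return (p_d, DIC) with p_d = Dbar - D(theta_bar), DIC = p_d + Dbar."""
    p_d = mean_deviance - deviance_at_mean
    return p_d, p_d + mean_deviance

semi = dic_summary(-349.05, -375.3)          # semiparametric model, Table 1
quad = dic_summary(50.17, 45.39)             # quadratic model, Table 1
print([round(v, 2) for v in semi], [round(v, 2) for v in quad])
```

The semiparametric model uses many more effective parameters, but its much lower mean deviance more than offsets that penalty.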


4. Data

The China Health and Nutrition Survey (CHNS) is a household survey. It collects individual information on education, income, diet, health status, physical activity, nutrition status, etc. Household information such as wealth, assets, income, and other benefits, as well as community information, is also included. The survey areas include nine provinces and three municipal cities. Although not nationally representative, the survey randomly chooses communities from regions that differ widely in geographical features, economic development, and natural and social environment in order to obtain large variation over time and space [26]. The survey is conducted every two or three years, and ten rounds of data are available: 1989, 1991, 1993, 1997, 2000, 2004, 2006, 2009, 2011, and 2015. Some of the variables used in this paper are not available in 2015. The 1989 wave includes only females aged 45 years and younger, which differs from the age range in other waves. Therefore, these two waves are not included in the study.

Some previous studies on the relationship between BMI and income using self-assessed data are subject to measurement errors. An advantage of the CHNS data, particularly for the current study, is that height and weight are measured by a third party during clinical visits, which greatly reduces measurement errors from self-assessed data and improves the accuracy of the estimates. We remove BMI outliers near BMI boundaries, dropping the observations at the lower and upper 0.5 percent of the BMI distribution by gender.
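The trimming rule can be sketched in Python as follows. This is one reading of the rule, under the assumption that the same share of observations is cut from each tail within each gender; the record layout, the function name `trim_by_group`, and the sample values are made up for illustration.

```python
from collections import defaultdict

def trim_by_group(records, per_mille=5):
    """Drop the lowest and highest 0.5 percent of BMI within each gender."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["gender"]].append(rec)
    kept = []
    for recs in groups.values():
        recs = sorted(recs, key=lambda r: r["bmi"])
        k = len(recs) * per_mille // 1000    # observations cut in each tail
        kept.extend(recs[k:len(recs) - k])
    return kept

# Made-up sample: 400 records per gender with evenly spread BMI values.
records = [{"gender": g, "bmi": 15.0 + 0.05 * i}
           for g in ("female", "male") for i in range(400)]
trimmed = trim_by_group(records)
```

Trimming within gender rather than on the pooled sample keeps the cutoffs from being driven by the heavier group.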

According to China's labor law during the study period, the retirement age is generally 60 for males and 55 for females. Men aged 18–60 and women aged 18–55 who are not retired are included in the current study as the main working-population sample. CHNS is a longitudinal survey, and it follows the same individuals when possible. In the working population, individuals are observed 2.27 times on average, with a standard deviation of 1.75. To avoid serial correlation, we use only one observation for each individual. For multiple observations from the same individual, we keep the most recent observation. The whole sample is stratified by gender and rural/urban area of residence to control for observed heterogeneity.
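Reducing the panel to one observation per person is a simple selection step. A minimal Python sketch, with hypothetical field names `id` and `wave` and made-up rows, is:

```python
def keep_most_recent(rows):
    """Keep one row per individual: the one from the latest survey wave."""
    latest = {}
    for row in rows:
        pid = row["id"]
        if pid not in latest or row["wave"] > latest[pid]["wave"]:
            latest[pid] = row
    return list(latest.values())

# Made-up rows, not CHNS data.
rows = [
    {"id": "A", "wave": 1993, "bmi": 21.4},
    {"id": "A", "wave": 2004, "bmi": 23.0},
    {"id": "B", "wave": 2011, "bmi": 24.8},
    {"id": "A", "wave": 2000, "bmi": 22.1},
]
cross_section = keep_most_recent(rows)
```

The result is a cross-section in which no individual contributes more than one error term, at the cost of discarding earlier waves for repeat respondents.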

4.1. Dependent variable and covariates

The logarithm of earnings in Chinese Yuan is used as the dependent variable. Earnings in the study are represented by total annual individual income. For samples in urban areas, wages from employment are also used as a comparison. Annual individual income refers to individual total annual income from all possible sources such as business, farming, fishing, gardening, livestock, and non-retirement wages, adjusted relative to 2011 Yuan CPI. For workers with more than one occupation or farmers who have earnings from activities other than farming, total income includes all of their incomes. In the CHNS survey, income is asked for the year before the current survey year. For example, for the 2011 survey, the income variable represents income during 2010.

The reasons for using total income instead of wages for all samples, especially for rural samples, are twofold. First, for the majority of farmers in the study, only income is reported. Second, income sources such as subsidies for one-child families, food, utilities, and health, as well as gifts, rent, and in-kind payments, are not included in the individual total income variable (these are included in the household income variable). The individual total income variable is therefore a measure of income from labor market participation.

Individual-level covariates include educational attainment (years of education), age and age squared, marital status, school enrollment status, minimum age of children under 18 years old, percentage of children in a household, and type of occupation. Education and occupation variables are used to control for the level of human capital. Current or recent pregnancy affects body weight. Therefore, women who were pregnant during the survey time are not included. Women who are breastfeeding during the interview are controlled for by the minimum age of their children. The percentage of kids in a household affects the employment status and time devoted to work.

A dummy variable indicating the type of occupation, white collar or blue collar, is also included [7,30]. White collar includes senior and junior professional/technical workers, administrators/executives/managers, office staff, army officers and police officers, service workers, and athletes, actors, and musicians. Blue collar includes farmers, fishermen, hunters, skilled and non-skilled workers, ordinary soldiers, policemen, drivers, and small household businesses. Dummy variables for the year of survey and for provinces/municipal cities are also included to control for time and province fixed effects.

4.2. Summary statistics

Sample sizes and summary statistics are presented in Table 2. Average BMI values are similar for females in urban and rural areas. Urban females have more years of education and earn higher income compared with rural females. The mean values of years of education for urban and rural females are 10.48 and 6.693 years, respectively. Urban females are also more likely to have white-collar jobs. The same pattern holds for males, although males in urban areas are heavier than those in rural areas.

Table 2.

Summary statistics for main variables.

Variable	N	Mean	St.D.	Min	Max
Urban Female
lnTotalIncome	2,522	9.185	1.132	1.843	12.75
lnWage	2,078	9.093	1.318	3.912	12.79
BMI	2,522	22.74	3.153	16.61	33.36
Age	2,522	39.05	9.461	18.40	55
Education	2,522	10.48	3.736	0	18
Married	2,522	0.814	0.389	0	1
WhiteCollar	2,522	0.646	0.478	0	1
School	2,522	0.0143	0.119	0	1
PercChild	2,522	0.170	0.160	0	0.667
MinAge	2,522	13.11	5.787	0	18
Wave	2,522	2004	7.529	1991	2011
Rural Female
lnTotalIncome	4,889	8.579	1.326	0.763	13.11
lnWage	1,688	8.696	1.321	3.401	12.98
BMI	4,889	22.84	3.199	16.61	33.33
Age	4,889	40.22	11.41	18	55
Education	4,889	6.693	3.887	0	17
Married	4,889	0.812	0.391	0	1
WhiteCollar	4,889	0.183	0.387	0	1
School	4,889	0.0141	0.118	0	1
PercChild	4,889	0.171	0.167	0	0.714
MinAge	4,889	12.42	6.270	0	18
Wave	4,889	2004	7.144	1991	2011
Urban Male
lnTotalIncome	2,911	9.487	1.059	4.217	13.10
lnWage	2,566	9.387	1.258	4.787	13.10
BMI	2,911	23.77	3.103	16.74	32.53
Age	2,911	42.73	11.03	18.10	60
Education	2,911	10.94	3.438	0	18
Married	2,911	0.848	0.359	0	1
WhiteCollar	2,911	0.557	0.497	0	1
School	2,911	0.0117	0.107	0	1
PercChild	2,911	0.151	0.157	0	0.600
MinAge	2,911	13.29	5.943	0	18
Wave	2,911	2005	7.294	1991	2011
Rural Male
lnTotalIncome	5,168	8.994	1.340	0.271	13.38
lnWage	2,577	9.121	1.281	2.996	13.08
BMI	5,168	22.81	3.099	16.62	32.59
Age	5,168	42.60	12.66	18	60
Education	5,168	7.957	3.339	0	18
Married	5,168	0.818	0.385	0	1
WhiteCollar	5,168	0.183	0.387	0	1
School	5,168	0.0155	0.123	0	1
PercChild	5,168	0.160	0.164	0	0.714
MinAge	5,168	12.33	6.446	0	18
Wave	5,168	2005	6.747	1991	2011


Note: CHNS working population.

5. Empirical results

5.1. Convergence

We run the MCMC algorithm, discarding the first 50 iterations and collecting 500 iterations after the burn-in stage, and use the resulting series to obtain the mean and standard deviation for each parameter. The Markov chains have excellent convergence properties for all parameters. In this section, the value of η is fixed at $10^{- 6}$ , which produces smooth curves for the estimated income-BMI relationship. The graphs in Figure 3 show convergence for two parameters: the coefficient for age and the mid-point value of the nonlinear function $f (BMI)$ .

Figure 3. Convergence of parameters for the urban female sample. (a) Coefficient for Age. (b) f(926).

5.2. Results for female samples

In Figure 4(a), we present estimates of the function $f (s)$ for the subsample of urban females. The solid line is the estimated posterior mean of $f (BMI)$ , and the dashed lines are the intervals corresponding to one standard deviation from the mean. The conditional income function $f (BMI)$ is decreasing. The slope of the function becomes steeper at a BMI value of around 24. Therefore, for females in urban areas, we observe an income penalty as BMI increases, and the penalty is larger for the overweight and obese.

Posterior means, standard deviations, probabilities of being positive, and 95% credible intervals of the other covariates for the urban female sample are presented in Table 3. The results presented in the table are consistent with expectations. In particular, we see evidence of a quadratic relationship in age. Years of education affect income positively: each additional year of education is associated with an increase in income of about 6.3 percent on average. People with white-collar jobs earn about 25.6 percent more than their blue-collar counterparts (posterior mean 0.256 in Table 3). Income increases substantially over time. Finally, we find substantial regional differences in income.
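Coefficients in a log-income equation are commonly read as approximate percentage differences, which works well for small coefficients and less well for larger ones. As a reader's aid (not part of the original analysis), the Python sketch below applies the exact conversion $100 (e^{β} - 1)$ to two posterior means from Table 3; the function name is hypothetical.

```python
import math

def percent_change(coef):
    """Exact percentage change in income implied by a log-income coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)

education = percent_change(0.063)      # posterior mean for Education, Table 3
white_collar = percent_change(0.256)   # posterior mean for WhiteCollar, Table 3
print(round(education, 1), round(white_collar, 1))
```

For education the exact figure is close to the coefficient itself (about 6.5 percent), while for the white-collar dummy the exact figure is roughly 29 percent, noticeably above the approximation.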

Table 3.

Parameter posterior means, standard deviations, probabilities of being positive, and 95% credible intervals.

Variable	$E (\cdot \mid y)$	$\sqrt{\mathrm{Var} (\cdot \mid y)}$	$\Pr (\cdot > 0 \mid y)$	95% CI
Age	0.073	0.019	1	[0.072,0.074]
Age²	−0.001	0	0	$[- 0.001, - 0.001]$
Education	0.063	0.006	1	[0.063,0.063]
Married	0.056	0.056	0.862	[0.053,0.058]
WhiteCollar	0.256	0.037	1	[0.254,0.257]
School	0.076	0.132	0.694	[0.071,0.081]
PercChild	−0.266	0.144	0.03	$[- 0.272, - 0.261]$
MinAge	0.001	0.004	0.654	[0.001,0.001]
Wave1993	0.14	0.069	0.984	[0.137,0.142]
Wave1997	0.336	0.078	1	[0.333,0.339]
Wave2000	0.557	0.078	1	[0.554,0.560]
Wave2004	0.709	0.081	1	[0.706,0.712]
Wave2006	0.876	0.085	1	[0.873,0.880]
Wave2009	1.233	0.08	1	[1.230,1.236]
Wave2011	1.359	0.071	1	[1.356,1.362]
t21 $^{a}$	−0.298	0.088	0	$[- 0.302, - 0.295]$
t23	0.053	0.088	0.674	[0.050,0.056]
t31	0.257	0.078	1	[0.254,0.260]
t32	−0.091	0.08	0.112	$[- 0.094, - 0.088]$
t37	−0.211	0.087	0.002	$[- 0.215, - 0.208]$
t41	−0.209	0.082	0.008	$[- 0.212, - 0.206]$
t42	−0.248	0.081	0.002	$[- 0.251, - 0.245]$
t43	−0.188	0.083	0.01	$[- 0.192, - 0.185]$
t45	−0.309	0.082	0	$[- 0.312, - 0.306]$
t52	−0.075	0.085	0.192	$[- 0.078, - 0.071]$
t55	0.11	0.097	0.84	[0.107,0.114]


Note: Urban female subsample (n=2522). $^{a}$ t21–55 represent provinces or municipal cities.
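The first three columns of the table are simple functions of the retained MCMC draws for each coefficient. The following Python sketch shows how such summaries are typically computed; the draws are made up (chosen so that 431 of 500 are positive), and the function name `summarize` is hypothetical.

```python
import math

def summarize(draws):
    """Posterior mean, standard deviation, and share of positive draws."""
    n = len(draws)
    mean = sum(draws) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / (n - 1))
    prob_pos = sum(1 for d in draws if d > 0) / n
    return mean, sd, prob_pos

# Made-up draws standing in for 500 retained iterations of one coefficient.
draws = [0.1] * 431 + [-0.1] * 69
mean, sd, prob_pos = summarize(draws)
print(round(mean, 4), round(prob_pos, 3))
```

A probability of being positive near 0 or 1 indicates that the posterior mass lies almost entirely on one side of zero, while values near 0.5 indicate that the sign of the effect is not determined by the data.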

Figure 4(b) gives results for females in rural areas. The conditional income function peaks at a BMI value of around 24. The function has an inverted U shape, which means that both high and low BMI values are associated with lower income.

Comparing Figure 4(a,b) for females in urban and rural areas, we conclude that for both groups being overweight or obese is associated with lower income. The effects of being underweight, however, are different. For rural females, being underweight is associated with lower income, but this is not the case for urban females. People living in cities have higher education levels and more white-collar jobs in general. In addition, they view beauty differently and place more value on being thin. Therefore, for urban females, being underweight is not associated with lower income but with higher income.

5.3. Results for Male samples

Figure 5 shows estimates of $f (BMI)$ for males in urban and rural areas. For both groups, the curves have positive slopes, which means that higher BMI is associated with higher income. We do not observe the negative income effect of being overweight or obese that was found for the female groups.

Table 4 presents the posterior means, standard deviations, probabilities of being positive, and 95% credible intervals of the other covariates for the urban male sample. The results are, for the most part, very similar to those obtained for the urban female sample. Specifically, we see strong evidence of a quadratic profile in age. Age, years of education, marital status, and having a white-collar job are important explanatory variables in the log income equation.

Table 4.

Parameter posterior means, standard deviations, probabilities of being positive, and 95% credible intervals.

Variable	$E (\cdot \mid y)$	$\sqrt{\mathrm{Var} (\cdot \mid y)}$	$\Pr (\cdot > 0 \mid y)$	95% CI
Age	0.069	0.012	1.00	[0.069,0.069]
Age²	−0.001	0	0.00	$[- 0.001, - 0.001]$
Education	0.057	0.005	1.00	[0.057,0.057]
Married	0.112	0.054	0.968	[0.110,0.114]
WhiteCollar	0.154	0.032	1.00	[0.153,0.156]
School	−0.008	0.129	0.448	$[- 0.013, - 0.004]$
PercChild	−0.074	0.133	0.296	$[- 0.079, - 0.069]$
MinAge	0.001	0.003	0.626	[0.001,0.001]
Wave1993	−0.05	0.064	0.186	$[- 0.053, - 0.048]$
Wave1997	0.203	0.075	0.992	[0.200,0.206]
Wave2000	0.604	0.072	1.00	[0.602,0.607]
Wave2004	0.719	0.073	1.00	[0.717,0.722]
Wave2006	0.883	0.068	1.00	[0.881,0.886]
Wave2009	1.159	0.07	1.00	[1.156,1.161]
Wave2011	1.406	0.063	1.00	[1.404,1.409]
t21 $^{a}$	−0.189	0.076	0.006	$[- 0.192, - 0.187]$
t23	−0.032	0.074	0.39	$[- 0.034, - 0.029]$
t31	0.077	0.065	0.872	[0.075,0.080]
t32	0.018	0.071	0.59	[0.015,0.020]
t37	−0.154	0.069	0.02	$[- 0.156, - 0.151]$
t41	−0.185	0.073	0.004	$[- 0.188, - 0.183]$
t42	−0.269	0.072	0.00	$[- 0.272, - 0.266]$
t43	−0.115	0.071	0.07	$[- 0.118, - 0.112]$
t45	−0.223	0.07	0.002	$[- 0.225, - 0.220]$
t52	−0.066	0.071	0.168	$[- 0.068, - 0.063]$
t55	−0.09	0.084	0.146	$[- 0.093, - 0.087]$


Note: Urban male subsample (n=2911). $^{a}$ t21–55 represent provinces or municipal cities.

5.4. Results using wages

This section compares the results obtained when wages are used as the dependent variable with those obtained using income. As mentioned in the data section, many individuals in rural areas do not have wage data because they are not formally employed. In our subsamples, only around 35% of rural females and 50% of rural males have wages. For people living in urban areas, however, most income comes from wages. In the urban subsamples, 82% of females and 88% of males have wages. Therefore, in addition to the effect of BMI on income, we can estimate the effect of BMI on wages. The comparison is shown in Figures 6 and 7.

Figure 6. Comparison of $f (BMI)$ using income and wages for urban females. (a) Income. (b) Wage.

Figure 7. Comparison of $f (BMI)$ using income and wages for urban males. (a) Income. (b) Wage.

In general, the shape of the $f (BMI)$ function is similar under the two earnings measures. For both genders, however, the penalty for being overweight or obese is larger when wages are used. Wages come directly from employment in the labor market, while income includes wages as well as other earnings not directly from employment. Because part of the penalty for being obese comes from discrimination in the labor market, a larger penalty is expected when using wages.

6. Conclusion

This paper estimates nonlinear relationship between BMI and earnings in China using Bayesian semiparametric methods. We model the income-BMI relationship nonparametrically, while other covariates enter the equation linearly. A Bayesian posterior simulator is designed and employed to fit the model. MCMC methods with Gibbs sampler are used to get the posterior distribution of the parameters. The model converges well and has high computing efficiency in comparison to other existing semiparametric models with classical methods. Based on deviance information criterion (DIC), the proposed model is preferred to the quadratic model. The nonlinear relationships for each subgroup workers are estimated and interesting results are presented.

We found evidence of nonlinearity in the relationships between BMI and earnings. The shapes of the estimated regression functions are different for males and females and in urban and rural areas. For males in general, we observe a positive relationship between BMI and earnings. which means an increase in BMI increases their earnings, and the slopes are near constant for the whole BMI distribution. Conversely, females receive a penalty for being overweight or obese. Rural females also receive a penalty for being underweight, while being underweight does not decrease earnings for urban females. When replacing income with wages for urban samples, the results in general keep the same pattern, and we found more penalty of being overweight or obese, which might be due to discrimination from employers. For urban males when using wage as a measure of earnings, the curve is flatter after BMI value of 24, which indicates that the reward of extra weight is smaller when their BMI is higher than 24.

Males are traditionally seen as the main breadwinners in the household, and as such there is a cultural preference for them to be heavier. A higher BMI is associated with being healthy, prosperous, and even powerful [34]. This may explain the mostly positive relationship between BMI and earnings, especially for males in rural areas. It also coincides with results from developing countries, where a higher BMI is related to better nutritional status and better health. For females, especially those in urban areas, being obese is negatively related to earnings, which is similar to the results for white females in developed countries [7]. This is consistent with studies finding that being slim is generally viewed as an indicator of good health and beauty for females. Influenced by the Western lifestyle [6,19], Chinese females with higher socioeconomic status have come to value thinness more. Our results indicate that, in terms of the earnings effect of body weight, females seem to be moving faster than males in the transition from the developing-country pattern to the developed-country pattern.

Building on the Bayesian semiparametric model introduced in this study, future research can extend the model to exploit more features of the observational data, such as its panel structure, to identify the causal effect of BMI on individuals' earnings. Research of this type would have further implications for public health, employment, and social policies. It would also help in understanding the factors that affect labor market outcomes for different groups of people, as well as the social and psychological attitudes toward, and pressures around, weight gain among individuals.

Acknowledgements

This research uses data from China Health and Nutrition Survey (CHNS). We thank the National Institute of Nutrition and Food Safety, China Center for Disease Control and Prevention, Carolina Population Center, the University of North Carolina at Chapel Hill, the NIH (R01-HD30880, DK056350, and R01-HD38700) and the Fogarty International Center, NIH for financial support for the CHNS data collection and analysis files from 1989 to 2006 and both parties plus the China-Japan Friendship Hospital, Ministry of Health for support for CHNS 2009 and future surveys.

Appendices.

Appendix A. Computational Appendix – The MCMC Algorithm

For each observation i, the likelihood function is

$\Pr [Y_{i} | X_{i}, ϕ, β, σ^{2}] = (2 π σ^{2})^{- \frac{1}{2}} \exp [- 0.5 σ^{- 2} (Y_{i} - Z_{i} D^{- 1} ϕ - X_{i} β)^{2}] .$

Multiplying these likelihoods over the N independent observations, $i = 1, \dots, N$ , gives the joint distribution of all observations. The posterior density of the parameters is proportional to the product of the prior density and this joint distribution of the observables.
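As a worked step, the product can be written out using only the symbols defined above. Here $p (\cdot)$ is generic notation for a density, introduced for illustration; the joint prior is the one specified in Section 2.2, and any hyperparameters of that prior are suppressed.

$\begin{aligned} \Pr [Y | X, ϕ, β, σ^{2}] & = \prod_{i = 1}^{N} (2 π σ^{2})^{- \frac{1}{2}} \exp [- 0.5 σ^{- 2} (Y_{i} - Z_{i} D^{- 1} ϕ - X_{i} β)^{2}] \\ & = (2 π σ^{2})^{- \frac{N}{2}} \exp [- 0.5 σ^{- 2} \sum_{i = 1}^{N} (Y_{i} - Z_{i} D^{- 1} ϕ - X_{i} β)^{2}] \\ p (ϕ, β, σ^{2} | Y, X) & \propto p (ϕ, β, σ^{2}) \Pr [Y | X, ϕ, β, σ^{2}] . \end{aligned}$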

We block the parameter set as $[ϕ, β]$ , $η$ , and $σ^{2}$ . Markov chain Monte Carlo (MCMC) methods are used to obtain the posterior distribution. The steps of the MCMC algorithm are as follows:

Denote $W_{i} = (Z_{i} D^{- 1}, X_{i})$ , $θ^{'} = (ϕ^{'}, β^{'})$ . Let the prior distribution of $θ$ be $N (\underline{θ}, {\underline{H}}_{θ}^{- 1})$ . Then the full conditional distribution of θ is
$θ | σ^{2}, Y, W \sim N (\bar{θ}, {\bar{H}}_{θ}^{- 1}),$ (A1)
where
$\begin{aligned} {\bar{H}}_{θ} & = {\underline{H}}_{θ} + σ^{- 2} \sum_{i = 1}^{N} W_{i}^{'} W_{i} \\ \bar{θ} & = {\bar{H}}_{θ}^{- 1} [{\underline{H}}_{θ} \underline{θ} + σ^{- 2} \sum_{i = 1}^{N} W_{i}^{'} Y_{i}] . \end{aligned}$
Let the prior distribution of $η$ be $I G (\underline{a}, \underline{b})$ . The full conditional distribution of $η$ is
$η | θ, Y, W, σ^{2} \sim I G (\bar{a}, \bar{b}),$ (A2)
where
$\begin{aligned} \bar{a} & = \frac{k_{λ} - 2}{2} + \underline{a} \\ \bar{b} & = {[{\underline{b}}^{- 1} + \frac{1}{2} \sum_{j = 3}^{k_{λ}} ϕ_{j}^{2}]}^{- 1} . \end{aligned}$
Finally, let the prior distribution of $σ^{2}$ be $I G (\underline{c}, \underline{d})$ . The full conditional distribution of $σ^{2}$ is
$σ^{2} | θ, Y, W \sim I G (\bar{c}, \bar{d}),$ (A3)
where
$\begin{aligned} \bar{c} & = \frac{N}{2} + \underline{c} \\ \bar{d} & = {[{\underline{d}}^{- 1} + \frac{1}{2} (Y - W θ)^{'} (Y - W θ)]}^{- 1} . \end{aligned}$

This concludes the MCMC algorithm.
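The three full conditionals (A1) to (A3) can be cycled through in a Gibbs sampler. The following is a minimal Python sketch, not the authors' program. It rests on several assumptions that are not stated in this appendix: the function name `gibbs_sampler`, all default hyperparameter values, a zero prior mean for $θ$, and a diagonal prior precision that is nearly flat on $ϕ_{1}$, $ϕ_{2}$ and $β$ and equal to $η^{- 1}$ on $ϕ_{3}, \dots, ϕ_{k_{λ}}$. It also assumes that $I G (a, b)$ denotes the distribution whose reciprocal is Gamma with shape $a$ and scale $b$, which is the convention consistent with the update formulas for $\bar{b}$ and $\bar{d}$.

```python
import numpy as np


def gibbs_sampler(Y, W, k_lam, n_iter=2000, burn=500, seed=0,
                  a0=3.0, b0=1.0, c0=3.0, d0=1.0, diffuse_prec=1e-6):
    """Cycle through the full conditionals (A1)-(A3).

    Y: (N,) outcomes; W: (N, K) with rows W_i = (Z_i D^{-1}, X_i);
    k_lam: length of phi (the first k_lam columns of W).
    All default hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    N, K = W.shape
    theta_prior = np.zeros(K)          # assumed prior mean of theta
    WtW, WtY = W.T @ W, W.T @ Y        # cross-products, computed once
    eta, sigma2 = 1.0, 1.0             # arbitrary starting values
    keep = {"theta": [], "eta": [], "sigma2": []}

    for it in range(n_iter):
        # Assumed prior precision: near-flat on phi_1, phi_2 and beta,
        # prior variance eta on phi_3, ..., phi_{k_lam}.
        prec = np.full(K, diffuse_prec)
        prec[2:k_lam] = 1.0 / eta
        H_prior = np.diag(prec)

        # (A1): theta | sigma^2 ~ N(theta_bar, H_bar^{-1})
        H_bar = H_prior + WtW / sigma2
        theta_bar = np.linalg.solve(H_bar, H_prior @ theta_prior + WtY / sigma2)
        L = np.linalg.cholesky(H_bar)  # H_bar = L L'
        theta = theta_bar + np.linalg.solve(L.T, rng.standard_normal(K))

        # (A2): eta | phi ~ IG(a_bar, b_bar), drawn as 1 / Gamma(shape, scale)
        a_bar = (k_lam - 2) / 2 + a0
        b_bar = 1.0 / (1.0 / b0 + 0.5 * np.sum(theta[2:k_lam] ** 2))
        eta = 1.0 / rng.gamma(a_bar, b_bar)

        # (A3): sigma^2 | theta ~ IG(c_bar, d_bar); N = number of observations
        resid = Y - W @ theta
        c_bar = N / 2 + c0
        d_bar = 1.0 / (1.0 / d0 + 0.5 * resid @ resid)
        sigma2 = 1.0 / rng.gamma(c_bar, d_bar)

        if it >= burn:                 # discard burn-in draws
            keep["theta"].append(theta)
            keep["eta"].append(eta)
            keep["sigma2"].append(sigma2)

    return {name: np.array(vals) for name, vals in keep.items()}
```

Calling `gibbs_sampler(Y, W, k_lam)` returns the retained draws of $θ$, $η$ and $σ^{2}$; posterior means are then column averages of the stored arrays. The draw of $θ$ uses a Cholesky factor of $\bar{H}_{θ}$ rather than an explicit inverse, which is a numerical design choice and not something the appendix prescribes.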

Appendix B. Computer codes for the MCMC algorithm

This appendix includes the computer code used for the MCMC algorithm in the simulation study.

[Appendix B code listing: not reproduced in this text.]

Note: The program was adapted by the authors from code in the textbook Bayesian Econometric Methods [18].

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Atella V., Pace N., and Vuri D., Are employers discriminating with respect to weight? European evidence using quantile regression, Econ. Hum. Biol. 6 (2008), pp. 305–329.
2. Averett S.L. and Korenman S., The economic reality of the beauty myth, J. Hum. Resour. 31 (1996), pp. 304–330.
3. Baum C.L. and Ford W.F., The wage effects of obesity: A longitudinal study, Health. Econ. 13 (2004), pp. 885–899.
4. Becker G.S. and Mulligan C.B., The endogenous determination of time preference, Q. J. Econ. 112 (1997), pp. 729–758.
5. Bell A.C., Ge K., and Popkin B.M., The road to obesity or the path to prevention: Motorized transportation and obesity in China, Obes. Res. 10 (2002), pp. 277–283.
6. Bonnefond C. and Clément M., Social class and body weight among Chinese urban adults: The role of the middle classes in the nutrition transition, Soc. Sci. Med. 112 (2014), pp. 22–29.
7. Cawley J., The impact of obesity on wages, J. Hum. Resour. 39 (2004), pp. 451–474.
8. Chen C. and Lu F., et al. The guidelines for prevention and control of overweight and obesity in Chinese adults, Biomed. Environ. Sci.: BES 17 (2004), pp. 1.
9. Chen H. and Jackson T., Prevalence and sociodemographic correlates of eating disorder endorsements among adolescents and young adults from China, Eur. Eat. Disord. Rev.: Prof. J. Eat. Disord. Assoc. 16 (2008), pp. 375–385.
10. Everett M., Let an overweight person call on your best customers? Fat chance, Sales Mark. Manage. 142 (1990), pp. 66–70.
11. Fan J. and Gijbels I., Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability 66, Chapman and Hall/CRC press, 1996.
12. Feng L., Essays in Applied Microeconomics. Graduate Theses and Dissertations, 2019. Available at https://scholarcommons.usf.edu/etd/7784.
13. Glick P. and Sahn D.E., Health and productivity in a heterogeneous urban labour market, Appl. Econ. 30 (1998), pp. 203–216.
14. He W., Li Q., Yang M., Jiao J., Ma X., Zhou Y., Song A., Heymsfield S.B., Zhang S., and Zhu S., Lower BMI cutoffs to define overweight and obesity in China, Obesity 23 (2015), pp. 684–691.
15. Huang C.C., Yabiku S.T., Ayers S.L., and Kronenfeld J.J., The obesity pay gap: Gender, body size, and wage inequalities – a longitudinal study of Chinese adults, 1991–2011, J. Popul. Res. 33 (2016), pp. 221–242.
16. Kline B. and Tobias J.L., The wages of BMI: Bayesian analysis of a skewed treatment–response model with nonparametric endogeneity, J. Appl. Econom. 23 (2008), pp. 767–793.
17. Koop G. and Poirier D.J., Bayesian variants of some classical semiparametric regression techniques, J. Econom. 123 (2004), pp. 259–282.
18. Koop G., Poirier D.J., and Tobias J.L., Bayesian Econometric Methods, Cambridge University Press, Cambridge, 2007.
19. Luo Y., Parish W.L., and Laumann E.O., A population-based study of body image concerns among urban Chinese adults, Body Image. 2 (2005), pp. 333–345.
20. McLean R.A. and Moon M., Health, obesity, and earnings, Am. J. Public. Health. 70 (1980), pp. 1006–1009.
21. Morris S., The impact of obesity on employment, Labour Econ. 14 (2007), pp. 413–433.
22. Pagan J.A. and Davila A., Obesity, occupational attainment, and earnings, Soc. Sci. Q. 78 (1997), pp. 756–770.
23. Pagan A. and Ullah A., Nonparametric Econometrics, Cambridge University Press, Cambridge, 1999.
24. Popkin B.M., The nutrition transition in the developing world, Dev. Policy. Rev. 21 (2003), pp. 581–597.
25. Popkin B.M., Adair L.S., and Ng S.W., Global nutrition transition and the pandemic of obesity in developing countries, Nutr. Rev. 70 (2012), pp. 3–21.
26. Popkin B.M., Du S., Zhai F., and Zhang B., Cohort profile: The China health and nutrition survey – monitoring and understanding socio-economic and health change in China, 1989–2011, Int. J. Epidemiol. 39 (2009), pp. 1435–1440.
27. Sargent J.D. and Blanchflower D.G., Obesity and stature in adolescence and earnings in young adulthood: Analysis of a British birth cohort, Arch. Pediatr. Adolesc. Med. 148 (1994), pp. 681–687.
28. Schultz T.P., Wage rentals for reproducible human capital: Evidence from Ghana and the Ivory Coast, Econ. Hum. Biol. 1 (2003), pp. 331–366.
29. Shiller R.J., Smoothness priors and nonlinear regression, J. Am. Stat. Assoc. 79 (1984), pp. 609–615.
30. Shimokawa S., The labour market impact of body weight in China: A semiparametric analysis, Appl. Econ. 40 (2008), pp. 949–968.
31. Spiegelhalter D.J., Best N.G., Carlin B.P., and Linde A.V.D., Bayesian measures of model complexity and fit, J. Royal Stat. Soc. Ser. B. 64 (2002), pp. 583–639.
32. Thomas D. and Strauss J., Health and wages: Evidence on men and women in urban Brazil, J. Econom. 77 (1997), pp. 159–185.
33. Xu X., Mellor D., Kiehne M., Ricciardelli L.A., McCabe M.P., and Xu Y., Body dissatisfaction, engagement in body change behaviors and sociocultural influences on body image among Chinese adolescents, Body Image 7 (2010), pp. 156–164.
34. Zhang X., Dagevos H., He Y., Van der Lans I., and Zhai F., Consumption and corpulence in China: A consumer segmentation study based on the food perspective, Food Policy 33 (2008), pp. 37–47.

