Author manuscript; available in PMC: 2017 Jan 1.
Published in final edited form as: Soc Sci Med. 2015 Dec 2;148:110–122. doi: 10.1016/j.socscimed.2015.11.028

Is Natural Experiment A Cure? Re-examining the Long-Term Health Effects of China’s 1959–1961 Famine

Hongwei Xu 1, Lydia Li 2, Zhenmei Zhang 3, Jinyu Liu 4
PMCID: PMC4698174  NIHMSID: NIHMS745194  PMID: 26692092

Abstract

The fetal origins hypothesis posits that adverse prenatal exposures, particularly malnutrition, increase the risk of poor adult health. Studies using famine as a natural experiment to test the fetal origins hypothesis present conflicting findings, partly because of data limitations and modeling flaws. Capitalizing on the biomarker data and prefecture-level geographic information from the 2011 China Health and Retirement Longitudinal Study, this study estimates the effects of prenatal exposure to China’s 1959–61 famine on later-life risks of cardiovascular and metabolic diseases. Our analysis addresses the problems of measurement error and intrinsic cohort differences that challenge prior studies. We use provincial and prefecture-level geographic variations in famine severity, a proxy for prenatal malnutrition, for model identification. We construct instrumental variables from geocoded newspaper archive data to adjust for measurement error in famine exposure. We find that estimates of the famine effects are highly sensitive to the choice of health indicators, measures of famine severity, and regression model specifications. Overall, we find little evidence supporting the fetal origins hypothesis. In fact, prenatal exposure to famine appears to reduce later-life disease risks in certain cases. We interpret this finding as evidence that mortality selection was at work among famine survivors. We conclude that using famine as a natural experiment does not in itself guarantee correct statistical inference about the long-term health impacts of prenatal malnutrition when other analytical challenges remain unresolved.

Keywords: China, fetal origins hypothesis, famine, biomarker, cardiovascular and metabolic diseases, spatial

Introduction

The controversial fetal origins hypothesis (Barker, 1990, 1995a, b) conjectures that prenatal exposure to an adverse environment, in particular to malnutrition, may “program” the fetus to develop particular metabolic characteristics, likely through environmental effects on the epigenome. Such developmental changes may persist over the life course and increase risks of cardiovascular and metabolic diseases in middle and later life. The life course perspective of the fetal origins hypothesis, as well as its recent variants such as the developmental origins hypothesis (Bateson et al., 2004; Gluckman et al., 2008; Gluckman et al., 2005), has made it an attractive theoretical and analytical framework for researchers. However, supportive empirical evidence is uneven, particularly regarding the long-term health effects of prenatal malnutrition.

Early efforts to test the fetal origins hypothesis using observational data often failed to control for other prenatal confounders (Barker, 1995a, b; Barker & Osmond, 1986). These studies also relied heavily on low birth weight as a crude proxy for fetal malnutrition, even though maternal malnutrition during gestation may induce later-life disease without affecting birth weight (Roseboom et al., 2001). More importantly, the observed association between birth weight and health outcomes in later life could reflect many unobserved joint determinants such as genetic, socioeconomic, and environmental factors. Failing to control for such factors can introduce omitted variable bias and preclude causal inference (Almond & Currie, 2011; Paneth & Susser, 1995; Portrait et al., 2011; Song, 2013a).

To adjust for potential confounders more effectively, researchers have increasingly used exposure to prenatal famine as a natural experiment for studying long-term health effects (for a recent review, see Lumey et al., 2011). Well-known famine examples include 19th-century crop failures in Sweden and Finland, the Siege of Leningrad of 1941–44, the Dutch Hunger Winter of 1944–45, the Chinese Great Leap Forward famine of 1959–61, and the Bangladesh famine of 1974. Because exposure to famine is beyond the control of most individuals, regardless of their genetic traits, personality, or socioeconomic status, the process governing an individual’s prenatal exposure to famine-induced malnutrition is arguably exogenous and resembles random assignment. Therefore, causal effects of prenatal malnutrition on adult health can be inferred by comparing two similar subpopulations that differ in famine exposure.

However, famine studies have produced conflicting evidence about the effects of prenatal exposure on adult health. For example, in comparisons of cohorts born before or after a famine, prenatal malnutrition has been related to higher old-age mortality in the 1846–47 Dutch Potato Famine (Lindeboom et al., 2010), but not in the 1866–68 Finnish famine (Kannisto et al., 1997), the Dutch Hunger Winter (Painter et al., 2005), or the 1959–61 Chinese famine (Song, 2009). Even studies of the same famine have produced mixed findings on similar health outcomes. For example, in studies focused on the Siege of Leningrad, Koupil et al. (2007) reported significantly increased risks of cardiovascular diseases (CVD) and mortality in adult life among those exposed to famine in childhood, while Stanner et al. (1997) found no elevated risk for CVD among individuals exposed to famine either in utero or in infancy. Similarly, in studies of the Dutch Hunger Winter, Ravelli et al. (1998) found an association between prenatal famine exposure and insulin resistance, but de Rooij et al. (2006) did not. In their review of 30 studies of the relationship between prenatal famine and adult health, Lumey et al. (2011) found a lack of consistent associations for most measures of adult health, except for adult body size, diabetes, and schizophrenia.

These inconclusive findings may reflect differences in analytic strategies across studies. For instance, while the most common estimation approach is the simple cohort difference (SCD) in health between individuals born during versus before or after a famine, some studies use a difference-in-differences (DID) approach to exploit spatial variation in famine exposure in addition to temporal (cohort) variation. The SCD approach hinges on stronger assumptions and presumably yields less robust results than the DID approach (see details below).

In this study, we seek to advance the literature by examining the effects of prenatal and infancy exposure to the 1959–61 Chinese famine, also known as the Great Leap Forward (GLF) famine, on adult cardiovascular and metabolic disease risks. We chose the GLF famine as a case study because of its greater magnitude relative to other famines in terms of duration (three years), geographic scope (pandemic as opposed to endemic), and level of damage (16.5–30 million excess deaths with a mortality rate of over 3.0%) (Song et al., 2009; Susser & St Clair, 2013). Our analysis tackles two empirical challenges of prior studies: (1) measurement errors related to famine exposure and adult health outcomes, and (2) inappropriate estimation strategies. We address the first challenge by drawing on biomarker data from the 2011 wave of the China Health and Retirement Longitudinal Study (CHARLS) and matching CHARLS respondents to a variety of historical famine data, including fine-grained geographic variation in famine severity. We address the second challenge by assessing results obtained from four distinct estimation strategies, namely SCD, deviation from cohort trend (DCT), DID, and instrumental variable (IV), which vary in their ability to address measurement error and to support causal inference. We provide a sketch of the GLF famine in the next section before reviewing the related literature.

Background of the GLF Famine

From 1958 to 1961, the Communist Party of China (CPC) launched a massive campaign, known as the Great Leap Forward (GLF), mobilizing the entire country to adopt radical economic and social policies to rapidly transform China from a predominantly agrarian society to an industrialized socialist economy through a Soviet-style high investment in heavy industry, supported by agricultural collectivization. To appeal to their zealous superiors in the CPC, and to avoid being labeled “anti-revolution,” local cadres began to make fictitious high-yield agricultural reports of grain output to the People’s Daily, the CPC’s official newspaper (Kung & Chen, 2011). The first such instance was on June 8, 1958, when the front-page headline of the People’s Daily reported that a People’s Commune in Henan Province achieved a wheat yield of 2,105 catties per mu, far above the average (1 catty = 1/2 kilogram; 1 mu = 1/6 acre). This exaggeration was topped the next day when the People’s Daily reported that another commune in Hubei Province harvested an average of 2,357 catties of wheat per mu. Quickly, other regions throughout the country began to over-report grain yields. From June to September 1958, more than 800 false reports (based on our calculation) of abnormally high grain yields were published in the People’s Daily. At the end of 1958, the national grain production was reported to be 375 million metric tons (MMT), roughly double the yield of 1957. Subsequent verification in 1961, however, placed the actual 1958 yield at 200 MMT (Bernstein, 1984).

These grain yield exaggerations led to food shortages in several ways. Top political leaders, believing that China faced a grain surplus rather than a shortfall (Bernstein, 1984; Yang, 1996), raised compulsory procurement levels, that is, the amounts that collectives had to deliver to the state (Ashton et al., 1984; Bernstein, 1984). The total grain procurement in the 1958 grain year was 22.3 percent higher than that in 1957 (Yang, 1996). In addition, new policies diverted labor and resources from agriculture to fruitless projects such as the so-called backyard furnace movement in late 1958, and reduced sown acreage in 1959 (Ashton et al., 1984). Together, these changes resulted in sharp declines in grain production, and rural villages suffered from severe food shortages after compulsory procurement to support urban and industrial growth. Compounded by other man-made and natural disasters, the resulting GLF famine of 1959–61 caused an estimated 16.5 to 30 million excess deaths, depending on the data sources, underlying assumptions, and methods of estimation employed (Ashton et al., 1984; Banister, 1987; Coale, 1984; Peng, 1987; Yao, 1999).

Studies of the GLF Famine and Limitations

In prior studies of the long-term effects of the GLF famine on physical health, the most frequently examined health outcome is adulthood nutritional status derived from anthropometric measures. A key data source in this research is the China Health and Nutrition Survey (CHNS), a longitudinal study of Chinese households conducted in nine provinces since 1989. For example, using data from the 1991 wave of the CHNS, Chen and Zhou (2007) found that those born in 1959 and 1960 were shorter in adulthood than those born between 1963 and 1967 (i.e., after the famine). The negative association between prenatal famine exposure and adult height was also confirmed in the 1989 wave (Meng & Qian, 2009) and the pooled 1989–93 waves (Fung & Ha, 2010). However, using the 1989–97 CHNS data, Gørgens et al. (2012) reported that the young female famine cohort (born 1957–61) grew about 2 cm taller in adulthood than did the control cohorts (born 1938–47 or 1962–71), a finding attributed to positive selection among famine survivors. Findings on BMI-related measures are even more inconclusive. Studies comparing adults from the famine and post-famine birth cohorts report that the famine cohort has no significant difference in BMI (Meng & Qian, 2009), borderline significantly higher (at the 0.1 level) BMI (Fung & Ha, 2010), or significantly higher rates of ‘overweight’ BMI (>= 25 kg/m²) (Luo et al., 2006).

In other research, Huang et al. (2010b) analyzed a cross-sectional sample of women in three provinces and found, relative to the post-famine cohort, reduced height in the famine cohort and increased BMI and risk of hypertension in the pre-famine cohort. In a series of studies using the biomarker data from the 2002 China National Nutrition and Health Survey (CNNHS), Li et al. (2010; 2011a; 2011b) reported that only in severely affected famine areas did the 1959–61 famine cohort have significantly higher systolic and diastolic blood pressures (BPs) and higher risks of hyperglycemia and metabolic syndrome than the 1962–64 post-famine cohort.

Using retrospective data on family deaths from the 1988 National Survey of Fertility and Contraception, Song (2009) compared mortality between the famine and non-famine cohorts and found no significant difference after adjusting for the cross-cohort temporal trend in mortality. In another study using the same data, Song (2010) found a higher mortality rate in the famine cohort up to ages 11 and 12, after which the non-famine cohort exhibited a higher mortality rate up to age 22. However, because such retrospective household survey data on family members’ dates of death are vulnerable to recall error and cannot be checked against a vital registration system, they can be unreliable.

Other GLF famine studies have relied on subjective health indicators. For example, Fan and Qian (2015) used data from the 2005 Chinese General Social Survey to create a composite measure of health based on multiple measures of self-rated health (e.g., overall health, physical functioning, disability, bodily pain, vitality, emotion, and mental health). They found significantly worse general health in women born during the famine than in those born after the famine.

Several limitations are notable in famine studies in China. First, only a handful have directly measured risks of cardiovascular and metabolic diseases—health outcomes that have clear physiological mechanisms related to prenatal malnutrition, and that are specified in the fetal origins hypothesis. Anthropometric measures (height, weight, and BMI) are indicative of nutritional status, but are not biomarkers of cardiovascular and metabolic diseases. Some GLF famine studies have analyzed hypertension, but the empirical findings vary, with a significant famine effect reported in one study (Li et al., 2011a) and a null effect in others (Huang et al., 2010b; Meng & Qian, 2009). To the best of our knowledge, the only studies that directly examine metabolic disease effects are by Li and colleagues (2010; 2011a; 2011b), who used biomarker data collected in the 2002 CNNHS. Unfortunately, the CNNHS data are not publicly available, preventing expansion or replication of their studies.

A second limitation of past studies is that many relied on comparisons between the famine cohort (born or conceived during the famine years) and pre- or post-famine cohorts (born before or after the famine years) to infer famine effects. This SCD approach cannot rule out intrinsic cohort differences due to factors other than famine, especially when the famine and non-famine cohorts are defined by broad age ranges (Almond et al., 2010; Chen & Zhou, 2007; Song, 2013a). A refined alternative is to examine deviations of health outcomes, due to famine exposure, from smooth cohort trends (Almond, 2006; Almond et al., 2010). This approach distinguishes famine-induced cohort differences from cohort differences due to other factors (e.g., economic growth over time), but requires an additional model specification for the secular cohort trend. Other studies (Chen & Zhou, 2007; Fan & Qian, 2015; Huang et al., 2010b; Li et al., 2011b; Luo et al., 2006) exploit within-cohort geographic variations in famine severity, in addition to between-cohort differences, to calculate DID estimates of famine effects. The DID approach may be more robust than the SCD approach because it permits intrinsic cohort differences as long as such differences do not vary by severity of famine exposure (Song, 2013a).

However, the DID strategy has several flaws of its own. First, because the relevant data were generally collected after the famine, famine severity is often approximated by a period measure of total famine-caused excess mortality during the famine years, regardless of birth cohort (Almond et al., 2010; Chen & Zhou, 2007; Luo et al., 2006), or by a cohort measure of famine-induced cohort size shrinkage (Huang et al., 2010a; Huang et al., 2010b; Meng & Qian, 2009). These proxies are susceptible to measurement error, which in turn can bias the estimate of the true famine effect. In particular, the estimated effect of famine severity can be seriously biased if other correlated factors that likely contributed to excess mortality or cohort size shrinkage, such as degraded medical or social services, are omitted from regression models (Tan et al., 2014). Second, where high mortality left only the fittest individuals alive, the selection effect could dominate the health-damaging effect of famine. In fact, several studies reviewed above have recognized this fitness selection effect among famine survivors for outcomes such as height (Gørgens et al., 2012), BMI (Huang et al., 2010b), mortality (Song, 2010), and schizophrenia (Song et al., 2009).
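The selection argument can be made concrete with a small simulation. The sketch below is a hypothetical illustration, not a model estimated in this study: each person draws a latent frailty from a standard normal distribution, famine raises every famine-cohort member's adult disease risk by a fixed `damage`, and famine kills everyone whose frailty exceeds `death_threshold` before adulthood. Both parameter values are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_selection(n=100_000, damage=0.1, death_threshold=1.28, seed=1):
    """Return mean adult disease risk for a non-famine cohort and for famine survivors.

    Hypothetical model: risk = latent frailty drawn from N(0, 1), plus `damage`
    for the famine cohort. Famine-cohort members with frailty above
    `death_threshold` die before adulthood and are never surveyed.
    """
    rng = random.Random(seed)
    # Non-famine cohort: no scarring and no famine deaths.
    nonfamine = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Famine cohort: everyone is scarred by `damage`, but only the
    # less frail survive to be observed as adults.
    frailty = [rng.gauss(0.0, 1.0) for _ in range(n)]
    survivors = [f + damage for f in frailty if f <= death_threshold]
    return statistics.fmean(nonfamine), statistics.fmean(survivors)


nonfamine_mean, famine_mean = simulate_selection()
```

With these settings roughly 10% of the famine cohort dies, yet the surviving famine cohort shows a lower mean risk than the non-famine cohort even though famine raised every member's risk. Calling `simulate_selection(death_threshold=float("inf"))` switches selection off, and the damage then shows up as higher risk in the famine cohort.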

In addition, most studies of the GLF famine using the DID approach have relied on variations in famine severity at the provincial level. When the study area covers only a few provinces, researchers must include cohorts born many years after the famine in the control group to make it large enough to analyze between-group variations (Chen & Zhou, 2007; Fan & Qian, 2015; Fung & Ha, 2010), thereby further contaminating the famine effect with intrinsic cohort effects. Moreover, because famine intensity may vary widely within a province, even the hardest-hit one, assuming within-province homogeneity can bias estimates of famine effects. Constrained by data confidentiality restrictions, however, only a handful of studies have attempted to investigate sub-provincial variations by matching survey respondents to county-level cohort size shrinkage (Huang et al., 2010b; Huang et al., 2013; Meng & Qian, 2009). Even these studies have shortcomings. Meng and Qian (2009) treated the county of residence at the time of survey as the birth county for respondents reporting no migration in the prior five years, a problematic assumption given the surge in domestic migration that began in the early 1990s. And Huang et al.’s (2010b; 2013) analyses were restricted to a limited number of counties (24 or 35) concentrated in three or four provinces, thereby raising problems of insufficient regional variation.

In short, despite conventional assumptions about the long-term negative health effects of prenatal famine exposure, empirical evidence remains scarce given challenges regarding data limitations, measurement errors, and estimation drawbacks. In the next section, we describe our measurement and modeling strategies to address these challenges.

Data and Method

Data

This study draws on individual-level data from the 2011 baseline survey of the China Health and Retirement Longitudinal Study (CHARLS), a nationally representative longitudinal survey of adults aged 45 and older and their spouses, if available. CHARLS sampled 17,708 residents from 150 counties across 28 provinces in China, with a response rate of 80.5% (Zhao et al., 2014b), and collected anthropometric and physical performance measures from 78.9% of the sample (13,978 respondents) and fasting blood samples with valid test results from 60.0% (10,627). Details on sampling procedures, field operations, and blood collection and tests are described elsewhere (Zhao et al., 2014a; Zhao et al., 2014b).

The CHARLS data have several appealing features for our analysis. First, they consist of a nationally representative sample with sufficient provincial-level variation in famine severity, which makes the findings generalizable. Second, both the survey and biomarker data are publicly available, allowing other researchers to replicate our study. Third, respondents’ places of residence at the time of survey are known at the prefecture level, allowing us to match adult health to the sub-provincial famine severity of the birthplace for respondents who had not moved from their birth prefecture. On average, each province consists of about 10 prefectures and each prefecture consists of about nine counties. Unfortunately, CHARLS has not released the birth prefecture for those who moved away.

To replicate existing studies as closely as possible, we adopted the most common cohort definitions: the famine cohort consists of those born during the famine period of 1959–61; the pre-famine cohort consists of those born in 1956–58; and the post-famine cohort consists of those born in 1962–64. Some famine cohort members (born in early 1959) were actually conceived in 1958 and thus were not exposed to famine for the full term of gestation, while some post-famine cohort members (born in early 1962) were conceived in 1961 and experienced some prenatal famine exposure. Additional analyses that narrowed the birth years of the famine cohort to 1960–61 and of the post-famine cohort to 1963–64 did not alter our main findings.

Our analytical sample is restricted to 4,812 pre-famine, famine, and post-famine cohort members who were born and lived in rural areas during childhood and hence were hit harder by the famine than their urban peers, who were protected through state-controlled food rationing. Among them, we dropped 141 (2.9%) pre-famine, 109 (2.3%) famine, and 191 (4.0%) post-famine cohort members whose birth prefecture was unknown because they had moved since birth. We further dropped 64 respondents with missing covariates. Chi-squared tests show no significant cohort difference in missing data on birth prefecture or the other covariates. Every remaining respondent had at least one valid biomarker and, to maximize statistical power, we allowed the analytical sample size to vary depending on the number of valid responses for each health outcome. As a result, the sample sizes ranged from 1,003 to 1,308 for the pre-famine cohort, 664 to 884 for the famine cohort, and 1,110 to 1,550 for the post-famine cohort.
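The sample restrictions above can be sketched as a filter applied separately for each health outcome, which is why the analytical sample size varies by outcome. This is a minimal illustration: the record fields (`birth_year`, `rural_childhood`, `birth_prefecture`, `male`, `birth_quarter`) are hypothetical names, not CHARLS variable names.

```python
# Birth-year windows for the three cohorts (ranges exclude the upper bound).
COHORTS = {
    "pre-famine": range(1956, 1959),   # born 1956-58
    "famine": range(1959, 1962),       # born 1959-61
    "post-famine": range(1962, 1965),  # born 1962-64, the reference cohort
}
COVARIATES = ("male", "birth_quarter")  # hypothetical covariate field names


def assign_cohort(birth_year):
    """Return the cohort label for a birth year, or None outside 1956-64."""
    for name, years in COHORTS.items():
        if birth_year in years:
            return name
    return None


def analytic_sample(records, outcome):
    """Apply the sample restrictions for one outcome.

    Keeps respondents born in 1956-64 who grew up in a rural area, have a
    known birth prefecture and complete covariates, and have a valid value
    for `outcome`.
    """
    sample = []
    for r in records:
        cohort = assign_cohort(r["birth_year"])
        if cohort is None or not r["rural_childhood"]:
            continue
        if r["birth_prefecture"] is None:  # moved since birth, prefecture unknown
            continue
        if any(r.get(c) is None for c in COVARIATES):
            continue
        if r.get(outcome) is None:  # outcome-specific deletion, so N varies
            continue
        sample.append({**r, "cohort": cohort})
    return sample
```

Calling `analytic_sample(records, outcome)` once per health indicator yields one sample per outcome, mirroring the varying sample sizes reported above.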

Dependent Variables

To overcome the limitations of previous studies in assessing the cardiovascular and metabolic disease risks of famine exposure, we used the CHARLS biomarker data to construct nine dichotomous indicators of high disease risks in three domains: cardiovascular (diastolic BP, systolic BP, and resting pulse); dyslipidemia (HDL cholesterol, LDL cholesterol, total cholesterol, and triglyceride); and diabetes (glucose and HbA1c), using clinical cut-points (shown in Table 1; also see Zhao et al., 2014a).
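The nine biomarker indicators reduce to simple threshold tests. The sketch below encodes the clinical cut-points listed in Table 1; the dictionary keys and the `male` field are hypothetical names, and units are assumed to match the table (mmHg, beats/minute, mg/dL, and percent).

```python
def biomarker_flags(m):
    """Return the nine high-risk indicators for one respondent.

    `m` is a dict of biomarker values with hypothetical keys; units follow
    Table 1 (BP in mmHg, pulse in beats/minute, lipids and glucose in mg/dL,
    HbA1c in percent). A missing value (None or absent key) yields None.
    """
    def flag(key, is_high):
        value = m.get(key)
        return None if value is None else is_high(value)

    hdl_cut = 40 if m["male"] else 50  # sex-specific HDL cut-point (mg/dL)
    return {
        "diastolic_bp": flag("dbp", lambda v: v >= 90),
        "systolic_bp": flag("sbp", lambda v: v >= 140),
        "resting_pulse": flag("pulse", lambda v: v > 100),
        "hdl": flag("hdl", lambda v: v < hdl_cut),
        "ldl": flag("ldl", lambda v: v > 160),
        "total_cholesterol": flag("tc", lambda v: v >= 240),
        "triglyceride": flag("tg", lambda v: v >= 150),
        "glucose": flag("glucose", lambda v: v >= 126),
        "hba1c": flag("hba1c", lambda v: v >= 6.5),
    }
```

Note that some cut-points are inclusive (for example, systolic BP >= 140 mmHg) and others strict (resting pulse > 100, LDL > 160), as in Table 1.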

Table 1.

Cohort-stratified distributions of chronic disease risks and control variables in rural Chinese: CHARLS-2011

Measure | High-risk cut-point | Pre-famine (1956–58), % (N) | Famine (1959–61), % (N) | Post-famine (1962–64; reference), % (N)
Biomarkers
 Cardiovascular
  Diastolic BP | >=90 mmHg | 16.0 (1,179) | 17.5 (793)* | 14.3 (1,394)
  Systolic BP | >=140 mmHg | 25.0 (1,179)*** | 21.8 (793)* | 17.4 (1,394)
  Resting pulse | >100 beats/minute | 11.7 (1,308) | 11.3 (884) | 11.4 (1,550)
 Dyslipidemia
  HDL cholesterol | <40 mg/dL in men; <50 mg/dL in women | 39.8 (1,012) | 42.3 (673) | 41.8 (1,116)
  LDL cholesterol | >160 mg/dL | 11.1 (1,010)** | 10.4 (671)** | 6.9 (1,116)
  Total cholesterol | >=240 mg/dL | 11.9 (1,010)* | 9.5 (673) | 9.1 (1,116)
  Triglyceride | >=150 mg/dL | 15.5 (1,012) | 17.2 (673) | 15.7 (1,116)
 Diabetes
  Glucose | >=126 mg/dL | 12.7 (1,009)* | 12.9 (672)* | 9.9 (1,114)
  HbA1c | >=6.5% | 4.3 (1,021) | 4.1 (676) | 3.6 (1,126)
Biomarkers & self-reports
  Hypertension (a) | | 34.9 (1,179)*** | 32.3 (793)* | 27.7 (1,394)
  Heart problem (b) | | 18.7 (1,308) | 17.9 (884) | 17.4 (1,550)
  Dyslipidemia (c) | | 52.8 (1,009) | 55.1 (671) | 52.2 (1,116)
  Diabetes (d) | | 15.7 (1,003) | 16.4 (664)* | 12.7 (1,110)
Anthropometrics
  Height (cm; mean) | | 159.6 (1,167) | 159.6 (780) | 160.0 (1,387)
  BMI (mean) | | 23.5 (1,163)*** | 24.0 (779) | 24.2 (1,383)
  Overweight | BMI >=25 | 31.6 (1,163)** | 35.2 (779) | 37.1 (1,383)
  Abdominal obesity | Waist circumference >90 cm in men; >80 cm in women | 44.8 (1,173)** | 48.7 (785) | 50.4 (1,391)
Control variables
  Male | | 52.3 (1,326)** | 46.4 (887) | 46.2 (1,565)
  Birth quarter
   Quarter 1 | | 28.9 (1,326)*** | 27.9 (887)** | 21.9 (1,565)
   Quarter 2 | | 23.8 (1,326) | 24.5 (887) | 21.2 (1,565)
   Quarter 3 | | 23.5 (1,326)* | 22.3 (887)** | 27.1 (1,565)
   Quarter 4 | | 23.8 (1,326)*** | 25.4 (887)* | 29.8 (1,565)

(a) High diastolic or systolic blood pressure, doctor-diagnosed or on treatment for hypertension.
(b) High resting pulse rate, doctor-diagnosed or on treatment for any heart problem.
(c) Low HDL cholesterol, high LDL cholesterol, high total cholesterol, high triglycerides, doctor-diagnosed or on treatment for dyslipidemia.
(d) High glucose, high HbA1c, doctor-diagnosed or on treatment for diabetes.

p < 0.1; * p < 0.05; ** p < 0.01; *** p < 0.001 for pairwise cohort differences using the post-famine cohort as the reference.

In addition, we combined these nine biomarkers with respondents’ self-reported disease history to create four further dichotomous disease indicators: hypertension (high-risk diastolic or systolic BP, or ever diagnosed with or treated for hypertension); heart problem (high-risk resting pulse, or ever diagnosed with or treated for any heart problem); dyslipidemia (high-risk HDL/LDL/total cholesterol or triglyceride levels, or ever diagnosed with or treated for dyslipidemia); and diabetes (high-risk glucose or HbA1c, or ever diagnosed with or treated for diabetes). For comparison with prior studies, we also considered several anthropometric measures typically used in the literature, including height, BMI, overweight (BMI >= 25 kg/m²), and abdominal obesity (waist circumference > 90 cm in men or > 80 cm in women).
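The four composite indicators are logical ORs of biomarker flags and self-reported diagnosis or treatment. In the sketch below, the flag names and the `dx` layout are hypothetical. Treating a missing biomarker as not high-risk is a simplifying assumption made here; the text does not describe how partially missing components were handled.

```python
def composite_indicators(flags, dx):
    """Combine biomarker flags with self-reported diagnosis or treatment.

    `flags` maps the nine biomarker indicators to True/False/None; `dx` maps
    each condition to True if the respondent reports ever being diagnosed
    with, or treated for, it. Both layouts are hypothetical.
    """
    def any_true(*values):
        # Assumption: None (missing) counts as not positive.
        return any(v is True for v in values)

    return {
        "hypertension": any_true(flags["diastolic_bp"], flags["systolic_bp"],
                                 dx["hypertension"]),
        "heart_problem": any_true(flags["resting_pulse"], dx["heart_problem"]),
        "dyslipidemia": any_true(flags["hdl"], flags["ldl"],
                                 flags["total_cholesterol"], flags["triglyceride"],
                                 dx["dyslipidemia"]),
        "diabetes": any_true(flags["glucose"], flags["hba1c"], dx["diabetes"]),
    }
```

For example, a respondent with normal biomarkers except a high HbA1c and a self-reported hypertension diagnosis is classified as having hypertension and diabetes but not a heart problem or dyslipidemia.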

Famine Severity

We employed two measures of famine severity from the literature on the GLF famine. The first is a period measure of total famine-caused excess deaths during the famine years, regardless of cohort (Almond et al., 2010; Chen & Zhou, 2007; Fan & Qian, 2015; Luo et al., 2006). The excess death rate (EDR) was calculated as the difference between mortality in the famine years (1959–61) and the average death rate in the three years before the famine (1956–58). Following some previous studies (Chen & Zhou, 2007; Fan & Qian, 2015; Huang et al., 2013; Luo et al., 2006), we used the provincial-level EDR constructed by Lin and Yang (2000). Lin and Yang reported no reliability statistics for the EDR, but using this widely adopted measure allows us to compare our results against previous studies that used it and to explore the origins of potential discrepancies. However, this measure suffers from two shortcomings. First, the death rates are published by China’s State Statistical Bureau and hence subject to potential data distortion. Second, they are provincial mortality rates, which make sub-provincial analyses impossible.
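A minimal sketch of the EDR arithmetic for one province follows. Because the text does not spell out how the three famine years are combined, the sketch assumes that famine-year mortality is the mean of the 1959, 1960, and 1961 crude death rates; the input rates in the usage line are made up.

```python
from statistics import fmean


def excess_death_rate(death_rates):
    """Excess death rate (EDR) for one province.

    `death_rates` maps calendar year to the crude death rate (e.g. per 1,000).
    Assumption: famine-year mortality is the mean of the 1959-61 rates,
    compared with the mean of the 1956-58 baseline rates.
    """
    famine = fmean(death_rates[y] for y in (1959, 1960, 1961))
    baseline = fmean(death_rates[y] for y in (1956, 1957, 1958))
    return famine - baseline


edr = excess_death_rate({1956: 11, 1957: 11, 1958: 11, 1959: 15, 1960: 25, 1961: 14})
```

With these hypothetical rates, famine-year mortality averages 18 per 1,000 against a baseline of 11, so the EDR is 7 per 1,000.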

As an alternative, we derived prefecture-level cohort size shrinkage indices (CSSI) from the publicly available 1% sample of the 1990 China Population Census (https://international.ipums.org/international/). Let $N^{\text{nonfamine}}_i$ denote the average cohort size of those born during the three years preceding the famine (1956–58) and the three years after the famine (1962–64) in the $i$th prefecture, and let $N^{\text{famine}}_i$ denote the average cohort size of those born during the three famine years (1959–61). The CSSI for the $i$th prefecture is calculated as a ratio:

$$\mathrm{CSSI}_i = \frac{N^{\text{nonfamine}}_i - N^{\text{famine}}_i}{N^{\text{nonfamine}}_i} \tag{1}$$

where a larger value indicates a greater reduction in cohort size due to reduced fertility and increased infant mortality, both presumably induced by the GLF famine (Huang et al., 2010a; Huang et al., 2010b; Meng & Qian, 2009). We matched provincial EDRs and prefecture-level CSSIs to the CHARLS respondents based on their self-reported birth prefecture. Both EDR and CSSI can approximately capture famine severity under three assumptions: accurate census data on fertility and mortality, stable secular trends in fertility and mortality in the counterfactual absence of the famine, and strictly restricted migration. When one or more of these assumptions are violated, measurement error will propagate and the inferred famine effect will be biased.
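Equation (1) translates directly into code. The sketch assumes a hypothetical input format, a mapping from birth year to the number of census-sample members born in that year for one prefecture; how those counts are tabulated from the census microdata is not shown.

```python
from statistics import fmean

NONFAMINE_YEARS = (1956, 1957, 1958, 1962, 1963, 1964)
FAMINE_YEARS = (1959, 1960, 1961)


def cssi(cohort_sizes):
    """Cohort size shrinkage index for one prefecture, as in Eq. (1).

    `cohort_sizes` maps birth year to the number of people born in that
    year (hypothetical input format).
    """
    n_nonfamine = fmean(cohort_sizes[y] for y in NONFAMINE_YEARS)
    n_famine = fmean(cohort_sizes[y] for y in FAMINE_YEARS)
    return (n_nonfamine - n_famine) / n_nonfamine


sizes = {1956: 100, 1957: 100, 1958: 100, 1959: 80, 1960: 60,
         1961: 70, 1962: 100, 1963: 100, 1964: 100}
index = cssi(sizes)
```

In this made-up prefecture, famine-year cohorts average 70 against a non-famine average of 100, so the index is 0.3: famine-era cohorts are 30% smaller than neighboring cohorts.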

Instrumental Variables

As mentioned above, it is very hard to accurately measure individual-level famine exposure experienced decades ago. Instead, we have to rely on proxies of famine severity such as EDR and CSSI at certain aggregate levels (e.g., provinces and prefectures) and assume homogeneous famine exposure among individuals from the same area. To address the challenges of measuring famine exposure, we adopted an IV approach. An IV is an exogenous variable that correlates with the endogenous independent variable (i.e., famine severity) but not with the error term (i.e., the IV affects the dependent variable only indirectly through its effect on the endogenous independent variable being instrumented). A valid IV approach allows consistent estimation even in the presence of measurement error in the treatment variable (i.e., famine exposure) by effectively mimicking random assignments of respondents into the treatment and control groups in cross-sectional data (Angrist & Krueger, 2001).
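The claim that a valid instrument corrects for measurement error in the treatment can be checked with a simulation. In the hypothetical data-generating process below, `z` is an instrument, `s` is true famine severity, `s_obs` is a proxy with classical measurement error, and `y` is the outcome. All parameter values are illustrative assumptions, and the sketch uses a single instrument (the Wald ratio) rather than the two instruments used in this study.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20_000, beta=2.0, seed=7):
    """Compare OLS on a noisy severity proxy with a single-instrument IV estimate."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    s = [0.8 * zi + rng.gauss(0.0, 1.0) for zi in z]        # var(s) = 1.64
    s_obs = [si + rng.gauss(0.0, 1.64 ** 0.5) for si in s]  # reliability 0.5
    y = [beta * si + rng.gauss(0.0, 1.0) for si in s]       # z affects y only via s
    ols = cov(s_obs, y) / cov(s_obs, s_obs)  # attenuated toward zero
    iv = cov(z, y) / cov(z, s_obs)           # instrument-based (Wald) estimate
    return ols, iv


ols_estimate, iv_estimate = simulate()
```

Because the proxy's reliability is 0.5, OLS converges to about half of the true effect of 2, while the IV estimate stays close to 2. The instrument works here because, by construction, it is correlated with true severity but not with the measurement error or the outcome's error term.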

Our IVs were constructed from newspaper reports of exaggerations of grain yields by county officials in each prefecture as published in the 1958 archives of the People’s Daily. As described above, local cadres demonstrated their loyalty to the central state through enthusiastic endorsement of the GLF movement in various forms, one of which was to falsely claim unprecedentedly high grain yields resulting from the agricultural measures advocated in the movement, such as deep plowing, intensive seeding, and heavy fertilizing. Political loyalty was rewarded with career advancement and associated increases in salary, occupational prestige, authority, and privileged access to bureaucratically controlled goods (Goldstein, 1991). Unfortunately, the falsified harvest reports led to excessive compulsory grain procurement, reduced sown acreage in 1959, and diverted labor and resources from agriculture to fruitless projects (Ashton et al., 1984), all of which contributed to the subsequent famine. In other words, we should expect the frequency of grain yield exaggerations in a given prefecture to be positively related to famine severity in that prefecture, satisfying the IV relevance requirement. On the other hand, local cadres gradually stopped falsifying grain yields towards the end of 1958 as the harvest season ended. This devastating practice had been completely abandoned, along with other radical measures, by 1961, when the central state suspended the GLF movement as a whole. Therefore, it is unlikely that exaggerations of grain yields in 1958 could affect the later-life health of the famine cohort other than through their impact on famine severity, thereby satisfying the exclusion restriction for an IV.

We geocoded to the county level a total of 558 exaggerations of grain yields, defined as a reported grain yield of 1,000 catties per mu (or 3,000 kilograms per acre) or more (Kung & Chen, 2011), published in the People’s Daily from June to September 1958, the most intensive period of agricultural falsification. Because the CHARLS data lack county-level geographic information, the two IVs employed in this study are aggregated at the prefecture level. The first IV is the total number of exaggerations summed over all counties in each prefecture, reflecting the intensity of falsification. The second IV is the total number of counties that exaggerated grain yields in each prefecture, capturing the geographic coverage of falsification. Together, these two IVs measure the intensity and coverage of agricultural falsification across prefectures, which we expect to predict famine severity.
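The aggregation from geocoded reports to the two prefecture-level instruments can be sketched as follows. The report layout, a list of (prefecture, county, catties per mu) tuples, is a hypothetical format; the 1,000 catties-per-mu threshold and the unit conversions (1 catty = 1/2 kilogram, 1 mu = 1/6 acre) come from the text.

```python
from collections import defaultdict

CATTY_KG = 0.5       # 1 catty = 1/2 kilogram
MU_ACRES = 1 / 6     # 1 mu taken as 1/6 acre, as in the text
THRESHOLD = 1_000    # catties per mu; reports at or above count as exaggerations


def catties_per_mu_to_kg_per_acre(x):
    return x * CATTY_KG / MU_ACRES


def prefecture_ivs(reports, prefectures):
    """Aggregate geocoded reports into the two prefecture-level instruments.

    `reports` holds (prefecture, county, catties_per_mu) tuples, one per
    People's Daily report (hypothetical layout). Returns, for every
    prefecture in `prefectures`, a tuple of
    (number of exaggerations, number of counties that exaggerated).
    """
    n_reports = defaultdict(int)
    counties = defaultdict(set)
    for prefecture, county, yield_cpm in reports:
        if yield_cpm >= THRESHOLD:
            n_reports[prefecture] += 1
            counties[prefecture].add(county)
    # Prefectures without any qualifying report get (0, 0).
    return {p: (n_reports[p], len(counties[p])) for p in prefectures}
```

As a check on the units, `catties_per_mu_to_kg_per_acre(1000)` gives 3,000 kilograms per acre (up to floating-point rounding), matching the threshold stated above; the Henan report of 2,105 catties per mu corresponds to about 6,315 kilograms per acre.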

Control Variables

We controlled for gender and birth quarter (to adjust for the effect of season of birth on mortality; see, for example, Almond, 2006), as well as provincial fixed effects whenever feasible (i.e., when model convergence was not a problem).

Statistical Models

We employed four estimators to infer the health effects of the GLF famine. The SCD estimator ignores geographic variation in famine severity and infers the famine effects from between-cohort differences in health outcomes. For simplicity, we set the pre-famine cohort aside for now and define $C = 1$ for the famine cohort and $C = 0$ for the post-famine cohort. Letting $y_i$ denote a health outcome for the $i$th individual, an SCD estimator can be obtained from:

$$y_i = \beta_0 + \beta_1 C_i + \beta_2 X_i + \varepsilon_i \tag{2}$$

where $\beta_1$ represents the famine effect, $X_i$ denotes the control variables, and $\varepsilon_i$ is a random error. The DCT estimator adds a polynomial cohort trend:

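$$y_i = \beta_0 + \beta_1 C_i + \beta_2 \mathrm{YOB}_i + \beta_3 \mathrm{YOB}_i^2 + \beta_4 \mathrm{YOB}_i^3 + \beta_5 X_i + \varepsilon_i \tag{3}$$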

where $\mathrm{YOB}_i$ denotes birth year; following Almond et al. (2010), we allow a flexible, cubic cohort trend. Under the assumption that cohort effects on health are smooth, deviations of the famine cohort's health outcomes from the cohort trend, captured by $\beta_1$, indicate famine effects (Almond, 2006).
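To make the contrast between equations (2) and (3) concrete, the sketch below simulates a hypothetical sample in which later cohorts are healthier for reasons unrelated to the famine, then fits both models by ordinary least squares with NumPy. The true effect (0.3), the linear secular trend, the birth-year window, and the use of OLS on a continuous outcome are all assumptions for illustration; the authors' models for binary high-risk indicators may use a different link function. The SCD coefficient absorbs the secular trend, whereas the cubic trend in the DCT model can recover the simulated effect because the true trend is smooth.

```python
import numpy as np

rng = np.random.default_rng(2015)

# Hypothetical sample: famine cohort born 1959-1961 (C = 1) and post-famine
# cohort born 1962-1966 (C = 0); the pre-famine cohort is set aside.
n_per_year = 5000
yob = np.repeat(np.arange(1959, 1967), n_per_year)
famine = (yob <= 1961).astype(float)
male = rng.integers(0, 2, size=yob.size).astype(float)

TRUE_EFFECT = 0.3        # assumed famine effect
TREND_PER_YEAR = -0.1    # assumed secular health improvement across cohorts
y = (TRUE_EFFECT * famine
     + TREND_PER_YEAR * (yob - 1959)
     + 0.05 * male
     + rng.normal(0.0, 0.5, size=yob.size))


def ols(y, *columns):
    """OLS coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Equation (2), SCD: cohort indicator and controls only.
scd_effect = ols(y, famine, male)[1]

# Equation (3), DCT: add a cubic in birth year, centred to keep X well-conditioned.
t = yob - 1962.5
dct_effect = ols(y, famine, t, t**2, t**3, male)[1]

print(f"SCD: {scd_effect:.3f}  DCT: {dct_effect:.3f}  (true: {TRUE_EFFECT})")
```

If the true cohort trend were not smooth (for example, a policy shock in 1962), the cubic would misattribute part of it to $\beta_1$, which is the misspecification risk discussed below.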

The DID estimator exploits geographic variation in famine severity in addition to cohort variation. Letting $S_j$ denote famine severity in the $j$th prefecture (or province), a regression-based DID estimator can be obtained from:

$$y_{ij} = \beta_0 + \beta_1 C_{ij} + \beta_2 S_j + \beta_3 C_{ij} \times S_j + \beta_4 X_{ij} + \varepsilon_{ij} \tag{4}$$

where $\beta_3$, the coefficient on the interaction between cohort and regional famine severity, is the DID estimate of the famine effect.
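A hypothetical simulation can illustrate what $\beta_3$ in equation (4) identifies. In the sketch below, the intrinsic cohort gap is built in as identical across prefectures (the key DID assumption), and only the famine effect scales with severity. All parameter values are made up, and OLS is used for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1959)

# Hypothetical simulated data: 100 prefectures x 400 respondents.
n_pref, n_per_pref = 100, 400
severity = rng.uniform(10.0, 80.0, size=n_pref)   # S_j on a CSSI-like scale (made up)
pref_shock = rng.normal(0.0, 0.2, size=n_pref)    # unobserved prefecture-level health
pref = np.repeat(np.arange(n_pref), n_per_pref)
S = severity[pref]
C = rng.integers(0, 2, size=pref.size).astype(float)      # 1 = famine cohort
male = rng.integers(0, 2, size=pref.size).astype(float)

# Key DID assumption built into the simulation: the intrinsic cohort gap (0.4)
# is the same in every prefecture; only the famine effect scales with S_j.
INTRINSIC_GAP = 0.4
TRUE_BETA3 = 0.01
y = (INTRINSIC_GAP * C + 0.005 * S + TRUE_BETA3 * C * S + 0.05 * male
     + pref_shock[pref] + rng.normal(0.0, 0.5, size=pref.size))

# Equation (4): regress y on C, S, the C x S interaction, and a control.
X = np.column_stack([np.ones_like(y), C, S, C * S, male])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
did_beta3 = beta[3]

# A simple cohort difference mixes the intrinsic gap with the famine effect.
scd_gap = y[C == 1].mean() - y[C == 0].mean()
print(f"DID beta_3: {did_beta3:.4f} (true {TRUE_BETA3}); SCD gap: {scd_gap:.3f}")
```

If the intrinsic gap instead grew with $S_j$ (for example, because harder-hit areas also recovered more slowly), that growth would load onto $\beta_3$, which is why the equal-gap assumption matters.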

Unlike the SCD, DCT, and DID estimators, the IV estimator does not rely on between-cohort variation. It also alleviates the problem of measurement error by instrumenting the treatment variable, famine severity. Letting $Z_{1j}$ and $Z_{2j}$ denote the two IVs in the $j$th prefecture, a two-stage least squares (2SLS) IV estimator can be obtained from two equations:

$$y_{ij} = \beta_0 + \beta_1 S_j + \beta_2 X_{ij} + \varepsilon_{ij} \tag{5}$$
$$S_j = \alpha_0 + \alpha_1 Z_{1j} + \alpha_2 Z_{2j} + u_j \tag{6}$$
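Equations (5) and (6) can be illustrated with a hand-rolled two-stage procedure on simulated data. In this sketch, which is not the authors' implementation, measured severity equals true severity plus classical measurement error, and the two hypothetical instruments affect health only through true severity. Controls are omitted for brevity. The naive OLS slope is attenuated toward zero, while the 2SLS slope should land close to the simulated value.

```python
import numpy as np

rng = np.random.default_rng(1958)

# Hypothetical simulated data: 500 prefectures x 40 famine-cohort respondents.
n_pref, n_per_pref = 500, 40
z1 = rng.poisson(3.7, size=n_pref)            # hypothetical N of exaggerations
z2 = rng.binomial(z1, 0.5)                    # hypothetical N of counties (never exceeds z1)
s_true = 20.0 + 4.0 * z1 + 6.0 * z2 + rng.normal(0.0, 8.0, size=n_pref)
s_obs = s_true + rng.normal(0.0, 14.0, size=n_pref)   # classical measurement error

TRUE_EFFECT = 0.02
pref = np.repeat(np.arange(n_pref), n_per_pref)
y = TRUE_EFFECT * s_true[pref] + rng.normal(0.0, 1.0, size=pref.size)


def ols(y, *columns):
    """Return (coefficients, design matrix) for OLS with an intercept."""
    X = np.column_stack([np.ones_like(y, dtype=float), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X


S = s_obs[pref]
b_ols = ols(y, S)[0][1]                       # attenuated by measurement error

# Stage 1, equation (6): regress measured severity on the two instruments.
alpha, Z = ols(S, z1[pref].astype(float), z2[pref].astype(float))
s_hat = Z @ alpha
# Stage 2, equation (5): replace S_j with its first-stage prediction.
b_2sls = ols(y, s_hat)[0][1]

print(f"OLS: {b_ols:.4f}  2SLS: {b_2sls:.4f}  (true {TRUE_EFFECT})")
```

The manual second stage yields the correct 2SLS point estimate but not valid standard errors; those require the IV variance formula (with clustering by prefecture), which dedicated IV routines provide.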

These estimators require different assumptions to yield consistent estimates of the famine effects. The SCD approach assumes that the cohorts differ in health only because of their different exposure to the famine. This assumption may not hold if nutrition and health improved across cohorts with economic growth over time. The DCT approach adjusts for intrinsic cohort differences due to non-famine factors by fitting a nonlinear cohort trend, but the chosen functional form may be misspecified. The DID approach does not require an explicit specification of the cohort trend, but it assumes that intrinsic cohort differences are the same across areas with different levels of famine severity. It also implicitly assumes that variation in famine exposure is accurately approximated by the measured famine severity (i.e., $S_j$). The IV approach addresses measurement error in famine exposure, but it requires the assumptions of instrument relevance and exclusion restriction to be satisfied. Throughout the regression analysis, we calculated p-values based on robust standard errors that adjust for the potential correlation of observations clustered within the same prefecture.
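Prefecture-clustered standard errors of the kind mentioned above follow the usual sandwich form. The sketch below implements it in NumPy with the common CR1 finite-sample factor $G/(G-1)\cdot(n-1)/(n-k)$; the text does not say which adjustment the authors used, so treat that factor as an assumption. On simulated data with a prefecture-level regressor and a shared prefecture shock, the clustered standard error comes out noticeably larger than the conventional one, which is why ignoring clustering would overstate significance.

```python
import numpy as np


def cluster_robust_se(X, resid, groups):
    """Cluster-robust (sandwich) standard errors for OLS coefficients,
    with the CR1 factor G/(G-1) * (n-1)/(n-k)."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((k, k))
    clusters = np.unique(groups)
    for g in clusters:
        in_g = groups == g
        score_g = X[in_g].T @ resid[in_g]     # score vector summed within the cluster
        meat += np.outer(score_g, score_g)
    G = clusters.size
    factor = G / (G - 1) * (n - 1) / (n - k)
    vcov = factor * bread @ meat @ bread
    return np.sqrt(np.diag(vcov))


# Hypothetical check: a prefecture-level regressor plus a shared prefecture shock.
rng = np.random.default_rng(2011)
n_pref, n_per_pref = 100, 50
pref = np.repeat(np.arange(n_pref), n_per_pref)
severity = rng.uniform(10.0, 80.0, size=n_pref)[pref]
y = (0.01 * severity
     + rng.normal(0.0, 0.5, size=n_pref)[pref]      # shock shared within prefecture
     + rng.normal(0.0, 1.0, size=pref.size))

X = np.column_stack([np.ones_like(y), severity])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
n, k = X.shape
naive_se = np.sqrt(np.diag(resid @ resid / (n - k) * np.linalg.inv(X.T @ X)))
clustered_se = cluster_robust_se(X, resid, pref)
print(f"naive SE: {naive_se[1]:.5f}  clustered SE: {clustered_se[1]:.5f}")
```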

Results

Descriptive Statistics

Table 1 reports cohort-specific prevalence of disease risks and nutritional status, as well as chi-square tests and t-tests of pairwise cohort differences, using the post-famine cohort as the reference group. Comparisons between the famine and post-famine cohorts, on biomarkers alone and on biomarkers combined with self-reports, are consistent with the fetal origins hypothesis. Compared with the post-famine cohort, the famine cohort had significantly higher proportions of at-risk members with respect to diastolic and systolic BP, LDL cholesterol, glucose, and combined indicators of hypertension and diabetes. Similar patterns hold for the pre-famine cohort relative to the post-famine cohort. Comparisons on the anthropometric measures, however, yielded no significant differences between the famine and post-famine cohorts, and a lower prevalence of high BMI, overweight, and abdominal obesity in the pre-famine cohort than in the post-famine cohort.

Table 1 also shows frequency distributions of the control variables. The pre-famine cohort had a higher proportion of men (52.3%) than the famine (46.4%) and post-famine (46.2%) cohorts. In terms of season of birth, both the pre-famine and famine cohorts were more likely to be born in the first quarter and less likely to be born in the third and fourth quarters than the post-famine cohort.

Table 2 presents descriptive statistics for the measures of famine severity and the IVs. At the provincial level, both the excess death rate (EDR) and the cohort size shrinkage index (CSSI) exhibit sufficient spatial variation, with large standard deviations and ranges. Figures 1–3 depict prefecture-level spatial variation in CSSI and in the two IVs (the number of grain yield exaggerations and the number of counties with grain yield exaggerations). Figure 1 shows notable regional clusters of high CSSI, corresponding to severe famine, both within and across provincial boundaries. Mixed colors within a given province also suggest that within-province variation in CSSI was common, even in the least (green) or hardest (red) hit provinces. The spatial distributions of the two IVs (Figures 2 and 3) partially overlap with that of CSSI (Figure 1), implying that variation in the exaggeration of grain yields accounts for part of the variation in famine severity, and hence in CSSI.

Table 2.

Measures of famine severity and instrumental variables during the 1959–1961 Chinese famine.

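| | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| Provincial level | | | | | |
| EDR (unit: 0.1%) | 6.4 | 6.9 | 0.1 | 28.6 | 27 |
| CSSI (unit: 1%) | 37.1 | 12.6 | 19.0 | 63.0 | 27 |
| Prefecture level | | | | | |
| CSSI (unit: 1%) | 40.8 | 13.8 | 9.9 | 79.1 | 119 |
| N of exaggerations | 3.7 | 6.1 | 0.0 | 41.0 | 119 |
| N of counties exaggerating grain yields | 1.8 | 2.1 | 0.0 | 14.0 | 119 |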

Note: EDR = excess death rate and is from Lin and Yang (2000); CSSI = cohort size shrinkage index and is from the 1% sample of China’s 1990 population census.

Figure 1. Prefecture-level cohort size shrinkage index (CSSI).

Figure 2. Number of counties that ever exaggerated grain yields in each prefecture (i.e., the second instrumental variable).

Figure 3. Total number of exaggerations of grain yields in each prefecture (i.e., the first instrumental variable).

Regression Results

To save space, Table 3 shows only the main coefficient estimates of interest, that is, the estimated famine effects on health as indexed by biomarkers (see Appendix Table A1 for an example of full model estimates). The SCD estimates provide evidence supporting the fetal origins hypothesis. Compared with the post-famine cohort, both the pre-famine and famine cohorts were at significantly higher risk of poor health with respect to systolic BP, LDL cholesterol, and glucose. The pre-famine cohort also had a higher risk of high total cholesterol. The DCT estimates reveal marginally significant risks of high resting pulse for both the pre-famine and famine cohorts, but no differences in other outcomes. By contrast, the DID estimates are either not significant or significantly negative (most notably for LDL and total cholesterol), regardless of how famine severity was measured (EDR or CSSI) or at what geographic scale (province or prefecture). In other words, the later-life health-risk gap between the famine and post-famine cohorts (as indicated by the biomarkers) was similar or even smaller in areas of greater famine severity than in areas less affected by the famine.

Table 3.

Estimated coefficients of famine effects on high-risk biomarkers.

| | Diastolic BP | Systolic BP | Resting pulse | HDL cholesterol | LDL cholesterol | Total cholesterol | Triglyceride | Glucose | HbA1c |
|---|---|---|---|---|---|---|---|---|---|
| SCD (ref: post-famine) | | | | | | | | | |
| Pre-famine | 0.074 | 0.279*** | 0.043 | −0.036 | 0.266** | 0.144* | −0.020 | 0.157* | 0.077 |
| Famine | 0.126 | 0.163* | 0.003 | −0.031 | 0.252** | 0.024 | 0.055 | 0.175* | 0.049 |
| DCT (ref: post-famine) | | | | | | | | | |
| Pre-famine | −0.039 | −0.047 | 0.580 | −0.059 | 0.094 | 0.035 | 0.089 | −0.107 | 0.388 |
| Famine | −0.082 | −0.038 | 0.323 | −0.108 | 0.079 | −0.111 | 0.159 | 0.040 | 0.278 |
| DID (ref: post-famine) | | | | | | | | | |
| Provincial EDR × Pre-famine | −0.026* | −0.012 | −0.001 | −0.013 | −0.016 | −0.019 | −0.015 | 0.001 | 0.009 |
| Provincial EDR × Famine | −0.011 | −0.021* | −0.018 | −0.006 | −0.020 | −0.023 | −0.008 | −0.014 | 0.001 |
| Provincial CSSI × Pre-famine | −0.009 | −0.004 | −0.003 | −0.001 | −0.009 | −0.012 | 0.001 | 0.007 | 0.009 |
| Provincial CSSI × Famine | −0.005 | −0.008 | −0.007 | 0.003 | −0.024** | −0.018* | 0.000 | −0.003 | −0.002 |
| Prefecture CSSI × Pre-famine | −0.005 | −0.003 | −0.006 | −0.004 | −0.007 | −0.010* | −0.002 | 0.005 | 0.004 |
| Prefecture CSSI × Famine | −0.004 | −0.004 | −0.001 | 0.001 | −0.022*** | −0.019** | −0.001 | 0.000 | 0.000 |
| Famine cohort probit | | | | | | | | | |
| Provincial EDR | −0.017 | −0.022** | −0.027* | −0.017* | −0.008 | −0.012 | −0.008 | −0.006 | −0.004 |
| Provincial CSSI | −0.014** | −0.013** | −0.016* | −0.008 | −0.010 | −0.010 | −0.006 | −0.004 | −0.005 |
| Prefecture CSSI | −0.012* | −0.009 | −0.008 | −0.006 | −0.010 | −0.010 | −0.005 | −0.002 | −0.001 |
| Famine cohort 2SLS | | | | | | | | | |
| Prefecture CSSI | 0.001 | −0.004 | 0.003 | 0.006 | −0.005 | −0.004 | 0.000 | 0.000 | 0.000 |
| Diagnostic statistics: first-stage F | 16.74 | 16.74 | 19.61 | 15.56 | 14.98 | 15.56 | 15.56 | 15.49 | 12.67 |
| Diagnostic statistics: Stock-Yogo test | Passed | Passed | Passed | Passed | Passed | Passed | Passed | Passed | Passed |

Note: SCD = simple cohort difference; DCT = deviation from cohort trend; DID = difference-in-differences; 2SLS = two-stage least squares; EDR = excess death rate; CSSI = cohort size shrinkage index. All the models control for gender and birth quarters. The SCD, DCT, and DID models additionally control for provincial fixed effects.

† p < 0.1; * p < 0.05; ** p < 0.01; *** p < 0.001, based on prefecture-level cluster standard errors.

A similar pattern that contradicts the fetal origins hypothesis persisted when we relaxed the assumption of no intrinsic cohort difference by exploiting only the spatial variation in famine severity within the famine cohort. Famine severity, regardless of how it was measured, was negatively associated with the risks of abnormal diastolic and systolic BP. Provincial-level famine severity, measured by EDR or CSSI, was also negatively associated with high resting pulse. Famine severity was negatively associated with some cholesterol measures as well: provincial-level EDR was significantly related to HDL cholesterol, and prefecture-level CSSI was marginally related to both LDL and total cholesterol.

Turning to the IV estimates, the first-stage F-statistic exceeds the rule-of-thumb value of 10 in every model (Staiger & Stock, 1997), and the Stock-Yogo test rejects the null hypothesis of weak instruments for every biomarker. The IV estimates show a marginally significant, negative effect of prefecture-level famine severity on abnormally high LDL cholesterol, contrary to the fetal origins hypothesis.
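For reference, a first-stage F-statistic of the kind compared with the rule-of-thumb value of 10 can be computed as a joint test that the excluded instruments have zero coefficients. The sketch below uses the conventional (non-robust) F at the prefecture level on simulated data; the authors' statistic may be a cluster-robust version, and the sketch does not reproduce the Stock-Yogo critical values. The instrument names and all parameter values are hypothetical.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid


def first_stage_f(severity, instruments, controls):
    """Conventional F-test that the excluded instruments are jointly zero
    in the first-stage regression of severity on instruments and controls."""
    n = severity.size
    X_restricted = np.column_stack([np.ones(n), controls])
    X_full = np.column_stack([X_restricted, instruments])
    q = instruments.shape[1]          # number of excluded instruments
    k = X_full.shape[1]               # parameters in the unrestricted model
    rss_r, rss_u = rss(severity, X_restricted), rss(severity, X_full)
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))


rng = np.random.default_rng(1997)
n_pref = 119                                  # prefecture count reported in Table 2
z1 = rng.poisson(3.7, size=n_pref)            # hypothetical N of exaggerations
z2 = rng.binomial(z1, 0.5)                    # hypothetical N of counties (never exceeds z1)
instruments = np.column_stack([z1, z2]).astype(float)
controls = rng.normal(size=(n_pref, 1))       # stand-in covariate

relevant = (30.0 + 2.0 * z1 + 3.0 * z2 + 0.5 * controls[:, 0]
            + rng.normal(0.0, 8.0, size=n_pref))
irrelevant = 40.0 + rng.normal(0.0, 8.0, size=n_pref)   # unrelated to the instruments

f_relevant = first_stage_f(relevant, instruments, controls)
f_irrelevant = first_stage_f(irrelevant, instruments, controls)
print(f"F (relevant): {f_relevant:.1f}  F (irrelevant): {f_irrelevant:.2f}  cut-off: 10")
```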

As a sensitivity check, Table 4 reports the estimated famine effects on cardiovascular and metabolic diseases measured by combinations of biomarkers and self-reports. A pattern similar to that in Table 3 emerges. The SCD estimates provide strong evidence for the fetal origins hypothesis, since both the pre-famine and famine cohorts were at greater risk of hypertension and diabetes than the post-famine cohort. The DCT estimates also indicate elevated risks of heart problems in the pre-famine and famine cohorts. However, these significant relationships either disappeared or reversed direction in the DID estimates. One exception is that the pre-famine cohort living in areas more heavily affected by the famine remained at a marginally significantly higher risk of diabetes when famine severity was approximated by provincial-level CSSI. When we exploited spatial variation in famine exposure within the famine cohort alone, we again found evidence against the fetal origins hypothesis: famine severity, measured by EDR or CSSI at the provincial or prefecture level, was consistently associated with lower risks of hypertension, heart problems, and dyslipidemia. The IV estimates show no significant famine effects.

Table 4.

Estimated coefficients of famine effects on combined disease risks from biomarkers, self-reports, and anthropometric measures.

| | Hypertension | Heart problem | Dyslipidemia | Diabetes | Height | BMI | Overweight | Abdominal obesity |
|---|---|---|---|---|---|---|---|---|
| SCD (ref: post-famine) | | | | | | | | |
| Pre-famine | 0.218*** | 0.075 | 0.020 | 0.141* | −1.332*** | −0.542*** | −0.125* | −0.069 |
| Famine | 0.128* | 0.020 | 0.039 | 0.166* | −0.727* | −0.165 | −0.060 | −0.047 |
| DCT (ref: post-famine) | | | | | | | | |
| Pre-famine | 0.013 | 0.525* | −0.066 | −0.166 | −0.631 | 0.451 | 0.262 | 0.227 |
| Famine | −0.036 | 0.261 | −0.074 | 0.017 | −0.300 | 0.191 | 0.096 | 0.047 |
| DID (ref: post-famine) | | | | | | | | |
| Provincial EDR × Pre-famine | −0.012 | −0.003 | −0.019** | 0.007 | −0.082** | −0.040* | −0.008 | −0.015** |
| Provincial EDR × Famine | −0.012 | −0.023* | −0.014 | −0.015 | 0.007 | −0.041* | −0.017* | 0.002 |
| Provincial CSSI × Pre-famine | −0.004 | −0.002 | −0.007 | 0.009 | −0.038 | −0.017 | −0.002 | −0.006 |
| Provincial CSSI × Famine | −0.004 | −0.011* | −0.005 | −0.005 | 0.014 | −0.018 | −0.009 | 0.007 |
| Prefecture CSSI × Pre-famine | −0.003 | −0.005 | −0.007 | 0.006 | −0.028 | −0.026* | −0.004 | −0.006 |
| Prefecture CSSI × Famine | −0.003 | −0.004 | −0.005 | −0.002 | 0.013 | −0.021 | −0.008* | 0.004 |
| Famine cohort probit/OLS | | | | | | | | |
| Provincial EDR | −0.017* | −0.029** | −0.020** | −0.012 | −0.127** | −0.019 | −0.009 | 0.010 |
| Provincial CSSI | −0.012* | −0.022** | −0.012** | −0.007 | −0.079* | −0.019 | −0.009 | 0.007 |
| Prefecture CSSI | −0.010* | −0.012* | −0.010* | −0.005 | −0.057* | −0.022 | −0.009* | 0.003 |
| Famine cohort 2SLS | | | | | | | | |
| Prefecture CSSI | −0.004 | −0.003 | 0.003 | −0.002 | −0.107 | 0.049 | −0.002 | 0.015 |
| Diagnostic statistics: first-stage F | 16.98 | 19.91 | 15.12 | 15.36 | 16.67 | 16.89 | 16.89 | 16.77 |
| Diagnostic statistics: Stock-Yogo test | Passed | Passed | Passed | Passed | Passed | Passed | Passed | Passed |

Note: SCD = simple cohort difference; DCT = deviation from cohort trend; DID = difference-in-differences; 2SLS = two-stage least squares; EDR = excess death rate; CSSI = cohort size shrinkage index. All the models control for gender and birth quarters. The SCD, DCT, and DID models additionally control for provincial fixed effects.

† p < 0.1; * p < 0.05; ** p < 0.01; *** p < 0.001, based on prefecture-level cluster standard errors.

As another sensitivity check, we repeated the analyses reported in Table 3 with additional controls for self-reported diagnoses and treatments for cardiovascular and metabolic diseases. Including these endogenous variables does not alter the general patterns described above, although several coefficients become less significant (see Appendix Table A2).

To facilitate comparison with previous research, Table 4 also reports estimates for nutritional outcomes based on anthropometric measures. Overall, we observed the same results as before, with famine exposure having either no effect or seemingly health-promoting effects in the form of lower BMI and reduced risks of overweight and abdominal obesity. One exception pertains to height: famine exposure significantly undermined height attainment, and this result persisted when we examined the famine cohort alone. Another exception is that the IV estimate indicates a marginally significant, positive effect of prefecture-level CSSI on the risk of abdominal obesity.

Lastly, the spatial clusters of high CSSI (see Figure 1) suggest potential correlation of the error term across nearby prefectures. This is unlikely to be a serious problem in our study. Ignoring such between-prefecture correlation would tend to underestimate standard errors and hence inflate significance levels, so any statistical adjustment would further weaken rather than strengthen the existing evidence for the fetal origins hypothesis, reinforcing a key argument of this study. Furthermore, the error term may not be spatially correlated across prefectures after controlling for CSSI and other covariates and calculating robust standard errors, as we have done. Because a full spatial model is beyond the scope of this study, we investigated this possibility through a simple residual analysis: we calculated residuals from the models estimated above and regressed them on the longitudes and latitudes of prefecture centroids. We found only a limited number of significant coefficients (see Appendix Table A3), suggesting that between-prefecture correlation was not a serious problem.
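The residual check can be sketched as an OLS regression of residuals on centroid coordinates. As a simplifying assumption made here, the sketch collapses residuals to prefecture means, which sidesteps within-prefecture clustering; the authors may instead have worked with individual-level residuals. The coordinates and residuals are simulated: one scenario has no geographic pattern, and the other plants a north-south drift that the test should flag.

```python
import numpy as np


def coordinate_trend_test(mean_resid, lon, lat):
    """OLS of prefecture-mean residuals on centroid longitude and latitude.

    Returns (coefficients, t-statistics) for [intercept, lon, lat],
    using conventional iid standard errors at the prefecture level.
    """
    X = np.column_stack([np.ones_like(lon), lon, lat])
    beta, *_ = np.linalg.lstsq(X, mean_resid, rcond=None)
    resid = mean_resid - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se


rng = np.random.default_rng(61)
n_pref = 119                                   # prefecture count in Table 2; coordinates are made up
lon = rng.uniform(98.0, 122.0, size=n_pref)
lat = rng.uniform(22.0, 45.0, size=n_pref)

# Null scenario: prefecture-mean residuals with no geographic pattern.
null_resid = rng.normal(0.0, 0.05, size=n_pref)
_, t_null = coordinate_trend_test(null_resid, lon, lat)

# Planted scenario: residuals drift northward (hypothetical 0.01 per degree of latitude).
trend_resid = 0.01 * (lat - lat.mean()) + rng.normal(0.0, 0.05, size=n_pref)
_, t_trend = coordinate_trend_test(trend_resid, lon, lat)

print("t (null):", np.round(t_null[1:], 2), " t (planted):", np.round(t_trend[1:], 2))
```

A coordinate regression only detects broad linear gradients; local clustering without a trend would need a different diagnostic, such as a spatial autocorrelation statistic.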

Discussion

During the past few decades, health researchers have been enthusiastic about testing Barker’s fetal origins hypothesis (Barker, 1990, 1995a, b) because, if empirically sustained, it can contribute significantly to our knowledge about the long-term health consequences of early-life events (for reviews, see Bateson et al., 2004; Ben-Shlomo & Kuh, 2002; Gluckman et al., 2008). Evidence supporting the fetal origins hypothesis also has important public health implications for identifying the right target (pregnant women), timing (prenatal), and strategy (improving maternal nutrition) for interventions to reduce health disparities in future generations. Recent empirical studies have therefore used famines as natural experiments to address the endogeneity problems in inferring the causal health effects of prenatal malnutrition from observational data.

Our analysis addresses the problems of measurement error and intrinsic cohort difference that plague conventional studies. Through a systematic comparison, we show how estimates of the famine effects can be driven by the choice of analytical strategy. We conclude that using famine as a natural experiment does not eliminate the potential for making erroneous inferences when other analytical flaws are present.

The current study is innovative and contributes to the literature in at least two important ways. First, using China’s 1959–1961 famine as a natural experiment, we conducted a comprehensive analysis of the impacts of prenatal malnutrition on later-life chronic disease risks by comparing estimation results from a variety of measurement and estimation strategies. Because we drew on publicly available, nationally representative data sources (the 2011 CHARLS, the 1% sample of the 1990 China Population Census, and the People’s Daily), our findings are more generalizable than those of existing studies using regional samples (Chen & Zhou, 2007; Fung & Ha, 2010; Gørgens et al., 2012; Huang et al., 2010b; Huang et al., 2013; Luo et al., 2006; Meng & Qian, 2009), and they can be replicated by other researchers. We also expanded the health indicators beyond anthropometric proxies such as body weight and height to more direct and accurate biomarker measures of cardiovascular and metabolic function. Contrary to the assumption of provincial homogeneity often made in previous studies (Chen & Zhou, 2007; Fan & Qian, 2015; Fung & Ha, 2010), we documented considerable regional variation in famine severity at the sub-provincial level (Table 2 and Figure 1). We exploited this finer geographic variation to obtain more accurate DID estimates than past studies that focused solely on provincial variation. Finally, we went beyond conventional between-cohort comparisons and DID approaches by constructing IVs to adjust for measurement error in famine exposure and by focusing on spatial variation within the famine cohort only. Together, these features of our analytical approach strengthen the robustness of our empirical findings.

Second, after comparing results obtained from a variety of analytic strategies, we concluded that evidence supporting the fetal origins hypothesis was weak. Only the standard SCD estimates (and, in a few cases, the DCT estimates) showed significantly increased chronic disease risks among the pre-famine and famine cohorts. After purging cohort differences that were constant across regions affected by the famine to varying degrees, the DID estimates indicated either null or seemingly positive health effects (reduced chronic disease risks) in later life associated with prenatal famine exposure. Similar results persisted when we discarded cohort comparisons entirely and instead leveraged fine-grained geographic variation in famine severity within the famine cohort. The IV estimates were generally not significant, and in the few cases where they were borderline significant, they mostly suggested positive health effects of prenatal famine exposure. These results are robust across different measures of health.

We acknowledge that our estimates of the long-term health effects of the GLF famine are confounded by differential mortality, and we interpret the null and negative findings as evidence of selective mortality at work rather than as counterevidence against the fetal origins hypothesis per se. In other words, we do not believe that prenatal exposure to famine could “causally” decrease one’s health risks in later life. Instead, we concur with Song and colleagues that because fetuses, infants, children, and adults with poorer health endowments were more likely to be lost in the famine, the surviving famine cohort is likely to consist of the fittest individuals, who are resilient to adverse environments (Song, 2009, 2010, 2013b; Song et al., 2009). It is therefore possible to observe the detrimental long-term health effects of famine only when the famine survivors are not dominated by the selective frailty process. The seemingly health-protective estimates in this study may simply reflect mortality selection outweighing the famine effects. Our insignificant estimates, on the other hand, suggest either that mortality selection offsets the famine effects or that early-life exposure to the GLF famine indeed has no long-term health consequences. One appealing strategy for disentangling the long-term health effect of famine from the selection effect is to compare the health outcomes of the children of the pre-famine and famine cohorts with those of the children of the post-famine cohort, the idea being that children inherit their parents’ genotype (the selection effect) but not their phenotype (the famine effect) (Gørgens et al., 2012). Unfortunately, CHARLS does not collect data on children’s cardiovascular and metabolic conditions.
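The selection argument can be made concrete with a stylized simulation, which is not a model estimated in this study. Assume each person has a normally distributed frailty that raises later-life risk one-for-one, the famine adds a fixed detrimental effect of 0.2 to every survivor, and the famine kills the frailest share of the famine cohort before the survey. As that share grows, the observed famine-cohort gap shrinks toward zero (a null estimate) and eventually turns negative (a seemingly protective estimate), even though the true effect is harmful.

```python
import numpy as np

rng = np.random.default_rng(1961)

n = 100_000
FAMINE_DAMAGE = 0.2                    # assumed true detrimental effect on later-life risk

post_frailty = rng.normal(size=n)      # post-famine cohort: no famine-induced selection
famine_frailty = rng.normal(size=n)    # famine cohort before famine deaths


def observed_gap(death_share):
    """Mean later-life risk of famine-cohort survivors minus that of the
    post-famine cohort, when the frailest `death_share` of the famine
    cohort dies before being surveyed (stylized truncation selection)."""
    cutoff = np.quantile(famine_frailty, 1.0 - death_share)
    survivors = famine_frailty[famine_frailty <= cutoff]
    famine_risk = survivors + FAMINE_DAMAGE
    return famine_risk.mean() - post_frailty.mean()


gaps = {share: observed_gap(share) for share in (0.0, 0.1, 0.3)}
for share, gap in gaps.items():
    print(f"death share {share:.0%}: observed gap {gap:+.3f}")
```

Real famine mortality was not a clean truncation on a single frailty index, so the sketch only shows the direction of the bias, not its size in the GLF data.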

We draw two important methodological implications for future research. First, estimates of famine effects can be highly sensitive to the choice of analytic strategy and health outcome measure. For example, the typical SCD estimates show effects opposite to those of the other estimators in this study. The DID estimates were more significant for cholesterol-related biomarkers but less so after being combined with self-reported disease diagnoses, while the within-famine-cohort estimates were more significant for measures related to hypertension but not diabetes. Thus, more comprehensive sensitivity analyses using multiple health indicators and models are warranted in future research to avoid biased inference.

Second, empirical tests of long-term health effects of prenatal exposure to famine are challenging, because mortality selection may dominate the health-detrimental famine effects in observational data. Although it remains methodologically appealing in terms of accounting for endogeneity, regarding famine as a natural experiment has drawbacks as demonstrated in this study.

Several study limitations are noteworthy. First, as in many other famine studies, we do not have data to accurately measure individual-level prenatal exposure. Using ecological measures of famine severity at the provincial or prefecture level is a reasonable alternative, but it may conceal important individual heterogeneity within the same region. Second, ecological measures such as EDR and CSSI are estimated from government statistics that may be subject to falsification, although our IV approach helps to reduce measurement error. Third, like other studies of the GLF famine, we face (1) difficulty in demarcating the timing of in utero famine exposure owing to the lack of reliable and accurate vital statistics for this period (Susser & St Clair, 2013), and (2) a relatively small sample size after excluding observations with missing data.

Lastly, our IVs may be invalid if regional variation in exaggerations of grain yields was not related to regional variation in famine severity (violating the relevance assumption), or if exaggerations of grain yields were correlated with, for example, support for the CCP, which in turn affected subsequent recovery from the famine and population health (violating the exclusion restriction). We performed an additional analysis by constructing the number of old revolutionary bases at the prefecture level from government statistics (Department of Agriculture, 1989) as an indicator of regional political loyalty to the CCP (Kung & Lin, 2003), and found it only modestly correlated (0.16–0.19) with the two IVs (results not shown). Furthermore, massive political turmoil and economic disruption continued nationwide for nearly two decades after the GLF, severely restricting government efforts to restore public health resources. It is therefore possible that the IV assumptions still hold. Despite these limitations, our study adds new empirical evidence and methodological insights to the controversial literature on the fetal origins hypothesis.

Highlights.

  • We examine the long-term health effects of prenatal famine exposure in China.

  • Four estimators which vary in their ability to make causal inference are compared.

  • Estimates of the famine effects are sensitive to measurement and modeling choices.

  • Overall, there is little evidence supporting the fetal origins hypothesis.

Appendix Table A1. Estimated coefficients of famine effects on high-risk systolic blood pressure

| | SCD | DCT | DID: Provincial EDR | DID: Provincial CSSI | DID: Prefecture CSSI |
|---|---|---|---|---|---|
| Male (ref: female) | −0.021 | −0.021 | −0.020 | −0.021 | −0.021 |
| Birth quarter (ref: quarter 1) | | | | | |
| Quarter 2 | 0.065 | 0.064 | 0.067 | 0.066 | 0.064 |
| Quarter 3 | 0.120 | 0.120 | 0.125 | 0.122 | 0.121 |
| Quarter 4 | 0.117 | 0.117 | 0.121* | 0.120* | 0.117 |
| Birth year | | 0.084 | | | |
| Birth year² | | −0.027 | | | |
| Birth year³ | | 0.002 | | | |
| Cohort (ref: post-famine) | | | | | |
| Pre-famine | 0.279*** | −0.047 | 0.374*** | 0.437 | 0.391 |
| Famine | 0.163* | −0.038 | 0.316** | 0.477 | 0.340 |
| Provincial EDR | | | 0.033* | | |
| Provincial EDR × Pre-famine | | | −0.012 | | |
| Provincial EDR × Famine | | | −0.021* | | |
| Provincial CSSI | | | | −0.081* | |
| Provincial CSSI × Pre-famine | | | | −0.004 | |
| Provincial CSSI × Famine | | | | −0.008 | |
| Prefecture CSSI | | | | | 0.003 |
| Prefecture CSSI × Pre-famine | | | | | −0.003 |
| Prefecture CSSI × Famine | | | | | −0.004 |

| | Famine cohort probit: Provincial EDR | Famine cohort probit: Provincial CSSI | Famine cohort probit: Prefecture CSSI | Famine cohort 2SLS: First stage | Famine cohort 2SLS: Second stage |
|---|---|---|---|---|---|
| Male (ref: female) | 0.026 | 0.020 | 0.021 | −0.353 | 0.006 |
| Birth quarter (ref: quarter 1) | | | | | |
| Quarter 2 | 0.099 | 0.106 | 0.107 | 1.166 | 0.032 |
| Quarter 3 | 0.139 | 0.138 | 0.132 | 1.167 | 0.039 |
| Quarter 4 | 0.084 | 0.092 | 0.078 | 0.131 | 0.023 |
| Provincial EDR | −0.022** | | | | |
| Provincial CSSI | | −0.013** | | | |
| Prefecture CSSI | | | −0.009 | | −0.004 |
| N of exaggerations | | | | −0.112* | |
| N of counties exaggerating grain yields | | | | 1.646*** | |

Note: SCD = simple cohort difference; DCT = deviation from cohort trend; DID = difference-in-differences; 2SLS = two-stage least squares; EDR = excess death rate; CSSI = cohort size shrinkage index. All the models control for gender and birth quarters. The SCD, DCT, and DID models additionally control for provincial fixed effects.

† p < 0.1; * p < 0.05; ** p < 0.01; *** p < 0.001, based on prefecture-level cluster standard errors.

Appendix Table A2. Estimated coefficients of famine effects on high-risk biomarkers with additional controls for diagnosis and treatment for cardiovascular and metabolic diseases

| | Diastolic BP | Systolic BP | Resting pulse | HDL cholesterol | LDL cholesterol | Total cholesterol | Triglyceride | Glucose | HbA1c |
|---|---|---|---|---|---|---|---|---|---|
| SCD (ref: post-famine) | | | | | | | | | |
| Pre-famine | −0.004 | 0.217*** | 0.038 | −0.055 | 0.261** | 0.140* | −0.042 | 0.132* | 0.038 |
| Famine | 0.095 | 0.129 | −0.001 | −0.042 | 0.248** | 0.021 | 0.046 | 0.158 | 0.036 |
| DCT (ref: post-famine) | | | | | | | | | |
| Pre-famine | −0.195 | −0.170 | 0.577 | −0.087 | 0.090 | 0.024 | 0.047 | −0.169 | 0.272 |
| Famine | −0.171 | −0.111 | 0.323 | −0.119 | 0.075 | −0.118 | 0.141 | 0.006 | 0.238 |
| DID (ref: post-famine) | | | | | | | | | |
| Provincial EDR × Pre-famine | −0.024* | −0.009 | −0.001 | −0.012 | −0.015 | −0.018 | −0.015 | 0.002 | 0.011 |
| Provincial EDR × Famine | −0.006 | −0.017 | −0.018 | −0.004 | −0.019 | −0.022 | −0.005 | −0.012 | 0.005 |
| Provincial CSSI × Pre-famine | −0.008 | −0.002 | −0.003 | 0.000 | −0.009 | −0.011 | 0.000 | 0.008 | 0.011 |
| Provincial CSSI × Famine | −0.003 | −0.005 | −0.007 | 0.004 | −0.023** | −0.018* | 0.001 | −0.003 | −0.001 |
| Prefecture CSSI × Pre-famine | −0.003 | −0.001 | −0.006 | −0.003 | −0.007 | −0.010* | −0.001 | 0.006 | 0.007 |
| Prefecture CSSI × Famine | −0.002 | −0.002 | −0.001 | 0.001 | −0.021*** | −0.019** | 0.000 | 0.001 | 0.001 |
| Famine cohort probit | | | | | | | | | |
| Provincial EDR | −0.013 | −0.018* | −0.027* | −0.016* | −0.007 | −0.012 | −0.006 | −0.004 | −0.002 |
| Provincial CSSI | −0.011* | −0.009 | −0.015* | −0.007 | −0.010 | −0.010 | −0.005 | −0.003 | −0.004 |
| Prefecture CSSI | −0.010* | −0.006 | −0.008 | −0.006 | −0.009 | −0.010 | −0.004 | −0.001 | −0.001 |
| Famine cohort 2SLS | | | | | | | | | |
| Prefecture CSSI | 0.003 | −0.001 | 0.003 | 0.007 | −0.005 | −0.004 | 0.001 | 0.001 | 0.000 |
| Diagnostic statistics: first-stage F | 14.26 | 14.26 | 16.65 | 13.07 | 12.6 | 13.07 | 13.07 | 13.00 | 10.63 |
| Diagnostic statistics: Stock-Yogo test | Passed | Passed | Passed | Passed | Passed | Passed | Passed | Passed | Passed |

Note: SCD = simple cohort difference; DCT = deviation from cohort trend; DID = difference-in-differences; 2SLS = two-stage least squares; EDR = excess death rate; CSSI = cohort size shrinkage index. All the models control for gender and birth quarters. The SCD, DCT, and DID models additionally control for provincial fixed effects.

† p < 0.1; * p < 0.05; ** p < 0.01; *** p < 0.001, based on prefecture-level cluster standard errors.

Appendix Table A3. Coefficient estimates from regressing residuals of high-risk biomarkers on the geographic coordinates of prefecture centroids

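| | Diastolic BP | Systolic BP | Resting pulse | HDL cholesterol | LDL cholesterol | Total cholesterol | Triglyceride | Glucose | HbA1c |
|---|---|---|---|---|---|---|---|---|---|
| SCD | | | | | | | | | |
| Longitude | −0.002 | −0.005 | 0.006* | −0.002 | −0.003 | −0.002 | −0.003 | −0.002 | −0.002 |
| Latitude | 0.001 | 0.001 | 0.003 | 0.004 | −0.002 | −0.001 | 0.004 | 0.000 | 0.000 |
| DCT | | | | | | | | | |
| Longitude | −0.001 | −0.003 | 0.004 | −0.001 | −0.003 | −0.002 | −0.004 | −0.003 | 0.000 |
| Latitude | 0.001 | 0.000 | 0.002 | 0.005 | −0.001 | 0.000 | 0.003 | 0.000 | 0.000 |
| DID: Provincial EDR | | | | | | | | | |
| Longitude | −0.001 | −0.004 | 0.007* | −0.002 | −0.002 | −0.001 | −0.002 | −0.001 | −0.002 |
| Latitude | 0.001 | 0.001 | 0.003 | 0.004 | −0.002 | −0.001 | 0.004 | 0.000 | 0.000 |
| DID: Provincial CSSI | | | | | | | | | |
| Longitude | −0.001 | −0.005 | 0.007* | −0.002 | −0.002 | −0.002 | −0.003 | −0.002 | −0.002 |
| Latitude | 0.001 | 0.001 | 0.003 | 0.004 | 0.002 | 0.000 | 0.004 | 0.000 | 0.000 |
| DID: Prefecture CSSI | | | | | | | | | |
| Longitude | −0.002 | −0.002 | 0.005 | −0.002 | −0.003 | −0.002 | 0.001 | 0.000 | 0.000 |
| Latitude | 0.006* | 0.006 | 0.001 | 0.005 | 0.000 | 0.002 | 0.005 | 0.003 | 0.002 |
| Famine cohort probit: Provincial EDR | | | | | | | | | |
| Longitude | 0.009 | 0.008 | 0.011 | −0.003 | 0.009 | 0.006 | −0.006 | −0.007 | −0.005 |
| Latitude | 0.002 | 0.006 | 0.005 | 0.002 | −0.011 | −0.001 | 0.021** | 0.029*** | 0.008 |
| Famine cohort probit: Provincial CSSI | | | | | | | | | |
| Longitude | 0.008 | 0.007 | 0.012 | −0.003 | 0.007 | 0.004 | −0.005 | −0.007 | −0.006 |
| Latitude | 0.003 | 0.006 | 0.002 | −0.001 | −0.006 | 0.001 | 0.020** | 0.028*** | 0.009 |
| Famine cohort probit: Prefecture CSSI | | | | | | | | | |
| Longitude | 0.004 | −0.001 | 0.010 | −0.005 | 0.008 | 0.007 | −0.005 | −0.002 | 0.002 |
| Latitude | 0.007 | 0.008 | −0.001 | 0.000 | −0.006 | −0.001 | 0.017** | 0.010 | 0.004 |
| Famine cohort 2SLS | | | | | | | | | |
| Longitude | 0.002* | −0.004** | 0.005*** | −0.001 | −0.002 | −0.001 | 0.002 | 0.001 | 0.001 |
| Latitude | 0.000 | 0.004** | −0.003** | 0.000 | 0.000 | 0.000 | −0.001 | 0.000 | 0.001 |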

Note: SCD = simple cohort difference; DCT = deviation from cohort trend; DID = difference-in-differences; 2SLS = two-stage least squares; EDR = excess death rate; CSSI = cohort size shrinkage index. All the models control for gender and birth quarters. The SCD, DCT, and DID models additionally control for provincial fixed effects.

† p < 0.1; * p < 0.05; ** p < 0.01; *** p < 0.001, based on prefecture-level cluster standard errors.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Contributor Information

Hongwei Xu, Institute for Social Research, University of Michigan, 426 Thompson St, ISR 2459, Ann Arbor, MI 48104-2321, xuhongw@umich.edu, Phone: +1 (734) 615-3552, Fax: +1 (734) 763-1428.

Lydia Li, School of Social Work, University of Michigan.

Zhenmei Zhang, Department of Sociology, Michigan State University.

Jinyu Liu, School of Social Work, Columbia University.

References

  1. Almond D. Is the 1918 Influenza Pandemic Over? Long-Term Effects of In Utero Influenza Exposure in the Post-1940 U.S. Population. Journal of Political Economy. 2006;114:672–712.
  2. Almond D, Currie J. Killing Me Softly: The Fetal Origins Hypothesis. The Journal of Economic Perspectives. 2011;25:153–172. doi: 10.1257/jep.25.3.153.
  3. Almond D, Edlund L, Li H, Zhang J. Long-Term Effects of Early-Life Development: Evidence from the 1959 to 1961 China Famine. In: Ito T, Rose A, editors. The Economic Consequences of Demographic Change in East Asia. Chicago: University of Chicago Press; 2010. pp. 321–345.
  4. Angrist JD, Krueger AB. Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Journal of Economic Perspectives. 2001;15:69–85.
  5. Ashton B, Hill K, Piazza A, Zeitz R. Famine in China, 1958–61. Population and Development Review. 1984;10:613–645.
  6. Banister J. China's Changing Population. Stanford, CA: Stanford University Press; 1987.
  7. Barker DJP. The Fetal and Infant Origins of Adult Disease: The Womb May Be More Important Than the Home. BMJ: British Medical Journal. 1990;301:1111. doi: 10.1136/bmj.301.6761.1111.
  8. Barker DJP. The Fetal and Infant Origins of Disease. European Journal of Clinical Investigation. 1995a;25:457–463. doi: 10.1111/j.1365-2362.1995.tb01730.x.
  9. Barker DJP. Fetal Origins of Coronary Heart Disease. BMJ: British Medical Journal. 1995b;311:171–174. doi: 10.1136/bmj.311.6998.171.
  10. Barker DJP, Osmond C. Infant Mortality, Childhood Nutrition, and Ischaemic Heart Disease in England and Wales. The Lancet. 1986;327:1077–1081. doi: 10.1016/s0140-6736(86)91340-1.
  11. Bateson P, Barker D, Clutton-Brock T, Deb D, D'Udine B, Foley RA, et al. Developmental Plasticity and Human Health. Nature. 2004;430:419–421. doi: 10.1038/nature02725.
  12. Ben-Shlomo Y, Kuh D. A Life Course Approach to Chronic Disease Epidemiology: Conceptual Models, Empirical Challenges and Interdisciplinary Perspectives. International Journal of Epidemiology. 2002;31:285–293.
  13. Bernstein TP. Stalinism, Famine, and Chinese Peasants. Theory and Society. 1984;13:339–377.
  14. Chen Y, Zhou L-A. The Long-Term Health and Economic Consequences of the 1959–1961 Famine in China. Journal of Health Economics. 2007;26:659–681. doi: 10.1016/j.jhealeco.2006.12.006.
  15. Coale AJ. Rapid Population Change in China, 1952–1982. Washington, D.C.: National Academy Press; 1984.
  16. de Rooij SR, Painter RC, Phillips DIW, Osmond C, Michels RPJ, Godsland IF, et al. Impaired Insulin Secretion After Prenatal Exposure to the Dutch Famine. Diabetes Care. 2006;29:1897–1901. doi: 10.2337/dc06-0460.
  17. Department of Agriculture, People's Republic of China. Rural Economic Statistics of China: 1949–1986. Beijing: Agriculture Press; 1989.
  18. Fan W, Qian Y. Long-Term Health and Socioeconomic Consequences of Early-life Exposure to the 1959–1961 Chinese Famine. Social Science Research. 2015;49:53–69. doi: 10.1016/j.ssresearch.2014.07.007.
  19. Fung W, Ha W. Intergenerational Effects of the 1959–61 China Famine. In: Fuentes-Nieva R, Seck PA, editors. Risk, Shocks and Human Development: On the Brink. London, UK: Palgrave-Macmillan; 2010. pp. 222–254.
  20. Gluckman PD, Hanson MA, Cooper C, Thornburg KL. Effect of In Utero and Early-Life Conditions on Adult Health and Disease. New England Journal of Medicine. 2008;359:61–73. doi: 10.1056/NEJMra0708473.
  21. Gluckman PD, Hanson MA, Spencer HG, Bateson P. Environmental Influences during Development and Their Later Consequences for Health and Disease: Implications for the Interpretation of Empirical Studies. Proceedings of the Royal Society B: Biological Sciences. 2005;272:671–677. doi: 10.1098/rspb.2004.3001.
  22. Goldstein A. From Bandwagon to Balance-of-Power Politics: Structural Constraints and Politics in China, 1949–1978. Stanford, CA: Stanford University Press; 1991.
  23. Gørgens T, Meng X, Vaithianathan R. Stunting and Selection Effects of Famine: A Case Study of the Great Chinese Famine. Journal of Development Economics. 2012;97:99–111.
  24. Huang C, Li Z, Venkat Narayan KM, Williamson DF, Martorell R. Bigger Babies Born to Women Survivors of the 1959–1961 Chinese Famine: A Puzzle due to Survival Selection? Journal of Developmental Origins of Health and Disease. 2010a;1:412–418. doi: 10.1017/S2040174410000504.
  25. Huang C, Li Z, Wang M, Martorell R. Early Life Exposure to the 1959–1961 Chinese Famine Has Long-Term Health Consequences. The Journal of Nutrition. 2010b;140:1874–1878. doi: 10.3945/jn.110.121293.
  26. Huang C, Phillips MR, Zhang Y, Zhang J, Shi Q, Song Z, et al. Malnutrition in Early Life and Adult Mental Health: Evidence from a Natural Experiment. Social Science & Medicine. 2013;97:259–266. doi: 10.1016/j.socscimed.2012.09.051.
  27. Kannisto V, Christensen K, Vaupel JW. No Increased Mortality in Later Life for Cohorts Born during Famine. American Journal of Epidemiology. 1997;145:987–994. doi: 10.1093/oxfordjournals.aje.a009067.
  28. Koupil I, Shestov D, Sparén P, Plavinskaja S, Parfenova N, Vågerö D. Blood Pressure, Hypertension and Mortality from Circulatory Disease in Men and Women Who Survived the Siege of Leningrad. European Journal of Epidemiology. 2007;22:223–234. doi: 10.1007/s10654-007-9113-6.
  29. Kung JK-s, Chen S. The Tragedy of the Nomenklatura: Career Incentives and Political Radicalism during China’s Great Leap Famine. American Political Science Review. 2011;105:27–45.
  30. Kung JK-s, Lin JY. The Causes of China’s Great Leap Famine, 1959–1961. Economic Development and Cultural Change. 2003;52:51–73.
  31. Li Y, He Y, Qi L, Jaddoe VW, Feskens EJM, Yang X, et al. Exposure to the Chinese Famine in Early Life and the Risk of Hyperglycemia and Type 2 Diabetes in Adulthood. Diabetes. 2010;59:2400–2406. doi: 10.2337/db10-0385.
  32. Li Y, Jaddoe VW, Qi L, He Y, Lai J, Wang J, et al. Exposure to the Chinese Famine in Early Life and the Risk of Hypertension in Adulthood. Journal of Hypertension. 2011a;29:1085–1092. doi: 10.1097/HJH.0b013e328345d969.
  33. Li Y, Jaddoe VW, Qi L, He Y, Wang D, Lai J, et al. Exposure to the Chinese Famine in Early Life and the Risk of Metabolic Syndrome in Adulthood. Diabetes Care. 2011b;34:1014–1018. doi: 10.2337/dc10-2039.
  34. Lin JY, Yang DT. Food Availability, Entitlements and the Chinese Famine of 1959–61. The Economic Journal. 2000;110:136–158.
  35. Lindeboom M, Portrait F, van den Berg GJ. Long-Run Effects on Longevity of a Nutritional Shock Early in Life: The Dutch Potato Famine of 1846–1847. Journal of Health Economics. 2010;29:617–629. doi: 10.1016/j.jhealeco.2010.06.001.
  36. Lumey LH, Stein AD, Susser E. Prenatal Famine and Adult Health. Annual Review of Public Health. 2011;32:237–262. doi: 10.1146/annurev-publhealth-031210-101230.
  37. Luo Z, Mu R, Zhang X. Famine and Overweight in China. Applied Economic Perspectives and Policy. 2006;28:296–304.
  38. Meng X, Qian N. The Long Term Consequences of Famine on Survivors: Evidence from a Unique Natural Experiment using China's Great Famine. National Bureau of Economic Research Working Paper Series, No. 14917; 2009.
  39. Painter RC, Roseboom TJ, Bleker OP. Prenatal Exposure to the Dutch Famine and Disease in Later Life: An Overview. Reproductive Toxicology. 2005;20:345–352. doi: 10.1016/j.reprotox.2005.04.005.
  40. Paneth N, Susser M. Early Origins of Coronary Heart Disease: The Barker Hypothesis. BMJ. 1995;310:411–412. doi: 10.1136/bmj.310.6977.411.
  41. Peng X. Demographic Consequences of the Great Leap Forward in China's Provinces. Population and Development Review. 1987;13:639–670.
  42. Portrait F, Teeuwiszen E, Deeg D. Early Life Undernutrition and Chronic Diseases at Older Ages: The Effects of the Dutch Famine on Cardiovascular Diseases and Diabetes. Social Science & Medicine. 2011;73:711–718. doi: 10.1016/j.socscimed.2011.04.005.
  43. Ravelli ACJ, van der Meulen JHP, Michels RPJ, Osmond C, Barker DJP, Hales CN, et al. Glucose Tolerance in Adults after Prenatal Exposure to Famine. The Lancet. 1998;351:173–177. doi: 10.1016/s0140-6736(97)07244-9.
  44. Roseboom TJ, van der Meulen JHP, Ravelli ACJ, Osmond C, Barker DJP, Bleker OP. Effects of Prenatal Exposure to the Dutch Famine on Adult Disease in Later Life: An Overview. Molecular and Cellular Endocrinology. 2001;185:93–98. doi: 10.1016/s0303-7207(01)00721-3.
  45. Song S. Does Famine Have a Long-term Effect on Cohort Mortality? Evidence from the 1959–1961 Great Leap Forward Famine in China. Journal of Biosocial Science. 2009;41:469–491. doi: 10.1017/S0021932009003332.
  46. Song S. Mortality Consequences of the 1959–1961 Great Leap Forward Famine in China: Debilitation, Selection, and Mortality Crossovers. Social Science & Medicine. 2010;71:551–558. doi: 10.1016/j.socscimed.2010.04.034.
  47. Song S. Identifying the Intergenerational Effects of the 1959–1961 Chinese Great Leap Forward Famine on Infant Mortality. Economics & Human Biology. 2013a;11:474–487. doi: 10.1016/j.ehb.2013.08.001.
  48. Song S. Prenatal Malnutrition and Subsequent Foetal Loss Risk: Evidence from the 1959–1961 Chinese Famine. Demographic Research. 2013b;29:707–727.
  49. Song S, Wang W, Hu P. Famine, Death, and Madness: Schizophrenia in Early Adulthood after Prenatal Exposure to the Chinese Great Leap Forward Famine. Social Science & Medicine. 2009;68:1315–1321. doi: 10.1016/j.socscimed.2009.01.027.
  50. Staiger D, Stock JH. Instrumental Variables Regression with Weak Instruments. Econometrica. 1997;65:557–586.
  51. Stanner SA, Bulmer K, Andrès C, Lantseva OE, Borodina V, Poteen VV, et al. Does Malnutrition in Utero Determine Diabetes and Coronary Heart Disease in Adulthood? Results from the Leningrad Siege Study, a Cross Sectional Study. BMJ: British Medical Journal. 1997;315:1342–1348. doi: 10.1136/bmj.315.7119.1342.
  52. Susser E, St Clair D. Prenatal Famine and Adult Mental Illness: Interpreting Concordant and Discordant Results from the Dutch and Chinese Famines. Social Science & Medicine. 2013;97:325–330. doi: 10.1016/j.socscimed.2013.02.049.
  53. Tan CM, Tan Z, Zhang X. Sins of the Father: The Intergenerational Legacy of the 1959–61 Great Chinese Famine on Children's Cognitive Development. The Rimini Center for Economic Analysis Working Papers; Rimini, Italy. 2014.
  54. Yang DL. Calamity and Reform in China: State, Rural Society, and Institutional Change Since the Great Leap Famine. Stanford, CA: Stanford University Press; 1996.
  55. Yao S. A Note on the Causal Factors of China's Famine in 1959–1961. Journal of Political Economy. 1999;107:1365–1369.
  56. Zhao Y, Crimmins E, Hu P, Hu Y, Ge T, Kim JK, et al. China Health and Retirement Longitudinal Study: 2011–2012 National Baseline Blood Data Users' Guide. Beijing: China Center for Economic Research, Peking University; 2014a.
  57. Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort Profile: The China Health and Retirement Longitudinal Study (CHARLS). International Journal of Epidemiology. 2014b;43:61–68. doi: 10.1093/ije/dys203.
