Abstract
Survey measures of household wealth often incorporate measurement error. The resulting excess variability in the first difference in wealth makes meaningful statistical inference difficult on changes in household-level wealth. We study the effects of two methods intended to reduce this problem: Asset verification confronts respondents with large discrepancies between wealth reports from the current wave and from the previous wave. Cross-wave imputation uses adjacent wave information in the imputation procedures for missing data. In the U.S. Health and Retirement Study, the corrections from asset verification substantially reduced wave-to-wave changes in wealth. The cross-wave imputations also reduced variation, but to a lesser extent.
Keywords: Wealth measurement, imputation, panel data, survey design
1. Introduction
The accurate measurement of household wealth levels and wealth changes is essential for understanding household financial behavior and related topics such as consumption and retirement. Therefore, many household surveys contain a module eliciting wealth. Survey measures of wealth are subject to measurement error and missing data, which may bias results (e.g., [1], Section 2.3; [2, p. 43]). Even though total wealth is often the variable of interest in empirical studies, the elicitation of total wealth involves the elicitation of the values of components of the total with the implication that observation errors on the components will carry over to the total. Substantial effort has been spent on developing improvements in questionnaire design that greatly reduce measurement error in wealth and other variables such as income (e.g., [3]). Many of these efforts have focused on improving the level of measured wealth. However, as shown by Juster et al. [4], even if measurement of the level of household wealth is satisfactory for cross-sectional analyses, further improvement may be required if the goal is to use the wave-to-wave changes in wealth to study wealth accumulation (including saving) and decumulation, because the effects of measurement error in levels on empirical estimation of economic models are exacerbated by taking first differences (see, e.g., [1, pp. 138–146], and the references therein). Even if the model of interest is a model in levels, it may be estimated in first differences to eliminate a fixed effect (unobserved individual heterogeneity).
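The amplification of measurement error by first differencing can be illustrated with a textbook calculation. The setup is an assumption made here for illustration, not taken from the cited references: reported wealth is $w^{*}_{it} = w_{it} + e_{it}$, where the error $e_{it}$ is uncorrelated with true wealth $w_{it}$, and both are stationary with variances $\sigma_w^2$, $\sigma_e^2$ and wave-to-wave autocorrelations $\rho_w$, $\rho_e$.

```latex
\[
\operatorname{Var}(\Delta w^{*}_{it})
  = 2\sigma_w^2(1-\rho_w) + 2\sigma_e^2(1-\rho_e),
\]
\[
\text{noise share in levels} = \frac{\sigma_e^2}{\sigma_w^2+\sigma_e^2},
\qquad
\text{noise share in differences}
  = \frac{\sigma_e^2(1-\rho_e)}{\sigma_w^2(1-\rho_w)+\sigma_e^2(1-\rho_e)}.
\]
```

Under these assumptions the noise share is larger in differences whenever $\rho_w > \rho_e$. For example, with $\sigma_e^2/\sigma_w^2 = 0.1$, $\rho_w = 0.9$, and $\rho_e = 0$, the noise share is about 9% in levels but 50% in first differences.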
In this paper, we examine two methods that are designed to reduce the wave-to-wave excess variability in wealth due to measurement error: Asset verification flags large changes in wealth components during the interview, and asks respondents to confirm or correct the values. Thus, whenever a respondent reports a value of an asset that differs greatly from the value reported in the previous wave, the respondent is confronted with both values and asked whether they are correct, and if not, what their values should be. This is aimed at correcting typos, mis-hearings, and other errors that lead to large measurement errors in individual cases.
Cross-wave imputations use the values of the wealth components in adjacent waves as additional covariates in imputation models for missing data. This is based on the premise that there is a strong serial correlation in wealth, and thus values of assets in adjacent waves have additional explanatory power beyond contemporaneous covariates.
We study these two improvements in the context of the U.S. Health and Retirement Study (HRS) [5,6].¹ The HRS is primarily a panel survey; it is the leading source of information in the United States on retirement, health, and the economic, personal, and social situation of individuals over the age of 50. The HRS has been used in numerous studies of labor supply, income, assets, consumption, health, medical expenditures, cognitive decline, and other topics. In many of these studies, wealth is either an outcome or a regressor. Its importance extends internationally, where it has been a model for sister studies of the older populations in England, Europe, Japan, China, Korea, Mexico, India and others.
The HRS measures wealth in about 20 categories. The HRS has been at the forefront of methodological innovations in economic measurement (e.g., [3]) and correspondingly, the quality of its economic data is highly regarded. One innovation is the use of follow-up questions after an initial nonresponse, through a sequence of unfolding brackets. Unfolding brackets lead a respondent through a series of questions to place an asset value into an interval (see Appendix B for an example), which, because asset values are highly skewed, helps considerably in imputation for missing values.
Hill [7] and Venti [8] studied the distribution of wealth changes in the HRS and, despite the questionnaire innovations, they found evidence that measurement error may disproportionately affect these distributions. Hill [7] reported the results of an experiment in the HRS (performed in 2001) that involved a call-back of households who had large changes in reported asset values (between 1998 and 2000) and asked them to confirm or correct these values. Incorporating the corrections from this call-back led to a drop in the variance of the change in net worth of about 50%. Venti [8] studied changes in several measures, in particular IRA assets of relevance here. He showed some examples of unlikely ownership changes in IRAs and large swings in their balances. He pointed out that households rarely cash out IRAs, and that changes back to owning an IRA should be even rarer, yet such cases do occur in the HRS. Venti reported that 25% of households have such a gap in ownership (i.e., having an IRA, then not having an IRA, then having an IRA again). As to swings in balances, he showed some examples where in three out of four consecutive waves the balances are of the same order of magnitude, but in one of the middle waves the balance is wildly different, or there is some trend from which one of the middle waves deviates sharply. He inferred that about 30–40% of households show some pattern of IRA balances that involves a potential misreport. Venti made several recommendations, among others to incorporate the asset verification corrections and to create longitudinal imputations. These recommendations are the subject of this paper.
Because of Hill’s [7] findings, the HRS questionnaire has included the asset reconciliation (or asset verification) section, section U, since the 2002 wave. Similarly to the earlier call-back experiment, this section is administered in case of large deviations of recorded asset values compared to previous wave asset values, provided the same person answered the asset section of the survey in both waves. In this paper we provide details on how HRS implemented this asset verification and to what extent it helped reduce measurement error in recorded asset values, with special attention on cross-wave variability.
Like many other public surveys (e.g., Current Population Survey, Survey of Consumer Finances, Survey of Income and Program Participation), the HRS provides imputations for some missing variables, especially economic variables like income and wealth. The HRS has provided these since its inception in 1992. Since the early 2000s, imputations (including for the earlier waves) have been done by RAND. In a survey where wealth is measured by asking questions about multiple components of wealth, it is essential to impute missing components to prevent bias and severe loss of information. The total wealth measure we study below is based on 14–17 components per wave. While missing rates in the components average 7.8%, 46.4% of observations have at least one component missing and thus total wealth missing. Moreover, for half of the missing components, the respondent provided bracket information, so we know its value lies in an interval. Without imputations we would lose a lot of information, especially in analyses that use total wealth, which is arguably the most common application. Furthermore, the observations with missings are selective: for all the observations with bracket information and close to half of the observations without bracket information, we know that the household owns the component, whereas for many components, ownership rates overall are low. Hence, the missings are more likely to have positive amounts than the nonmissings [9]. If observations with missings were simply dropped it would bias the estimated distributions of the components downward.
Initially, imputations in the HRS only used variables from the same wave as covariates (cross-sectional imputations), but it is likely that information from adjacent waves contains valuable information about the missing value of the asset. For example, the value of the primary residence in the previous wave should be a strong predictor of the value of the primary residence in the current wave, even after controlling for a broad set of covariates observed in the current wave. In principle, cross-wave imputations should reduce cross-wave excess variability in wealth resulting from measurement error.
Section 2 describes the data we use. Section 3 studies the effect of incorporating information from the asset verification section on the distributions of the asset variables and their wave-to-wave changes. The imputation methods used are described in Section 4, and the effects of including cross-wave information in the imputations are studied in Section 5. Section 6 discusses the findings and their implications for household survey operations.
2. Data
The Health and Retirement Study (HRS) [5,6] is a large scale multidisciplinary panel survey in the U.S. of individuals over age 50 and their spouses. Interviews have been conducted biennially since 1992. Refresher cohorts are added every six years. The HRS collects about 20,000 interviews every wave in about 12,000 to 15,000 households. See Appendix A for more details about the HRS sample composition. Imputations of the asset and income variables are produced by the RAND HRS Team and made available to the research community on the HRS website.
For the analyses of the effect of asset verification in section U, we rely on HRS public release raw data files of HRS waves 5 (2000) through 11 (2012) and create imputations that do not use section U. We compare these to the analogous data in the RAND HRS,² version N [10] and the corresponding detailed income and wealth variables [11].³ The analyses of the effects of cross-wave versus cross-section imputations use the RAND HRS and detailed income and wealth data of waves 1 (1992) through 11 (2012).
Data construction, imputations, and numerical computations were done in SAS/STAT, version 9.2 [12] on a Linux server.
3. Effects of asset verification on wealth distributions
3.1. Eligibility for section U
The unit of observation for the wealth variables is the “household”, that is, the respondent and his or her spouse or domestic partner, when applicable. The HRS asks who in the household is most knowledgeable about their finances and queries this person, the financial respondent, about the wealth variables (and many income variables). After the respondent has completed the asset and income module the survey instrument runs some internal checks comparing the just recorded asset values with those reported in the previous wave. The objective is to identify unusually large changes that could be due to misreporting.
Entering section U for a certain asset requires both global and local eligibility. Global eligibility is a financial respondent-level criterion, whereas local eligibility is determined at the level of the asset type.
Local eligibility is defined as a change in the asset of more than $50,000. That is, a difference between the currently reported value of the asset and the one reported in the previous wave of more than $50,000 is considered a large change that is eligible for questioning in section U.
Missing values complicate the comparisons. When respondents give a “don’t know” or “refuse” answer, they enter a sequence of unfolding bracket questions (see Appendix B), which often provide some information about the asset value. It is then possible to determine minimum and maximum values that are consistent with the respondent’s answers. In the comparisons, these are used by comparing the closest values. The difference criterion is interpreted as a gap of more than $50,000 between the previous wave and current wave intervals, that is, the minimum of one exceeds the maximum of the other by more than $50,000.
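The interval comparison described above can be sketched as follows. This is a minimal illustration, not the HRS instrument code: the function names and the representation of a report as a `(minimum, maximum)` pair are assumptions made here, with an exact answer stored as an interval of zero width and an open bracket as an interval with an infinite maximum.

```python
import math

LOCAL_THRESHOLD = 50_000  # dollars; the local eligibility threshold from the text


def interval_gap(prev, curr):
    """Smallest absolute change consistent with two (minimum, maximum) reports.

    Overlapping intervals are consistent with no change at all, so the gap is 0.
    """
    prev_min, prev_max = prev
    curr_min, curr_max = curr
    return max(0.0, curr_min - prev_max, prev_min - curr_max)


def locally_eligible(prev, curr, threshold=LOCAL_THRESHOLD):
    """True if the closest values of the two reports differ by more than the threshold."""
    return interval_gap(prev, curr) > threshold


# Exact report of $100,000 last wave, open bracket "more than $200,000" now.
print(locally_eligible((100_000, 100_000), (200_000, math.inf)))
# Closed brackets that are only $25,000 apart at their closest points.
print(locally_eligible((0, 50_000), (75_000, 100_000)))
```

Comparing the closest endpoints is conservative: an asset is flagged only if every pair of values consistent with the two reports differs by more than the threshold.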
A respondent is globally eligible for entering section U if the respondent is the financial respondent of the household and was also the financial respondent in the previous wave, and the change in net worth between the previous wave and the current wave is more than $150,000.⁴ The reason for requiring the financial respondent to be the same as in the previous wave is confidentiality of the answers that a respondent gives: these should not be revealed to others (such as the spouse). Because section U mentions the previous wave’s answer, this can only be asked if it is the same person answering.⁵ The rationale for the net worth criterion is to prevent bothering households who rebalance their asset portfolios, for example, by selling their stocks and putting the money in a savings account.
Like the local eligibility criterion, the net worth comparison is also complicated by missing values. For each asset, we can determine minimum and maximum values that are consistent with the respondent’s answers, as is done for determining local eligibility, which can then be aggregated to provide a lower and upper bound of net worth. However, if the respondent gives a noninformative answer to one or more asset questions, or an open bracket (“more than $X”) for both an asset question and a liability (negative asset) question, then these lower and upper bounds may span all nonnegative numbers, or even the whole real line. Because a change of less than $150,000 in net worth would be consistent with this information, this would imply that many respondents whose asset values for one or more assets change significantly would be globally ineligible and not enter section U. Because the net worth criterion is an auxiliary criterion and the actual net worth underlying it is never mentioned to the respondent, HRS takes a pragmatic approach and replaces unbounded intervals by bounded ones. Specifically, if the maximum of an asset is infinite, it is replaced by 3 times the minimum. For the current wave, this is further simplified by computing only a rough estimate of net worth. For closed brackets (bounded minimum and maximum), the average of the minimum and maximum is used as the estimate of the asset value for this purpose, whereas for open brackets (unbounded maximum), 3 times the minimum is used as the point estimate.
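The rough net worth estimate for the current wave can be sketched as below. This is an illustrative simplification under stated assumptions: the function names are hypothetical, each report is a `(minimum, maximum)` pair, and the previous wave's net worth is taken as a single number, whereas the text describes bounds for that wave.

```python
import math

GLOBAL_THRESHOLD = 150_000  # dollars; the net worth threshold from the text
OPEN_BRACKET_FACTOR = 3     # open brackets are valued at 3 times their minimum


def point_estimate(minimum, maximum):
    """Point estimate of one component from its (minimum, maximum) bounds."""
    if math.isinf(maximum):
        return OPEN_BRACKET_FACTOR * minimum  # open bracket ("more than $X")
    return (minimum + maximum) / 2.0          # closed bracket or exact value


def rough_net_worth(assets, liabilities):
    """Sum of asset estimates minus sum of liability estimates."""
    total_assets = sum(point_estimate(lo, hi) for lo, hi in assets)
    total_liabilities = sum(point_estimate(lo, hi) for lo, hi in liabilities)
    return total_assets - total_liabilities


def globally_eligible(same_financial_respondent, prev_net_worth, curr_net_worth,
                      threshold=GLOBAL_THRESHOLD):
    """Same financial respondent in both waves and a large change in net worth."""
    return same_financial_respondent and abs(curr_net_worth - prev_net_worth) > threshold


assets = [(200_000, 200_000), (50_000, 100_000), (500_000, math.inf)]
liabilities = [(80_000, 80_000)]
current = rough_net_worth(assets, liabilities)
print(current, globally_eligible(True, 1_000_000, current))
```

In this sketch a fully noninformative answer (minimum 0, unbounded maximum) contributes zero to the estimate, which is one possible reading of the rule and not confirmed by the text.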
Respondents who enter section U typically answer questions about a small subset of their assets: only the ones that changed much since the previous wave. Hence, eligibility is determined at the household-asset level. Table 1 lists the assets that are potentially questioned in section U. Assets are asked in the order listed. The order matters because in waves 6 and 7 (2002 and 2004; see Appendix A for a description of the HRS cohorts and waves), only the first three eligible assets were asked, and in all years, a refusal for one asset means that later assets are not asked about.
Table 1.
Assets included in section U
| Number | Description |
|---|---|
| 1 | Other debt |
| 2 | Trusts |
| 3 | Other assets |
| 4 | Vehicles |
| 5 | Certificates of deposit |
| 6 | Checking and savings accounts |
| 7 | Bonds and bond funds |
| 8 | Stocks and mutual funds |
| 9 | Individual retirement accounts |
| 10 | Businesses/farms |
| 11 | Other real estate |
| 12 | Primary residence |
| 13 | Mortgage 1 on primary residence |
| 14 | Mortgage 2 on primary residence |
| 15 | Home equity line of credit balance |
| 16 | Mobile home |
| 17 | Second home |
| 18 | Mortgages/loans on second home |
Several assets that are asked about in the HRS are not covered in section U: other loans on the first home, the cash value (and face value) of life insurance, and defined contribution pension plan (e.g., 401(k)) balances. The income and wealth section (section Q) asks both about all trusts and about trusts that are not reported elsewhere, whereas at most one of these is asked about in section U.
3.2. Frequencies of entering and correcting
Given the number of assets involved that require cross-checking and the complications resulting from missing values, the programming of the survey instrument turned out to be quite complicated and a few deviations from the intended design occurred over time (see Appendix C for details). These resulted in some variation across waves in the number of asset verifications that were triggered and the fraction of confirmed values. Table 2 shows the number of households that entered section U for one or more assets by wave.
Table 2.
Number of households entering section U
| Wave year | Households interviewed | Households entered section U | Percent of households that entered section U |
|---|---|---|---|
| 2002 | 12,349 | 711 | 5.8 |
| 2004 | 13,645 | 1,986 | 14.6 |
| 2006 | 12,605 | 2,234 | 17.7 |
| 2008 | 11,897 | 2,362 | 19.9 |
| 2010 | 15,280 | 2,021 | 13.2 |
| 2012 | 14,316 | 2,675 | 18.7 |
A household may enter section U for more than one asset, allowing households to contribute multiple records if they were challenged on multiple assets. Table 3 shows the number of assets (household-asset combinations) that entered section U in each wave, and whether the respondent confirmed or corrected the value. In most waves, between 60 and 70 percent of the challenged asset values were confirmed, but this number was substantially lower in 2012. The fraction of corrections varies across waves. At least some of that variation is explained by deviations in the survey instrument programming (see Appendix C).
Table 3.
Number of assets entering section U, and percentage confirmed and corrected, by wave
| Wave year | Entered sec. U | Confirmed (% of entered) | Incorrect (% of entered) | Previous wave incorrect (% of incorrect) | Current wave incorrect (% of incorrect) | Both waves incorrect (% of incorrect) |
|---|---|---|---|---|---|---|
| 2002 | 1,478 | 61.4 | 32.9 | 38.2 | 54.0 | 0.6 |
| 2004 | 4,097 | 62.9 | 34.0 | 41.8 | 44.2 | 10.8 |
| 2006 | 5,414 | 69.5 | 28.7 | 52.9 | 36.4 | 7.7 |
| 2008 | 5,640 | 65.4 | 32.3 | 56.5 | 35.4 | 6.4 |
| 2010 | 5,071 | 64.5 | 32.2 | 45.4 | 44.2 | 5.4 |
| 2012 | 6,504 | 48.4 | 48.2 | 69.8 | 22.7 | 4.2 |
Note: “Entered section U” is number of household-asset combinations for which section U was entered; categories “DK whether correct” and “DK which wave incorrect” not shown.
3.3. Effect of section U corrections on marginal distributions
Table 4 shows the mean and standard deviation of total wealth before and after section U corrections and the resulting (cross-sectional) imputations. Total wealth as defined here is the RAND HRS variable HwATOTA (where “w” denotes the wave number), which is the sum of the 18 assets from Table 1 (with liabilities counted as negative assets), except for trusts, the second home, and any loans on the second home, but including “other home loans” (on the first home), which are not part of section U.⁶ We see that section U has a modest impact on the means, but a large impact on the standard deviations in some years. For 2004 and 2006, the standard deviation is much lower, because a few large outliers were corrected in section U. In 2002, the reverse happened: two large outliers were introduced in section U. The reductions in the standard deviation of total wealth observed for most waves as a result of section U would increase the precision of coefficient estimates in regressions where total wealth is used as a covariate.
Table 4.
Effect of section U corrections on total wealth
| Wave year | Mean ($), without U | Mean ($), with U | Mean, diff (%) | s.d. ($), without U | s.d. ($), with U | s.d., diff (%) | N |
|---|---|---|---|---|---|---|---|
| 2000 | 315,490 | 308,140 | −2.3 | 972,906 | 921,939 | −5.2 | 13,214 |
| 2002 | 314,297 | 325,437 | 3.5 | 838,382 | 1,151,907 | 37.4 | 12,349 |
| 2004 | 372,762 | 361,986 | −2.9 | 1,425,276 | 1,052,745 | −26.1 | 13,645 |
| 2006 | 469,880 | 426,678 | −9.2 | 2,151,258 | 1,139,984 | −47.0 | 12,605 |
| 2008 | 440,711 | 440,965 | 0.1 | 1,255,424 | 1,175,743 | −6.3 | 11,897 |
| 2010 | 327,664 | 342,427 | 4.5 | 900,611 | 954,488 | 6.0 | 15,276 |
| 2012 | 345,585 | 338,405 | −2.1 | 1,173,743 | 979,390 | −16.6 | 14,314 |
Note: “Total wealth” is the sum of all assets (liabilities viewed as negative assets) except trusts and the net value of the second home. The large increase of the s.d. in 2002 is due to two households reporting very large values for “other assets” in section U: one of $30 million and one of $90 million.
Next, we examine the distribution of wave-to-wave changes (first differences) before and after the section U corrections. Reducing the variance in the wave-to-wave changes was a main objective of the introduction of section U. Table 5 presents the means and standard deviations of the first differences in total wealth. As expected, the mean change is of a much smaller magnitude than the mean wealth level. The impact of the section U corrections on the mean change is relatively large compared to this smaller base, but modest compared to mean total wealth. In contrast, the standard deviation of the change in total wealth is of a similar order of magnitude as the standard deviation of the wealth level. In most years, the section U corrections result in a substantial reduction of this standard deviation of the change in wealth, which, again, will increase the statistical precision of parameter estimates on wealth in regressions. Only in 2000–2002 did the standard deviation increase, largely due to the two outliers introduced by section U in 2002, as mentioned above.
Table 5.
Effect of section U corrections on first differences in total wealth
| Wave years | Mean ($), without U | Mean ($), with U | Mean, diff (%) | s.d. ($), without U | s.d. ($), with U | s.d., diff (%) | N |
|---|---|---|---|---|---|---|---|
| 1998–2000 | 44,652 | 36,944 | −17.3 | 1,055,494 | 1,011,108 | −4.2 | 12,843 |
| 2000–2002 | −8,311 | 11,127 | −233.9 | 887,338 | 1,154,082 | 30.1 | 11,815 |
| 2002–2004 | 62,587 | 36,514 | −41.7 | 1,357,598 | 1,233,760 | −9.1 | 11,144 |
| 2004–2006 | 91,832 | 58,328 | −36.5 | 1,970,696 | 851,753 | −56.8 | 12,263 |
| 2006–2008 | −38,813 | 1,331 | −103.4 | 1,786,520 | 957,765 | −46.4 | 11,442 |
| 2008–2010 | −78,903 | −59,592 | −24.5 | 1,087,506 | 837,102 | −23.0 | 10,550 |
| 2010–2012 | 9,657 | −12,362 | −228.0 | 1,001,104 | 759,602 | −24.1 | 13,941 |
Note: “Total wealth” is the sum of all assets (liabilities viewed as negative assets) except trusts and the net value of the second home. The large increase of the s.d. in 2002 is due to two households reporting very large values for “other assets” in section U: one of $30 million and one of $90 million.
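The “Diff (%)” columns in Table 5 appear to be computed relative to the value without section U; the sketch below reproduces three reported entries under that assumption (the function name is hypothetical). It also shows why the percentage is hard to read when the baseline mean is negative: in 2000–2002 the mean change rises from −8,311 to 11,127, yet the reported difference is negative.

```python
def pct_diff(without_u, with_u):
    """Percentage difference of the corrected value relative to the uncorrected one."""
    return 100.0 * (with_u - without_u) / without_u


# Mean of the first difference, 1998-2000 (Table 5 reports -17.3).
print(round(pct_diff(44_652, 36_944), 1))
# Mean of the first difference, 2000-2002: the baseline is negative, so the
# sign of the percentage is the opposite of the sign of the dollar change.
print(round(pct_diff(-8_311, 11_127), 1))
# Standard deviation of the first difference, 2000-2002 (Table 5 reports 30.1).
print(round(pct_diff(887_338, 1_154_082), 1))
```

For the means of first differences, the dollar change is therefore often the more informative quantity than the percentage.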
3.4. Ownership changes, spikes, and trenches
Changes in ownership form an important component of wave-to-wave changes in wealth. If ownership is recorded incorrectly, this may have a large impact on recorded wealth and especially recorded changes in wealth. The top panel of Table 6 presents the number of ownership changes before and after section U corrections, across all assets and all wave combinations that are affected by section U (1998–2000 up to 2010–2012). Section U reduces the number of ownership changes, but compared to the total number of household-asset-wave combinations, the net reduction is modest. The second panel of Table 6 shows analogous results for ownership reversals, that is, households that saw an ownership change and then a reversal of that change in the subsequent wave. Such a reversal could be an indication of an error in the middle wave, especially for assets that are not often acquired or disposed of, such as IRAs [8]. These results are in line with the results about wave-to-wave ownership changes, but stronger. Detailed results by wave (not presented in the table) show that especially in the wave triplets that are more affected by section U (2002–2006 up to 2008–2012), the effects on ownership reversals are noticeable (around 5 percent).
Table 6.
Effect of section U corrections on ownership changes, spikes, and trenches (across all assets, 1998–2012)
| | Before sec. U | After sec. U | Diff (%) | N |
|---|---|---|---|---|
| Wave-to-wave ownership changes, 1998–2012 | | | | |
| Own to not own | 85,816 | 84,340 | −1.7 | 1,763,958 |
| Not own to own | 71,069 | 69,745 | −1.9 | 1,763,958 |
| Ownership reversals in three consecutive waves, 1995–2012 | | | | |
| Own to not own to own | 24,539 | 23,714 | −3.4 | 1,510,481 |
| Not own to own to not own | 32,198 | 31,161 | −3.2 | 1,510,481 |
| Spikes and trenches in three consecutive waves, 1995–2012 | | | | |
| Trenches – individual assets | 9,583 | 9,338 | −2.6 | 1,510,481 |
| Spikes – individual assets | 3,444 | 3,270 | −5.1 | 1,510,481 |
| Trenches – total wealth | 1,478 | 1,404 | −5.0 | 72,841 |
| Spikes – total wealth | 400 | 391 | −2.3 | 72,841 |
Note: “Total wealth” is the sum of all assets (liabilities viewed as negative assets) except trusts and the net value of the second home. See text for definitions of spikes and trenches.
Section U was intended to catch conspicuously large changes in asset values. As with ownership reversals, two consecutive large changes in amounts in opposite directions are indicative of an error in the middle of the three waves involved. To study the frequencies of these phenomena, we define a spike as a set of three consecutive positive values for an asset in which the middle value is larger than 10 times the average of the two outer values. Analogously, a trench is a set of three consecutive positive values in which the middle value is smaller than one tenth of the average of the two outer values. The bottom panel of Table 6 shows the frequencies of spikes and trenches across all affected wave triplets for all individual assets combined, as well as for total wealth. Asset verification reduces the number of spikes and trenches. Again (not shown in the table), section U’s dampening effect is strongest for the wave triplets that are most affected by section U (2002–2006 up to 2008–2012). However, for total wealth the results vary more across waves, and sometimes the number of spikes or trenches actually increases. The counts for total wealth are small (typically 50–60 spikes per wave triplet), so these results are subject to more random variation.
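The spike and trench definitions above translate directly into a small classifier. The sketch below is illustrative only; the function names are hypothetical and the input is assumed to be a household's values for one asset in consecutive waves.

```python
def classify_triplet(first, middle, last):
    """Return 'spike', 'trench', or None for three consecutive wave values.

    Both definitions require all three values to be positive.
    """
    if min(first, middle, last) <= 0:
        return None
    outer_mean = (first + last) / 2.0
    if middle > 10 * outer_mean:
        return "spike"
    if middle < outer_mean / 10:
        return "trench"
    return None


def count_patterns(values):
    """Count spikes and trenches over all consecutive triplets in a series."""
    counts = {"spike": 0, "trench": 0}
    for i in range(1, len(values) - 1):
        label = classify_triplet(values[i - 1], values[i], values[i + 1])
        if label is not None:
            counts[label] += 1
    return counts


print(classify_triplet(20_000, 300_000, 25_000))
print(classify_triplet(100_000, 5_000, 120_000))
print(count_patterns([100_000, 5_000, 120_000, 110_000]))
```

Because the rule compares the middle value with the average of the two outer values, a genuine sustained jump (for example 20,000, then 300,000, then 310,000) is not flagged.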
Table 6 shows that trenches are much more common than spikes. Accidental underreporting thus appears more common than accidental overreporting. Conversely, however, reversals from not own to own to not own are more common than ownership reversals the other way around.
4. Imputation methods
When a respondent fails to report a continuous value for an asset, the RAND HRS Team imputes the value. These imputations are generally highly regarded, for two main reasons. First, in the data collection phase, HRS has implemented innovations that increase the quality of the data at the source. In particular, the use of the above-mentioned unfolding brackets (discussed in more detail in Appendix B) greatly enhances the amount of available information and reduces the amount of imputation uncertainty. Second, the imputation method uses the information in the data systematically and thoroughly, imputing ownership, bracket, and value as necessary; it uses a broad set of covariates in the imputation model; and for most value imputations it uses a nearest neighbor approach that does not require distributional assumptions, which makes the method robust to model (mis)specification issues. Here we study the effects of a further refinement of the imputation method, aimed specifically at reducing variance of asset changes induced by the imputation uncertainty. This refinement consists primarily of adding information about the previous wave’s and next wave’s value of the asset to the set of covariates used in the imputation models. We will refer to these as cross-wave imputations, as opposed to cross-sectional imputations. Because the values of most assets tend to be highly correlated across time, cross-wave imputation is expected to lead to both more accurate imputations with less imputation error and smaller variances of asset changes.
4.1. Model specification
All models and imputations are asset-specific, so we suppress the asset indicator in the following and abstractly talk about “the asset”. Let $y_{it}$ be the outcome variable of interest for household $i$ in wave $t$. This is the inverse hyperbolic sine transform of the value of the asset. Analogously, let the binary variable $d_{it}$ denote whether the household owns the asset. The covariates $x_{it}$ in the cross-sectional imputations are the first 10 principal components of about 30 explanatory variables, which include demographics, education, health, cognition, income, bequest expectations, and some interactions (see [10], Section 3.2.4, for a list), plus the constant. In the cross-wave imputations, we add four covariates to these: (1) a dummy for the previous wave’s ownership of the asset; (2) the inverse hyperbolic sine transform of the previous wave’s value of the asset (zero in case of no ownership); (3) a dummy for the following wave’s ownership of the asset; and (4) the inverse hyperbolic sine transform of the following wave’s value of the asset (zero in case of no ownership).
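The four additional cross-wave covariates can be sketched as below. The helper names are hypothetical; only the transform itself (the inverse hyperbolic sine, available as `math.asinh` in the Python standard library) and the zero-for-non-owners rule come from the text.

```python
import math


def ihs(value):
    """Inverse hyperbolic sine, log(value + sqrt(value**2 + 1)).

    Unlike the logarithm, it is defined at zero and for negative amounts.
    """
    return math.asinh(value)


def adjacent_wave_covariates(owns, value):
    """Ownership dummy and transformed value for one adjacent wave.

    The transformed value is set to zero when the asset is not owned.
    """
    if not owns:
        return (0.0, 0.0)
    return (1.0, ihs(value))


previous = adjacent_wave_covariates(True, 250_000.0)   # owned in the previous wave
following = adjacent_wave_covariates(False, 0.0)       # not owned in the following wave
cross_wave_extras = previous + following                # the four added covariates
print(cross_wave_extras)
```

For large positive amounts the transform is close to `log(2 * value)`, so coefficients on it can be read much like coefficients on a log amount.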
Changes in couple status have large systematic consequences for household wealth [13]. Without taking this into account, the prediction of $y_{it}$ from its prior value would be biased upward (in the case of divorce or widowing) or downward (in the case of a new marriage). Hence, for the cross-wave imputations we add dummies for marital status changes to the list of covariates. However, because some of these changes are not very common, we do not have enough observations to estimate their coefficients for each wave separately. Therefore, we pool the data for all waves and estimate these coefficients for all waves jointly, that is, we assume that these coefficients are the same for all waves. The other parameters are all wave-specific.
Thus, the covariates form a linear index of the form
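$$v_{it} = x_{it}'\beta_t + \gamma_{1t}\, d_{i,t-1} + \gamma_{2t}\, y_{i,t-1} + \gamma_{3t}\, d_{i,t+1} + \gamma_{4t}\, y_{i,t+1} + m_{it}'\delta \qquad (1)$$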
where $m_{it}$ is a vector of dummies for marital status changes between wave $t-1$ and wave $t$. We classify marital status in the following categories: (1) married or unmarried couple; (2) divorced/separated; (3) widowed; and (4) never married. We use “no change” as the reference category, and include in $m_{it}$ separate dummies for changes from 1 to 2 (“divorce”), 1 to 3 (“widow”), and 2, 3, or 4 to 1 (“(re)marry”), and consider all other potential combinations “no change”, which gives three dummies.
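The mapping from marital status in two adjacent waves to the three change dummies can be sketched as follows. The status codes follow the numbering in the text; the constant and function names are hypothetical.

```python
COUPLE, DIVORCED, WIDOWED, NEVER_MARRIED = 1, 2, 3, 4


def marital_change_dummies(prev_status, curr_status):
    """Return (divorce, widow, remarry) dummies for a change between two waves.

    Every transition other than the three listed ones counts as "no change",
    the reference category, and yields (0, 0, 0).
    """
    divorce = int(prev_status == COUPLE and curr_status == DIVORCED)
    widow = int(prev_status == COUPLE and curr_status == WIDOWED)
    remarry = int(prev_status in (DIVORCED, WIDOWED, NEVER_MARRIED)
                  and curr_status == COUPLE)
    return (divorce, widow, remarry)


print(marital_change_dummies(COUPLE, WIDOWED))
print(marital_change_dummies(NEVER_MARRIED, COUPLE))
print(marital_change_dummies(DIVORCED, WIDOWED))
```

Note that a recorded transition such as divorced to widowed falls in the reference category under this classification.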
If a household was present in the adjacent waves but did not report a continuous value of the asset, their values are missing, but we do have the cross-sectional imputations of them, so in estimating the cross-wave imputation models and performing the cross-wave imputations we use these cross-sectional imputations as covariates instead. If a household was not present in one of the adjacent waves, we do not have such imputations. Therefore, we estimate two additional imputation models: one with only the previous wave’s covariates added and one with only the following wave’s covariates added. We impute using the model that uses the most information available for the household. For households for which neither previous wave information nor following wave information is available, we use the cross-sectional imputations. Thus, the cross-sectional imputation models estimate (1) with only the first term on the right-hand side included, the full cross-wave imputation models estimate it with all terms included, and we also estimate variants of (1) that include the previous wave covariates and not the next wave covariates or vice versa. This gives us four versions of each of the imputation models.
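The choice among the four model versions can be sketched as a simple lookup on which adjacent waves are available for the household. The labels and the function name are hypothetical; "available" here means the household was present in that wave, with either a reported value or a cross-sectional imputation.

```python
def choose_imputation_model(has_previous_wave, has_next_wave):
    """Pick the model version that uses the most information available."""
    if has_previous_wave and has_next_wave:
        return "cross_wave_full"      # previous and following wave covariates
    if has_previous_wave:
        return "previous_wave_only"
    if has_next_wave:
        return "next_wave_only"
    return "cross_sectional"          # same-wave covariates only


for availability in [(True, True), (True, False), (False, True), (False, False)]:
    print(availability, choose_imputation_model(*availability))
```

A household in its first wave, for example, would fall in the "next_wave_only" case if it was also interviewed in the following wave.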
Ownership, bracket, and value are imputed sequentially, if necessary. For ownership, the model is a binary logit (logistic regression): pit = Pr(dit = 1) = exp(vit)/(1 + exp(vit)). The brackets form a finite ordered set, and the model is an ordered logit model. Labeling the brackets as 1,…, J, the probability that household i is in bracket j in wave t is
$$p_{itj} = \Lambda(\alpha_{jt} - v_{it}) - \Lambda(\alpha_{j-1,t} - v_{it}), \qquad \Lambda(a) = \frac{\exp(a)}{1 + \exp(a)},$$
where the αjt, j = 1,…, J − 1, are threshold parameters that are increasing in j, with α0t = −∞ and αJt = +∞. The other thresholds are additional parameters that are estimated. (The coefficients and thus the linear indexes are different for the different models, but we suppress that in the notation.) For the amounts, a linear regression model is estimated, that is, yit = vit + εit, with εit an error term. This model is used for semiparametric imputation of amounts in closed brackets, that is, brackets with a finite upper bound. For the top bracket, this method is prone to imputing outliers, and therefore a different method, called tobit25, is used. The tobit25 model fits a conditionally lognormal distribution to the amounts in levels, which can also be written as yit = vit + εit, but now yit is the natural logarithm of the amount (which is very similar to the inverse hyperbolic sine, but easier to work with for this model) and a normal distribution is assumed for the error term εit. Furthermore, values in the lowest quartile are treated as censored, which improves the fit of the distribution in the right tail and thus leads to better imputations in the top bracket than an uncensored linear regression model would. Finally, we also estimate another model, called tobit0, which is the same as the tobit25 model except that no observations are treated as censored; only the (rare) negative and zero amounts are not used. This model is used in certain cases where imputation methods based on the other models cannot be used in a satisfactory way.
The cross-sectional imputation models are estimated separately for each wave. We estimate the cross-wave imputation models for all waves jointly, on the stacked data with all observations included for which the required information is available. Except for the categorical variables capturing marital status change, we interact all explanatory variables with a set of wave dummies, allowing the coefficients to be wave-specific. The auxiliary parameters in the models – the threshold parameters in the ordered logit models for the brackets and the variance parameter in the tobit models – are also wave specific. Furthermore, the tobit25 model uses a wave-specific (but known) censoring threshold.
4.2. Estimation sample
In a small number of cases per wave, no financial respondent is interviewed, for example, if the designated financial respondent dies before the scheduled interview but after the interview with the surviving spouse. In this case, many more variables are missing, in particular many income variables and all wealth variables. We estimate a separate set of models for imputing these cases, with a reduced set of covariates (but estimated on the sample with a financial respondent).
For the ownership models, the estimation sample consists of all households for which ownership is reported. For the bracket models, the estimation sample consists of all households that completed the unfolding bracket sequence. Their information results in a complete bracket. The set of complete brackets is the set of narrowest mutually exclusive intervals that are exhaustive (cover all nonnegative amounts). For the amount models, the estimation sample consists of all households that reported a continuous amount.
If the household did not respond at all in the previous wave or next wave, the corresponding ownership and asset values have not been imputed, and we exclude these households from estimation for the models that require them.
In some cases, the number of reported observations on a variable is insufficient to estimate the imputation model accurately. Then we simply use the marginal distribution, which is the same as estimating the model without covariates.
4.3. Imputation
Cross-wave and cross-sectional imputations are done in the same way, but are based on different sets of covariates, as discussed in Section 4.1. Imputations are computed with the following steps:
If ownership is missing, compute the predicted probability pit of ownership based on the imputation model. Take a draw uit from the uniform distribution on the interval [0,1]. Assign dit = 1 (owning) if uit ⩽ pit and dit = 0 (not owning) otherwise.
If dit = 1 (household owns the asset, either reported or imputed), but no brackets are given, compute the predicted probabilities pitj of being in the j-th bracket based on the imputation model. If the respondent gave some, but incomplete, bracket information, the probabilities pitj that are consistent with the respondent’s answers are scaled such that they sum to 1, and the probabilities of the other brackets are set to zero. Then compute the resulting cumulative probabilities $P_{itj} = \sum_{k=1}^{j} p_{itk}$. Take a draw zit from the uniform distribution on the interval [0,1] and assign bracket j if Pit,j−1 < zit ⩽ Pitj (where Pit0 = 0).
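The ownership and bracket draws in these first two steps can be sketched as follows. The predicted probabilities are taken as given here (in the paper they come from the logit and ordered logit models); the function names are hypothetical.

```python
import random


def draw_ownership(p_own: float, rng: random.Random) -> int:
    """Assign d = 1 (owning) if a uniform draw u satisfies u <= p, else d = 0."""
    return 1 if rng.random() <= p_own else 0


def restrict_to_consistent(probs: list[float], allowed: set[int]) -> list[float]:
    """Zero out brackets that contradict partial bracket answers and rescale.

    Brackets are labeled 1..J; probs[j - 1] is the predicted probability of bracket j.
    """
    kept = [p if j in allowed else 0.0 for j, p in enumerate(probs, start=1)]
    total = sum(kept)
    if total <= 0.0:
        raise ValueError("no probability mass on the allowed brackets")
    return [p / total for p in kept]


def draw_bracket(probs: list[float], z: float) -> int:
    """Assign bracket j if P_{j-1} < z <= P_j, with P_j the cumulative probability."""
    cumulative = 0.0
    last_positive = None
    for j, p in enumerate(probs, start=1):
        if p <= 0.0:
            continue  # a bracket with zero probability can never be assigned
        cumulative += p
        last_positive = j
        if z <= cumulative:
            return j
    if last_positive is None:
        raise ValueError("all bracket probabilities are zero")
    return last_positive  # guards against rounding shortfall in the final sum
```

With predicted probabilities (0.2, 0.3, 0.5) and partial information that rules out bracket 3, the rescaled probabilities are 0.4 and 0.6, and a draw of z = 0.5 then assigns bracket 2.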
If a closed complete bracket is given or imputed, but no continuous value is given, compute the value of the linear index vit of the linear regression model for all households. From the households in the same wave that report a value within the bracket of interest, select the household whose linear index is closest to the one of the household whose value must be imputed. Impute the continuous value reported by this (nearest neighbor, donor) household into the household for which we only have the bracket. Because the model is only used to select the nearest neighbor and not to generate a continuous value, this method is semiparametric. These imputations are less sensitive to model assumptions and model misspecification. This nearest neighbor method is also called predictive mean matching in the literature (e.g., [14]).
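A minimal sketch of the donor selection in this step, assuming the linear indexes have already been computed and that the donor list contains only households in the same wave that reported an amount within the bracket of interest. The function name and the data layout are hypothetical.

```python
def nearest_neighbor_impute(
    v_recipient: float, donors: list[tuple[float, float]]
) -> float:
    """Return the reported amount of the donor with the closest linear index.

    donors: (linear_index, reported_amount) pairs for households that
    reported a continuous value within the recipient's bracket.
    """
    if not donors:
        raise ValueError("no donors available in this bracket")
    # min() keeps the first donor in case of ties in the distance.
    _, amount = min(donors, key=lambda donor: abs(donor[0] - v_recipient))
    return amount
```

Because the imputed value is always an amount that some household actually reported, the regression model only influences which donor is chosen, not the value itself.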
If the open top bracket is reported or imputed, but no continuous value is given, compute the value of the linear index vit of the tobit25 model. Let ct be the threshold value of the top bracket (i.e., its lower endpoint) on the log scale and σt the estimated standard deviation of the tobit model. Draw yit from the truncated normal distribution with mean vit, standard deviation σt, left truncation point ct, and no right truncation. That is, compute
$$y_{it} = v_{it} + \sigma_t\,\Phi^{-1}\!\left\{\Phi\!\left(\frac{c_t - v_{it}}{\sigma_t}\right) + e_{it}\left[1 - \Phi\!\left(\frac{c_t - v_{it}}{\sigma_t}\right)\right]\right\},$$
where Φ is the standard normal distribution function and eit is again a draw from the uniform distribution on the interval [0,1].⁷ The imputed amount is exp(yit).
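The top-bracket draw can be sketched with the Python standard library as follows. The inverse-distribution-function formula is the standard one for a left-truncated normal; the clamping of the probability and of the result are numerical safeguards added here and are not described in the paper. The second function mirrors the order described in the footnote (rejection sampling first, formula after 100 rejected draws). All names are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_Q_MIN = 1e-300            # keeps the argument of inv_cdf strictly above 0
_Q_MAX = 1.0 - 2.0 ** -53  # largest float strictly below 1


def draw_left_truncated(v: float, sigma: float, c: float, e: float) -> float:
    """Inverse-CDF draw from N(v, sigma^2) truncated to [c, +inf); e is uniform on [0, 1]."""
    lower = _STD_NORMAL.cdf((c - v) / sigma)
    q = lower + e * (1.0 - lower)
    q = min(max(q, _Q_MIN), _Q_MAX)
    y = v + sigma * _STD_NORMAL.inv_cdf(q)
    return max(y, c)  # enforce the bound despite floating-point rounding


def impute_log_amount_top_bracket(
    v: float, sigma: float, c: float, rng: random.Random, max_rejections: int = 100
) -> float:
    """Try acceptance-rejection sampling first, then fall back on the formula."""
    for _ in range(max_rejections):
        y = rng.gauss(v, sigma)
        if y >= c:
            return y
    return draw_left_truncated(v, sigma, c, rng.random())
```

The returned value is on the log scale, so the imputed dollar amount would be `math.exp(y)`.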
In some special cases, the above methods cannot be used; then either a conditional hot deck is used or the amount is imputed from the tobit0 model. See Hurd et al. [15] (Appendix E) for a list of these situations and which imputation method is used in which case. The conditional hot deck is a special case of the nearest neighbor method without covariates. Imputation from the tobit0 model is similar to imputation from the tobit25 model, except that the left truncation point may differ between households, and there is also a right truncation point (which may also vary across households). Thus, the expression for the natural logarithm of the amount is replaced by
$$y_{it} = v_{it} + \sigma_t\,\Phi^{-1}\!\left\{\Phi\!\left(\frac{L_{it} - v_{it}}{\sigma_t}\right) + e_{it}\left[\Phi\!\left(\frac{U_{it} - v_{it}}{\sigma_t}\right) - \Phi\!\left(\frac{L_{it} - v_{it}}{\sigma_t}\right)\right]\right\},$$
where Φ is the standard normal distribution function and Lit and Uit are the left and right truncation points, respectively, on the log scale. Either or both of these truncation points may be infinite, depending on the available bracket information.
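A draw with household-specific truncation points on both sides can be sketched as below, again using the standard inverse-distribution-function method for a doubly truncated normal. Infinite bounds are handled explicitly; the clamping steps are numerical safeguards assumed here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def draw_truncated(v: float, sigma: float, lower: float, upper: float, e: float) -> float:
    """Inverse-CDF draw from N(v, sigma^2) truncated to [lower, upper].

    lower may be -inf and upper may be +inf; e is a uniform draw on [0, 1].
    """
    std = NormalDist()
    f_lo = 0.0 if lower == -math.inf else std.cdf((lower - v) / sigma)
    f_hi = 1.0 if upper == math.inf else std.cdf((upper - v) / sigma)
    q = f_lo + e * (f_hi - f_lo)
    q = min(max(q, 1e-300), 1.0 - 2.0 ** -53)  # inv_cdf needs 0 < q < 1
    y = v + sigma * std.inv_cdf(q)
    return min(max(y, lower), upper)  # enforce the bounds despite rounding
```

Setting `upper` to infinity and `lower` to the top-bracket threshold gives a left-truncated draw of the kind used for the top bracket.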
5. Effects of cross-wave imputations on wealth distributions
In this section, we compare distributions in the HRS data using cross-sectional imputations (after section U corrections as described in Section 3) with the corresponding distributions using cross-wave imputations (also incorporating the section U corrections).
5.1. Effect of cross-wave imputations on marginal distributions
Table 7 shows the effect of cross-wave imputation on the marginal distribution of total wealth. There are some modest differences in means and standard deviations, but there is no systematic pattern in sign or magnitude of these. This confirms our expectations: the cross-wave imputations are expected to increase the serial correlation, but not necessarily change the marginal distributions.
Table 7.
Effect of cross-wave imputations on total wealth
| Wave year | Mean ($), before cross-wave | Mean ($), after cross-wave | Mean, diff (%) | s.d. ($), before cross-wave | s.d. ($), after cross-wave | s.d., diff (%) | N |
|---|---|---|---|---|---|---|---|
| 1992 | 193,778 | 185,424 | −4.3 | 430,782 | 438,150 | 1.7 | 7,702 |
| 1993 | 156,624 | 154,978 | −1.1 | 363,241 | 364,318 | 0.3 | 6,047 |
| 1994 | 220,009 | 222,602 | 1.2 | 482,419 | 517,794 | 7.3 | 7,051 |
| 1995 | 234,111 | 230,704 | −1.5 | 855,267 | 835,696 | −2.3 | 5,222 |
| 1996 | 247,321 | 246,444 | −0.4 | 565,812 | 579,417 | 2.4 | 6,811 |
| 1998 | 265,818 | 268,749 | 1.1 | 984,536 | 997,645 | 1.3 | 14,395 |
| 2000 | 308,140 | 303,461 | −1.5 | 921,939 | 896,656 | −2.7 | 13,214 |
| 2002 | 325,437 | 322,249 | −1.0 | 1,151,907 | 1,153,152 | 0.1 | 12,349 |
| 2004 | 361,986 | 364,538 | 0.7 | 1,052,745 | 1,080,659 | 2.7 | 13,645 |
| 2006 | 426,678 | 431,312 | 1.1 | 1,139,984 | 1,212,187 | 6.3 | 12,605 |
| 2008 | 440,965 | 434,116 | −1.6 | 1,175,743 | 1,156,295 | −1.7 | 11,897 |
| 2010 | 342,427 | 338,436 | −1.2 | 954,488 | 962,490 | 0.8 | 15,276 |
| 2012 | 338,405 | 341,797 | 1.0 | 979,390 | 1,007,010 | 2.8 | 14,314 |
Note: “Total wealth” is the sum of all assets (liabilities viewed as negative assets) except trusts and the net value of the second home.
Table 8 shows the effect of cross-wave imputation on the first differences of total wealth. We would not expect an effect on the mean first difference, but we would expect a reduction in the standard deviation. By and large, the table confirms this: percentage-wise, the differences in the means can be large, but this is mainly because the means are much closer to zero than the levels, and sometimes very close to zero (small denominator). With only one exception, the standard deviations of the first differences are smaller with cross-wave imputation, but the magnitudes of the effect of cross-wave imputation compared to cross-sectional imputation are modest. The cross-wave imputations decrease the variability of the changes, but not by much.
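The expectation that higher serial correlation lowers the standard deviation of first differences while leaving the marginal distributions unchanged follows from the standard identity Var(yt − yt−1) = Var(yt) + Var(yt−1) − 2 Cov(yt, yt−1). A small numerical sketch, with a hypothetical function name and purely illustrative inputs:

```python
import math


def sd_first_difference(sd_prev: float, sd_curr: float, corr: float) -> float:
    """Standard deviation of y_t - y_{t-1} given the two marginal s.d.'s and their correlation."""
    variance = sd_prev ** 2 + sd_curr ** 2 - 2.0 * corr * sd_prev * sd_curr
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding error
```

Holding the two marginal standard deviations fixed, `sd_first_difference(400_000, 450_000, 0.8)` is smaller than `sd_first_difference(400_000, 450_000, 0.6)`; these numbers are illustrative and are not taken from the tables.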
Table 8.
Effect of cross-wave imputations on first differences in total wealth
| Wave years | Mean ($), before cross-wave | Mean ($), after cross-wave | Mean, diff (%) | s.d. ($), before cross-wave | s.d. ($), after cross-wave | s.d., diff (%) | N |
|---|---|---|---|---|---|---|---|
| 1992–1994 | 24,450 | 35,731 | 46.1 | 370,663 | 358,724 | −3.2 | 7,029 |
| 1993–1995 | 67,703 | 67,221 | −0.7 | 786,058 | 760,311 | −3.3 | 5,209 |
| 1994–1996 | 25,363 | 22,414 | −11.6 | 406,645 | 385,269 | −5.3 | 6,534 |
| 1995–1998 | −9,588 | −7,680 | −19.9 | 576,200 | 527,158 | −8.5 | 4,419 |
| 1996–1998 | 45,779 | 51,687 | 12.9 | 1,004,359 | 997,685 | −0.7 | 6,351 |
| 1998–2000 | 36,944 | 28,268 | −23.5 | 1,011,108 | 978,886 | −3.2 | 12,843 |
| 2000–2002 | 11,127 | 11,588 | 4.1 | 1,154,082 | 1,134,003 | −1.7 | 11,815 |
| 2002–2004 | 36,514 | 44,041 | 20.6 | 1,233,760 | 1,228,908 | −0.4 | 11,144 |
| 2004–2006 | 58,328 | 59,357 | 1.8 | 851,753 | 839,579 | −1.4 | 12,263 |
| 2006–2008 | 1,331 | −11,258 | −946.1 | 957,765 | 968,656 | 1.1 | 11,442 |
| 2008–2010 | −59,592 | −56,556 | −5.1 | 837,102 | 796,151 | −4.9 | 10,550 |
| 2010–2012 | −12,362 | −3,802 | −69.2 | 759,602 | 717,772 | −5.5 | 13,941 |
Note: “Total wealth” is the sum of all assets (liabilities viewed as negative assets) except trusts and the net value of the second home.
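The small-denominator effect in the mean columns of Table 8 can be checked directly from the published entries. The sketch below assumes that “Diff (%)” is the change relative to the value before cross-wave imputation; the function name is hypothetical.

```python
def percent_diff(before: float, after: float) -> float:
    """Percentage difference of `after` relative to `before`."""
    return 100.0 * (after - before) / before


# 2006-2008: the mean first difference moves from $1,331 to -$11,258.
mean_diff_2006_2008 = percent_diff(1_331, -11_258)

# 1992-1994: the s.d. of the first difference moves from $370,663 to $358,724.
sd_diff_1992_1994 = percent_diff(370_663, 358_724)
```

The first value is roughly −946%, even though the absolute change in the mean is only about $12,600, because the denominator is close to zero. The second rounds to −3.2%, as in the table. Small discrepancies with the published percentages are to be expected if those were computed from unrounded means.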
5.2. Ownership changes, spikes, and trenches
The top panel of Table 9 shows the number of ownership changes across all assets. As with the first differences, we would expect that imputations that take adjacent-wave information into account result in fewer changes than imputations that do not. The table confirms this: the cross-wave imputations result in fewer ownership changes across the board, but again, the effect is relatively small. The middle panel of Table 9 presents the number of ownership reversals across three consecutive waves. This also shows across-the-board reductions. The percentage effect of the cross-wave imputations on ownership reversals is larger than on wave-to-wave ownership changes. Thus, the ownership changes that are affected are more often ones that are reversed in the next wave, that is, the ones that are more likely to be incorrect.
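Counting ownership changes and reversals for one household and one asset can be sketched as follows, assuming a list of 0/1 ownership indicators for consecutive waves without gaps. The function names and dictionary keys are hypothetical.

```python
def count_ownership_changes(own: list[int]) -> dict[str, int]:
    """Count wave-to-wave changes in a sequence of ownership indicators."""
    counts = {"own_to_not_own": 0, "not_own_to_own": 0}
    for prev, curr in zip(own, own[1:]):
        if prev == 1 and curr == 0:
            counts["own_to_not_own"] += 1
        elif prev == 0 and curr == 1:
            counts["not_own_to_own"] += 1
    return counts


def count_ownership_reversals(own: list[int]) -> dict[str, int]:
    """Count reversals across every window of three consecutive waves."""
    counts = {"own_not_own_own": 0, "not_own_own_not_own": 0}
    for window in zip(own, own[1:], own[2:]):
        if window == (1, 0, 1):
            counts["own_not_own_own"] += 1
        elif window == (0, 1, 0):
            counts["not_own_own_not_own"] += 1
    return counts
```

For the sequence `[1, 0, 1, 1, 0, 0]`, there are two changes from owning to not owning, one change from not owning to owning, and one reversal of the type own, not own, own.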
Table 9.
Effect of cross-wave imputations on ownership changes, spikes, and trenches (across all assets, 1992–2012)
| | Before cross-wave | After cross-wave | Diff (%) | N |
|---|---|---|---|---|
| Wave-to-wave ownership changes, 1992–2012 | | | | |
| Own to not own | 113,516 | 110,926 | −2.3 | 2,282,998 |
| Not own to own | 95,220 | 92,498 | −2.9 | 2,282,998 |
| Ownership reversals in three consecutive waves, 1992–2012 | | | | |
| Own to not own to own | 28,760 | 27,548 | −4.2 | 1,783,121 |
| Not own to own to not own | 38,778 | 37,028 | −4.5 | 1,783,121 |
| Spikes and trenches in three consecutive waves, 1992–2012 | | | | |
| Trenches – individual assets | 11,034 | 9,212 | −16.5 | 1,766,081 |
| Spikes – individual assets | 3,937 | 3,102 | −21.2 | 1,766,081 |
| Trenches – total wealth | 1,655 | 1,436 | −13.2 | 89,881 |
| Spikes – total wealth | 468 | 350 | −25.2 | 89,881 |
Note: “Total wealth” is the sum of all assets (liabilities viewed as negative assets) except trusts and the net value of the second home. See the text of Section 3 for definitions of spikes and trenches.
The bottom panel of Table 9 shows the effects of cross-wave imputation on the number of spikes and trenches as defined in Section 3, that is, instances in which the amount is positive in three consecutive waves, but the middle one differs by more than a factor of 10 from the average of the two outer waves. Spikes and trenches are relatively rare, but they may have a large impact on statistics such as means and standard deviations, especially of the distribution of change scores. Furthermore, if they are due to imputation that takes only cross-sectional information into account, it is likely that these misrepresent the distribution of interest. Table 9 shows that the cross-wave imputations lead to a large reduction in the number of spikes and trenches.
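The classification of a middle-wave value can be sketched as follows. The sketch assumes that a spike is a middle value more than ten times the average of the two outer waves and a trench is a middle value less than one tenth of that average; the precise definitions are those of Section 3, and the function name is hypothetical.

```python
from typing import Optional


def classify_middle_wave(
    prev_amount: float, mid_amount: float, next_amount: float, factor: float = 10.0
) -> Optional[str]:
    """Return "spike", "trench", or None for three consecutive waves."""
    if min(prev_amount, mid_amount, next_amount) <= 0:
        return None  # only defined when all three amounts are positive
    outer_average = (prev_amount + next_amount) / 2.0
    if mid_amount > factor * outer_average:
        return "spike"
    if mid_amount < outer_average / factor:
        return "trench"
    return None
```

For example, reported amounts of $100,000, $5,000, and $100,000 in three consecutive waves would count as a trench, because the middle value is below one tenth of the outer average of $100,000.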
6. Discussion
The accurate measurement of household wealth is important for many microeconomic studies, and researchers often study changes in wealth. Because wealth is often subject to measurement error, and the effect of measurement error on bias in coefficient estimators is exacerbated by taking first differences, measurement error often leads to serious problems with these analyses. In this paper, we study two improvements in wealth measurement that are intended to alleviate this problem. The first improvement relies on additional survey questions: an asset verification module that challenges particularly large changes in wealth. The second improvement incorporates cross-wave information in the imputation procedures for missing data. We studied these improvements in the Health and Retirement Study.
Both the asset verification corrections and the cross-wave imputations had the expected (and intended) effect: with a few exceptions, they reduced the variability of wealth changes. The effect on the variability of first differences is stronger for the asset verification corrections than for the cross-wave imputations, but they complement each other. Conversely, cross-wave imputations reduce the number of “spikes” and “trenches” (in which a wave’s asset value differs by more than a factor of 10 from the average of the two adjacent waves) more than asset verification does. The improved wealth measures have been made available to the research community as part of the RAND HRS Longitudinal File [10], starting with version M (HRS 1992–2010).
The methods studied in this paper are particularly suitable for variables that change slowly over time. For such variables, there is much information in values from adjacent waves, either to assess whether there is a high chance of a misreport (in the current or previous wave), or to use directly in improving imputations in the presence of missing values. In addition to being relevant for measuring wealth in other surveys, our results may be informative about the scope for improvement of measurement of other variables that tend to change slowly. For variables that have more wave-to-wave variation, like income, the effects of introducing these methods are expected to be smaller, though they may still be useful if there is nonnegligible serial correlation in those variables.
The HRS has been at the forefront of methodological innovations in its questionnaire, of which the use of unfolding brackets is an important case. These greatly reduce the imputation uncertainty when there are missing data, which may explain why the effects of the cross-wave imputations were modest, though not negligible. If such brackets are not available, the potential for improvement is larger, and preserving the correlation between measurements in consecutive waves through cross-wave imputation is likely to have larger effects than in the current study. The potential for cross-wave imputations to have a sizable effect is also larger if the fraction missing is higher than in the HRS.
The main drawback of the asset verification method is the increased respondent burden, and thus the question is whether the improved data that result from catching errors are worth the additional respondent burden. The answer to this question depends on how onerous the question is and how “bad” the discrepancy between two adjacent measures is. Another consideration is the complication of the instrument programming to determine eligibility for section U. With many asset items involved, and additional complications due to missing values and incomplete bracket responses, the number of different eligibility scenarios is very large. Programming this correctly is time consuming, and so is the testing involved. In the case of the HRS, this took several attempts to implement correctly. Of course, this was new territory at the time, with many practical issues that needed to be resolved along the way.
Acknowledgments
We thank Mary Beth Ofstedal, David Weir, Mick Couper, and participants at various HRS co-PI meetings and HRS data monitoring committee meetings as well as the Fourth Panel Survey Methods Workshop (Ann Arbor, MI, 2014) for numerous comments and suggestions. Furthermore, we thank Chris Chan, Jack Chen, Regan Main, Philip Pantoja, Patty St.Clair, and the RAND HRS Team for help with data and coding. This research was supported by grants from the National Institute on Aging (U01AG009740 and P30AG012815), with additional financial support from the Social Security Administration.
Appendix A: Cohorts and waves in the HRS
The HRS started in 1992 with wave 1. It interviewed individuals born 1931–1941 (inclusive) and their spouses of any age. These respondents have since been interviewed biennially. Wave 12 was conducted in 2014. In 1993, the AHEAD (Assets and Health of the Oldest Old) study was started with individuals born in 1923 or earlier and their spouses of any age. These respondents were re-interviewed in 1995. The AHEAD and HRS questionnaires were very similar, so these studies were highly comparable. About 100 respondents from couples that satisfied both the HRS and AHEAD sample selection criteria were initially interviewed in wave 1 of the HRS but then continued as part of the AHEAD. In 1998, HRS and AHEAD were merged into one study, also called the HRS. Households that were part of the original HRS sample became the HRS cohort of the combined HRS and households that were part of the AHEAD study became the AHEAD cohort of the combined HRS. The HRS has been conducted biennially since 1998.
In the RAND HRS data, the first AHEAD wave (1993) is combined with wave 2 of the HRS (1994) and the second AHEAD wave (1995) is combined with wave 3 of the HRS. However, the RAND imputation programs treat these as separate waves (2A, 2H, 3A, and 3H, respectively) and in this paper, we also keep them separate.
In 1998, two new cohorts were introduced. The CODA (Children of the Depression Age) cohort includes individuals born 1924–1930 (but who did not have a spouse who would make them eligible for the HRS or AHEAD cohort) and their spouses, and the WB (War Babies) cohort spans birth years 1942–1947 (with analogous restrictions). In 2004, the EBB (Early Baby Boomer) cohort was added (born 1948–1953, except those with spouses of already covered cohorts) and in 2010, the MBB (Middle Baby Boomer) cohort was added (1954–1959).
As indicated, the HRS always interviews both the age-eligible respondent and the spouse, regardless of the age of the spouse. Cohabiting partners are treated as spouses. If the spouse would be age-eligible for a cohort already covered, the household is not eligible for the new sample. After a separation or divorce, HRS follows both former spouses regardless of age eligibility, and also interviews their new spouses, if any.
Respondents are drawn from the noninstitutionalized population, but they are followed when they enter a nursing home.
Appendix B: Example question texts
This appendix gives an example of the sequence of questions for a wealth component, specifically the sequence for stocks and mutual funds in the 2010 wave. This is a typical asset. For other assets and liabilities, such as the primary residence, there may be more questions (house versus mobile home, whether on a farm, etc.), skip patterns (questions about mortgages are not asked if the household does not own its primary residence), or other variations.
The questions about stocks in the assets and income section (section Q) span questions Q316 to Q320. The HRS is administered over the phone or in person; in both cases, CAPI is used and the interviewer reads the questions to the respondent. As presented in the PDF version of the questionnaire [16], the questions are as follows:
- Q316
- IF R IS COUPLED (X065 = {1 or 3}):
- The next questions are about your assets, including those held by you and your [husband/wife/partner] jointly, by you only, or by your [husband/wife/partner] only.
- ASK OF ALL Rs:
- Aside from anything you have already told me about, do you (or your [husband/wife/partner]) have any shares of stock or stock mutual funds?
- 1. YES
- 5. NO [GO TO Q330]
- 8. DK [GO TO Q330]
- 9. RF [GO TO Q330]
- Q317
- IF R IS COUPLED (X065 = {1 or 3}):
- If you (or your [husband/wife/partner]) sold all those and paid off anything you owed on them, about how much would you have?
- OTHERWISE:
- If you sold all those and paid off anything you owed on them, about how much would you have?
- [IWER: DO NOT PROBE DK/RF]
- ______[AMOUNT] [GO TO Q321]
- DK
- RF
- Q318–Q320 Unfolding Sequence
- Question text: Would it amount to less than $____, more than $____, or what?
- PROCEDURES: 3Up, 2Up1Down, 1Up2Down
- BREAKPOINTS: $60,000, $450,000, $800,000, $1,750,000
- RANDOM ENTRY POINT ASSIGNMENT [1 ($60,000)] or [2 ($450,000)] or [{NOT 1 and NOT 2} ($800,000)] AT X044
For example, suppose that the respondent is single (X065 = 6) and that the random entry point assignment X044 = 2. Furthermore, suppose that the respondent reports owning stocks, but not knowing the exact amount, but knowing it is less than $60,000. Then the sequence of questions and answers would be as follows:
- [Q316] Aside from anything you have already told me about, do you have any shares of stock or stock mutual funds?
- 1. YES
- [Q317] If you sold all those and paid off anything you owed on them, about how much would you have?
- DK
- [Q318] Would it amount to less than $450,000, more than $450,000, or what?
- Less than $450,000
- [Q319] Would it amount to less than $60,000, more than $60,000, or what?
- Less than $60,000
At this point, the sequence stops (Q320 is not asked). The bracket minimum is $0 and the bracket maximum is $60,000. However, if X044 were 1, Q318 would use $60,000 instead of $450,000. If the answer to this question were “less”, the sequence would stop after this question, but if it were “more”, Q319 would ask about $450,000. After another “more”, the respondent would be asked Q320, which in this case would have the same question text, but with $1,750,000 as the amount asked about. In each step, “about” the given number is also a valid answer (e.g., “about $60,000”). The respondent can terminate the sequence with a DK (don’t know) or RF (refuse) answer in each question. Hence, the number of possible final intervals is quite large.
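How a sequence of answers narrows the bracket can be sketched as follows. The sketch takes the questions actually asked as given and does not attempt to reproduce the routing rules (3Up, 2Up1Down, 1Up2Down) of the instrument; the function name and the answer codes are hypothetical.

```python
import math


def bracket_from_answers(answers: list[tuple[float, str]]) -> tuple[float, float]:
    """Return (bracket minimum, bracket maximum) implied by the answers.

    answers: (breakpoint, response) pairs in the order asked, with response
    one of "less", "more", "about", "DK", or "RF".
    """
    low, high = 0.0, math.inf
    for breakpoint, response in answers:
        if response == "less":
            high = min(high, breakpoint)
        elif response == "more":
            low = max(low, breakpoint)
        elif response == "about":
            return (breakpoint, breakpoint)  # treated here as a point value
        elif response in ("DK", "RF"):
            break  # the respondent terminates the sequence
        else:
            raise ValueError(f"unknown response: {response!r}")
    return (low, high)
```

For the example above, `bracket_from_answers([(450_000, "less"), (60_000, "less")])` gives a bracket from $0 to $60,000. A DK after an initial “more” at $450,000 leaves an incomplete bracket from $450,000 with no upper bound, which is the kind of partial information used to restrict the bracket probabilities in the imputation step.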
B.1. Asset verification
Suppose the difference between the reported amount in section Q and the reported amount in the previous wave was more than $50,000, the respondent was also the financial respondent in the previous wave (2008), and the change in total wealth since the previous wave was more than $150,000. Then the respondent entered section U and was asked the asset verification questions. The questionnaire of this section [17] is rather complicated, so we will only walk through one example here. Suppose that the respondent reported owning stocks worth $130,000 in 2008 and reported owning stocks worth $50,000 in 2010. Moreover, suppose that this respondent claimed that the 2008 report was wrong and that the correct amount should have been $90,000. The asset verification questions in 2010 for stocks would then be the following:
- [U001] According to my records, in 2008 you had stocks worth about $130,000. Now they are worth about $50,000. Does this sound right?
- 5. NO
- [U002] Which record is wrong, the 2008 or the 2010 report?
- 1. PREVIOUS WAVE
- [U003] About how much were these worth in 2008?
- $90,000 [AMOUNT]
Depending on the answers, the respondent could be asked to correct the 2008 amount, the 2010 amount, or both, and for each, the respondent could enter an unfolding bracket sequence. However, note that respondents cannot change ownership explicitly in section U: if the respondent does not own stocks, $0 must be entered as the amount.
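The entry condition and the effect of the correction in this example can be sketched as follows. This is a strong simplification: it assumes the thresholds apply to absolute changes, ignores missing values and bracket responses, and uses hypothetical function names. The actual eligibility logic of section U involves many more scenarios.

```python
from typing import Optional


def enters_asset_verification(
    prev_amount: float,
    curr_amount: float,
    prev_total_wealth: float,
    curr_total_wealth: float,
    same_financial_respondent: bool,
    asset_threshold: float = 50_000,
    total_threshold: float = 150_000,
) -> bool:
    """Simplified check of whether an asset triggers verification questions."""
    return (
        same_financial_respondent
        and abs(curr_amount - prev_amount) > asset_threshold
        and abs(curr_total_wealth - prev_total_wealth) > total_threshold
    )


def wealth_change(
    prev_amount: float,
    curr_amount: float,
    corrected_prev: Optional[float] = None,
    corrected_curr: Optional[float] = None,
) -> float:
    """First difference, using corrected amounts where the respondent supplied them."""
    prev = prev_amount if corrected_prev is None else corrected_prev
    curr = curr_amount if corrected_curr is None else corrected_curr
    return curr - prev
```

In the example, the reported change in stocks is `wealth_change(130_000, 50_000)`, a drop of $80,000. After the respondent corrects the 2008 amount to $90,000, `wealth_change(130_000, 50_000, corrected_prev=90_000)` gives a drop of $40,000.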
Appendix C: Implementation variations of the asset verification section
Since its introduction in 2002, a few design changes have been made to the asset verification section (section U). Furthermore, because of its complexity, the actual implementation sometimes differed from the intended implementation. Hurd et al. [15] describe these variations in detail. The consequences for our study are fairly small, except that these variations explain some of the patterns in the tables, and that imperfect implementations probably reduce the effect of this section compared to an ideal implementation. In this section, we highlight a few variations that explain patterns in the tables.
C.1. Explanation of patterns in Table 2
In 2002, there was an issue with the preloads, which prevented households from entering section U if they had reported not owning a particular asset in 2000. This explains the lower percentage for this year. In 2004–2008, the questionnaire software did not show thousands separators in the amounts, which may have caused some decimal place errors, which in turn would trigger asset verification questions. In later versions of the 2010 questionnaire and in subsequent waves, the construction of the preloads for section U was adapted to take into account the previous wave’s section U corrected values. This should have led to fewer discrepancies. The larger absolute number in 2012 is probably due to the larger eligible sample: this is the second wave for the Middle Baby Boomers cohort (age 51–56 in 2010), which could not enter section U in 2010. A potential explanation for the higher fraction of households who entered in 2012 is that in some cases incorrect preloads were used in 2012.
C.2. Explanation of patterns in Table 3
Corrections of the previous wave were more common, except in 2002 (and to a much lesser extent in 2004). The latter may be explained by the preload issue mentioned above, because a relatively large fraction of previous wave corrections are corrections from no ownership to a positive value, and these cases did not enter section U in 2002. The relative improvement of the previous wave in 2010 may be due to replacing the original answers by their previous-wave section U corrections (if applicable) in the later versions of the 2010 questionnaire. The high fraction incorrect, in particular previous wave incorrect, in 2012, is likely due to the preload problems in 2012 mentioned above.
Footnotes
1. The HRS (Health and Retirement Study) is sponsored by the National Institute on Aging (grant number NIA U01AG009740) and is conducted by the University of Michigan: http://hrsonline.isr.umich.edu/.
2. The RAND HRS Longitudinal Data file is an easy-to-use data set based on the HRS data. It was developed at RAND with funding from the National Institute on Aging and the Social Security Administration: www.rand.org/labor/aging/dataprod/hrs-data.html.
3. Later versions of these data contain some corrections and improvements, but we do not expect these to affect the comparisons reported here.
4. In waves 6–8 (2002–2006), this threshold was $1,000,000 for a random 10% of the sample.
5. This carries over to so-called proxy interviews, which are conducted with an informant other than the intended respondent: the proxy respondent needs to be the same person as in the previous wave.
6. The more comprehensive measure HwATOTB also includes the second home and any loans on it. However, the value of the second home and any loans on it cannot be consistently defined for all waves (1992–2012) and therefore we use HwATOTA in this paper.
7. Alternatively, we can repeatedly draw from an untruncated normal distribution until the imputation satisfies the bound (acceptance-rejection sampling). For historical reasons, we do this first and only switch to the noniterative formula after 100 rejected draws.
References
- [1] Wansbeek T, Meijer E. Measurement error and latent variables in econometrics. Amsterdam: North-Holland; 2000.
- [2] Little RJA, Rubin DB. Statistical analysis with missing data. 2nd ed. New York: Wiley; 2002.
- [3] Hurd M, Juster FT, Smith JP. Enhancing the quality of data on income: Recent innovations from the HRS. J. Human Res. 2003; 38: 758–72. doi: 10.3368/jhr.XXXVIII.3.758.
- [4] Juster FT, Smith JP, Stafford F. The measurement and structure of household wealth. Labour Econ. 1999; 6: 253–75. doi: 10.1016/S0927-5371(99)00012-3.
- [5] Juster FT, Suzman R. An overview of the Health and Retirement Study. J. Human Res. 1995; 30: S7–56. doi: 10.2307/146277.
- [6] National Institute on Aging. Growing older in America: The Health and Retirement Study (NIH Publication No. 07–5757). Bethesda (MD): National Institute on Aging; 2007. http://hrsonline.isr.umich.edu/sitedocs/databook-2006/.
- [7] Hill DH. Wealth dynamics: Reducing noise in panel data. J. Appl. Econometrics 2006; 21: 845–60. doi: 10.1002/jae.878.
- [8] Venti S. Economic measurement in the Health and Retirement Study. Forum Health Econ. Pol. 2011; 14(3): Article 2. doi: 10.2202/1558-9544.1273.
- [9] Lee J, Meijer E, Phillips D. The effect of using different imputation methods for economic variables in aging surveys. CESR-Schaeffer Working Paper No. 2015–019. Los Angeles (CA): University of Southern California, Center for Economic and Social Research; 2015. doi: 10.2139/ssrn.2650214.
- [10] Chien S, et al. RAND HRS data documentation, Version N. Santa Monica (CA): RAND Corporation; 2014. http://www.rand.org/content/dam/rand/www/external/labor/aging/dataprod/randhrsN.pdf.
- [11] Moldoff M, et al. RAND HRS income and wealth imputations, Version N. Santa Monica (CA): RAND Corporation; 2014. http://www.rand.org/content/dam/rand/www/external/labor/aging/dataprod/randiwn.pdf.
- [12] SAS Institute. SAS/STAT® 9.2 User’s Guide. 2nd ed. Cary (NC): SAS Institute; 2009. http://support.sas.com/documentation/cdl/en/statug/63033/PDF/default/statug.pdf.
- [13] Zissimopoulos JM. Marriage and wealth changes at older ages. In: Couch KA, Daly MC, Zissimopoulos JM, eds. Lifecycle events and their consequences: job loss, family change, and declines in health. Redwood City (CA): Stanford University Press; 2013. pp. 158–77.
- [14] Little RJA. Missing-data adjustments in large surveys. J. Bus. Econ. Statistics 1988; 6: 287–96. doi: 10.1080/07350015.1988.10509663.
- [15] Hurd MD, Meijer E, Moldoff M, Rohwedder S. Improved wealth measures in the Health and Retirement Study: Asset reconciliation and cross-wave imputation. Working Paper No. WR-1150. Santa Monica (CA): RAND Labor & Population; 2016. doi: 10.7249/WR1150.
- [16] HRS 2010 – Section Q: Assets And Income (Final Version – 1/10/2011). http://hrsonline.isr.umich.edu/modules/meta/2010/core/qnaire/online/17hr10Q.pdf.
- [17] HRS 2010 – Section U: Asset Verification (Final Version 2 – 6/08/2011). http://hrsonline.isr.umich.edu/modules/meta/2010/core/qnaire/online/21hr10U.pdf.
