Abstract
Measuring change over time in areas such as family structure, employment, income, and poverty is of great interest to social scientists. The panel component of the Current Population Survey (CPS) affords the opportunity to observe short-term change in these areas. The Annual Social and Economic supplement (ASEC), with its wealth of information on income, health insurance coverage, benefits receipt, and many other topics, is a particularly popular resource for this purpose. However, commonly used methods for linking CPS ASEC files do not address how to link the ASEC oversample records across years, leading to smaller linked sample sizes. We demonstrate how to recover the linkable oversample cases in the 2005–2020 ASEC, resulting in about 150,000 more linked records (between 13,000 and 19,000 yearly) which represents a 30% increase in the overall linked sample size.
Keywords: Measurement, Current population survey
Introduction
The Current Population Survey (CPS) and its many supplements, particularly the Annual Social and Economic Supplement (ASEC), are vitally important for demographic research in the US. The rotation pattern of the CPS, in which a household is interviewed for four consecutive months in one year and then again in those same months the following year, has allowed researchers to construct 2-year panels and exploit the longitudinal aspect of the survey. Indeed, while the ability to link adjacent ASEC files has been documented and used for years (Feng, 2001; Katz et al., 1984; Madrian & Lefgren, 2000; Pitts, 1988), this feature of the data set has been growing in popularity because of recent innovations in linking (Rivera Drew et al., 2014; Flood & Pacas, 2017; Flood et al., 2020). Taking advantage of the CPS rotation pattern has allowed researchers to use the ASEC as a 2-year panel and study a wide variety of issues such as income volatility, the impact of education on employment, changes in the social safety net, and poverty transitions (Ziliak et al., 2011; Riddell & Song, 2011; Hardy et al., 2018; Pacas, 2017; Pacas & Davis, 2018). Given the importance of the ASEC as a source of official statistics and its wide topical coverage, there is untapped potential for research using 2-year ASEC panels. In this note, we call attention to a feature of the ASEC that has been largely overlooked when using this data set as a panel: two distinct oversamples, the Hispanic oversample and the State Children’s Health Insurance Program (SCHIP)-eligible oversample, that were introduced by the Census Bureau to improve the reliability of subpopulation estimates.
Currently popular CPS ASEC linking methodologies do not properly account for an idiosyncrasy of these records, resulting in the exclusion of the vast majority of the ASEC oversample respondents from linked files. We show that the oversamples are linkable and demonstrate how to recover the linkable oversample cases in the 2005–2020 ASEC, resulting in about 150,000 more linked records (between 13,000 and 19,000 yearly), which represents a 30% increase in the overall linked sample size. In addition to yielding larger linked sample sizes, including ASEC oversample records when linking will likely result in more accurate population estimates, especially for subpopulations that are targeted by oversampling. Note that the reference period for the CPS ASEC is the year prior to that in which the survey is conducted. For example, data about calendar year 2004 are collected in survey year 2005. Throughout this research note, we refer to survey years, and so the 2005–2020 ASEC data described below pertain to calendar years 2004–2019.
Background
Linking Methodologies
Linking the CPS relies on an understanding of the CPS rotation pattern. The CPS is a monthly survey of roughly 73,000 households. These households are interviewed over a period of 16 months in a 4-8-4 rotation pattern: a household is interviewed for four consecutive months, rotates out of the sample for 8 months, and is then interviewed again for four consecutive months. A household’s place in the CPS rotation pattern is indicated by a “month-in-sample” (MIS) variable, which takes on values one through eight. Given the CPS rotating panel structure, it is commonly understood that a household is interviewed, at most, eight times over a period of 2 years. This section highlights how ASEC oversamples are excluded from linking methodologies because of the validation criterion that a household follow a sequential pattern of MIS values.
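The 4-8-4 pattern described above can be made concrete with a minimal sketch. The function name and the running month counter are hypothetical and serve only to illustrate the arithmetic of the rotation; they are not part of any Census Bureau or IPUMS tool.

```python
def interview_months(first_month_index):
    """Map each MIS value (1-8) to a running month index under 4-8-4.

    first_month_index is a hypothetical month counter for the household's
    first interview (for example, 0).
    """
    schedule = {}
    for mis in range(1, 9):
        # MIS 1-4 fall in consecutive months; MIS 5-8 resume after 8 months out
        offset = (mis - 1) if mis <= 4 else (mis - 1) + 8
        schedule[mis] = first_month_index + offset
    return schedule


schedule = interview_months(0)
# Months between the first and last interview, inclusive
span = schedule[8] - schedule[1] + 1
# Gap between MIS m and MIS m + 4, for m = 1..4
gaps = [schedule[m + 4] - schedule[m] for m in range(1, 5)]
```

Under this sketch each MIS value of 1 through 4 recurs exactly 12 months later as that value plus 4, which is the property that MIS-based year-to-year linking relies on.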
Census Bureau documentation for linking CPS files is sparse. CPS panels have been created by independent researchers and various efforts have been made to lower the barrier to creating panels from the CPS. These methodologies include linking the ASEC in two ways: (1) as part of full CPS panels that link all basic monthly samples and (2) linking adjacent ASECs directly. The differences between the two methodologies are subtle but the similarity between them highlights how both exclude all ASEC oversample respondents.
One set of linking methodologies aims to create full panels of all CPS responses (see Nekarda, 2009; Rivera Drew et al., 2014). The approach is to link as many basic monthly survey (BMS) responses for an individual as possible. As demonstrated in Flood and Pacas (2017), the ASEC can be used as a part of these panels. Specifically, as long as a respondent’s set of BMS responses includes the month of March, it is possible to link their ASEC responses to the March BMS (see Fig. 1 for a visual representation of this linking setup). Importantly, because the central purpose of these methodologies is to create full panels from the BMS responses, the ASEC oversamples are necessarily excluded from the March BMS to ASEC link.1 In sum, linking methodologies that focus on linking BMS responses allow for linking ASECs across years but only for March BMS responses and therefore exclude ASEC oversample responses by design.
Fig. 1.
ASEC linking algorithms typically exclude oversamples because of MIS assignment
The second set of methodologies explicitly tries to link adjacent ASEC samples. The most popular methodology is Madrian and Lefgren (2000), which has been used for many different analyses (see Ziliak et al., 2011; Riddell & Song, 2011; Liu & Trefler, 2011; Humensky et al., 2013; Hokayem & Heggeness, 2014; Elsby et al., 2016; Hardy et al., 2018).2 The methodology used by Madrian and Lefgren (2000) relies on the explicit instructions outlined in Census Bureau documentation, which provides the linking keys needed for linking ASECs. The key instruction given by the Census Bureau that leads to the exclusion of all ASEC oversample respondents is the following:
The first step in matching year t with year t+1 is to select from year-t those housing units with a ‘month in sample’ value of 1 through 4, and from year t+1 those units with a ‘month in sample’ value of 5 through 8. This will identify the sample subset eligible for matching. Within this subset, housing units in year t, month 1 will match only with units in year t+1, month 5, etc. (U.S. Census Bureau, 2020)
These instructions implicitly ignore the MIS assignment for ASEC oversample respondents and methodologies based on these instructions will not link these respondents.
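To make the consequence of the quoted instruction explicit, the following is a minimal sketch of the MIS eligibility rule it describes. The function name is hypothetical; the rule itself is taken from the quotation above.

```python
def mis_eligible_pair(mis_year_t, mis_year_t1):
    """Return True if a (year t, year t+1) pair of MIS values satisfies the
    quoted rule: MIS 1-4 in year t matched to the same value plus 4 in t+1."""
    return 1 <= mis_year_t <= 4 and mis_year_t1 == mis_year_t + 4


# A March BMS household following the rotation pattern passes the rule
follows_rotation = mis_eligible_pair(1, 5)
# An oversample household with assigned MIS values that do not follow the
# rotation pattern fails, even if its identifiers match across years
assigned_values = mis_eligible_pair(5, 1)
```

Any record pair whose MIS values fail this check is dropped before identifiers are compared, which is how oversample records with assigned MIS values fall out of the linked file.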
The ASEC and Its Oversamples
The ASEC is conducted primarily in March of each year though ASEC interviewing also occurs in February and April.3 In addition to the demographic and labor force information contained in the basic monthly survey (BMS), the ASEC includes information on health insurance coverage, income, noncash benefits, and poverty. Because of its importance in government statistics and for socio-demographic research, the ASEC sample has been increased over the years. Today, the ASEC file contains all of the March BMS respondents plus two oversamples. Figure 1 outlines the specific sampling methodology used for selecting the ASEC oversamples and their potential links across years. The Hispanic oversample was first introduced in 1976 to improve the reliability of estimates for this subpopulation and contains roughly 2,500 households (about 5,000 individuals) each year (Flood & Pacas, 2017). These households are selected from those that received the November BMS and had at least one Hispanic person in the household. Importantly, these households would not have been part of the March BMS. As a result, the Census Bureau fields the ASEC for them in February or March as a completely separate interview, so Hispanic oversample households are interviewed a ninth and potentially a tenth time.
The second oversample, known as the SCHIP oversample, was first introduced in 2001 to improve estimates of children without health insurance and contains about 12,000 households (about 24,000 individuals) (Flood & Pacas, 2017). The SCHIP oversample households include non-Hispanic non-Whites or non-Hispanic Whites with children 18 years old or younger. These households are chosen from several different BMS samples and receive a separate ninth or tenth interview in February, March, or April.
Importantly, the oversamples are constructed in such a way that a subset of the oversample records can also be linked across years because these groups respond to the ASEC 2 years in a row. As shown in Fig. 1, Hispanic oversample respondents are sampled from all MIS groups from the November preceding the ASEC survey; SCHIP oversample households are drawn from MIS group 8 in August, September, and October of the preceding year and from MIS groups 4 and 8 in February and 1 and 5 in April of the ASEC survey year. Half of the Hispanic oversample respondents and 2/7 of the SCHIP oversample respondents (i.e. 2 MIS groups link forward out of a total of 7 MIS groups selected for the SCHIP oversample) from 2005 onward (Flood & Pacas, 2017) can be linked across years. The SCHIP oversamples drawn from MIS 8 in August, September, and October will never be eligible to link across years. Because of the 4-8-4 rotation pattern, these households are no longer part of the CPS after MIS 8 and therefore respond to the ASEC only once. Many ASEC linking methodologies overlook that the ASEC oversample respondents consist of households that have been interviewed a ninth and tenth time by the CPS and have been assigned MIS values between one and eight. In other words, even though oversample households are being interviewed for a ninth or tenth time, MIS values of 9 or 10 are never found in the CPS public-use data. In processing the ASEC, the Census Bureau assigns all oversample households a month-in-sample value between one and eight (Flood & Pacas, 2017) such that MIS assignments are evenly distributed across all 8 MIS groups. The oversample MIS values are irrelevant for linking purposes.
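The one-half and 2/7 shares follow from the rotation pattern, as the sketch below illustrates. It assumes, purely for illustration, that each selected (month, MIS) group contributes equally to its oversample and that a household selected in a given month with true BMS month-in-sample m is selected again one year later with month-in-sample m + 4. The month labels and the function name are hypothetical, and the MIS values here are the true BMS values used for selection, not the values assigned in the ASEC file.

```python
from fractions import Fraction

# (BMS month, true MIS) groups from which each oversample is drawn
HISPANIC_GROUPS = {("Nov", mis) for mis in range(1, 9)}
SCHIP_GROUPS = {
    ("Aug", 8), ("Sep", 8), ("Oct", 8),
    ("Feb", 4), ("Feb", 8),
    ("Apr", 1), ("Apr", 5),
}


def forward_link_share(groups):
    """Share of groups that reappear one year later, i.e. whose same-month
    group with MIS + 4 is also part of the oversample design."""
    forward = [g for g in groups if (g[0], g[1] + 4) in groups]
    return Fraction(len(forward), len(groups))


hispanic_share = forward_link_share(HISPANIC_GROUPS)
schip_share = forward_link_share(SCHIP_GROUPS)
```

In this sketch the SCHIP groups that link forward are February MIS 4 and April MIS 1, while the August, September, and October MIS 8 groups have no later counterpart, consistent with the description above.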
Methodology
For data from 2002 onward, the Census Bureau instructs users to link ASEC files across years using household and person identifiers only (U.S. Census Bureau, 2020). Beginning in 2005, the only linking key required to link ASEC files across years is the CPS variable PERIDNUM (U.S. Census Bureau, 2020). Using PERIDNUM as the sole linking key, we link all ASECs from 2005 through 2020. As properly linking all components of the ASEC files prior to 2005 is much more complicated than in 2005 and after, we restrict ourselves to the later period for the purposes of demonstrating the importance of taking the ASEC oversamples into account when linking. Moreover, MIS values in the original 2016, 2018, and 2020 ASEC files must be addressed to properly compare the outcomes of linking using only the Census Bureau-recommended linking keys and a method that incorporates month-in-sample. In 2016, 2018, and 2020, the original month-in-sample values for the March BMS respondents do not follow the expected rotation pattern. In these years, March BMS households actually in months-in-sample one through four have month-in-sample values of five through eight, and households actually in months-in-sample five through eight have month-in-sample values of one through four. The correction is to swap the two halves of the rotation: recode MIS 5 to 1, 6 to 2, 7 to 3, and 8 to 4, and recode MIS 1 to 5, 2 to 6, 3 to 7, and 4 to 8 in those years.4
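The recode for the affected years amounts to swapping the two halves of the rotation. A minimal sketch follows; the function and constant names are hypothetical, and the sketch is meant to be applied only to March BMS records in the original files for those years.

```python
YEARS_WITH_SWAPPED_MIS = {2016, 2018, 2020}


def corrected_mis(survey_year, mis):
    """Return the month-in-sample value implied by the rotation pattern.

    In the affected survey years, values 5-8 are recoded to 1-4 and
    values 1-4 are recoded to 5-8; other years are returned unchanged.
    """
    if survey_year in YEARS_WITH_SWAPPED_MIS:
        return mis - 4 if mis >= 5 else mis + 4
    return mis


recoded_2016 = [corrected_mis(2016, mis) for mis in range(1, 9)]
unchanged_2017 = [corrected_mis(2017, mis) for mis in range(1, 9)]
```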
Importantly, the key insight of our methodology (the Pacas & Rodgers method) is that MIS is not used in any capacity for linking ASEC oversamples. To identify ASEC oversample respondents, researchers must first link the March BMS respondents to the ASEC. The ASEC respondents that do not link in this step are the ASEC oversample respondents.5 Disregarding MIS for these records is contrary to what the Census Bureau suggests in its documentation, but our research shows that the MIS criteria do not apply to ASEC oversamples. After achieving naive links using PERIDNUM, we validate those links based on age, sex, and race. The validation criteria are adopted from Madrian and Lefgren (2000). Validation on sex and race requires that responses be identical across years. Validation on age requires that age in the following year be between −1 and 3 years greater than age in the first year.
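The link-then-validate logic can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the analysis: only PERIDNUM is a variable name taken from the text, the keys AGE, SEX, and RACE are assumed names, PERIDNUM is assumed to be unique within a year, and the toy records are invented.

```python
def link_and_validate(records_t, records_t1):
    """Link two ASEC years on PERIDNUM, then keep links that pass validation.

    Each record is a dict with keys PERIDNUM, AGE, SEX, and RACE.
    MIS is deliberately not consulted at any point.
    """
    first_year = {rec["PERIDNUM"]: rec for rec in records_t}
    validated = []
    for rec_t1 in records_t1:
        rec_t = first_year.get(rec_t1["PERIDNUM"])
        if rec_t is None:
            continue  # no naive link for this person
        age_change = rec_t1["AGE"] - rec_t["AGE"]
        if (rec_t["SEX"] == rec_t1["SEX"]
                and rec_t["RACE"] == rec_t1["RACE"]
                and -1 <= age_change <= 3):
            validated.append((rec_t, rec_t1))
    return validated


# Toy records, invented for illustration only
asec_2005 = [
    {"PERIDNUM": "A", "AGE": 30, "SEX": 2, "RACE": 1},
    {"PERIDNUM": "B", "AGE": 45, "SEX": 1, "RACE": 1},
    {"PERIDNUM": "C", "AGE": 12, "SEX": 1, "RACE": 2},
]
asec_2006 = [
    {"PERIDNUM": "A", "AGE": 31, "SEX": 2, "RACE": 1},  # passes validation
    {"PERIDNUM": "B", "AGE": 52, "SEX": 1, "RACE": 1},  # age change too large
    {"PERIDNUM": "D", "AGE": 60, "SEX": 2, "RACE": 1},  # no match in 2005
]
links = link_and_validate(asec_2005, asec_2006)
linked_ids = [rec_t["PERIDNUM"] for rec_t, _ in links]
```

Because the function never reads a month-in-sample value, oversample and March BMS records are treated identically once identifiers match.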
Our methodology of using PERIDNUM and, more importantly, relaxing the MIS validation criteria, recovers thousands of validated responses each year and clearly demonstrates the problems of using MIS to link ASECs. Table 1 shows month-in-sample values for March BMS records from the 2005 and 2006 ASEC files linked using identifiers only. The rotation pattern is clearly visible. The MIS values for households in 2005 range from 1 to 4 and link to 2006 MIS values that range from 5 to 8. As required by the rotation pattern, the MIS values in 2006 should be exactly the 2005 MIS value plus 4. Table 1 clearly shows how this holds true for all March BMS records.
Table 1.
Month-in-sample for linked March BMS records in ASEC, 2005–2006
2005 MIS value | 2006 MIS 5 | 2006 MIS 6 | 2006 MIS 7 | 2006 MIS 8 |
---|---|---|---|---|
1 | 12,476 | 0 | 0 | 0 |
2 | 0 | 12,667 | 0 | 0 |
3 | 0 | 0 | 12,903 | 0 |
4 | 0 | 0 | 0 | 12,858 |
Unlike for the March BMS links, the month-in-sample values do not reflect the rotation pattern for oversample links. Table 2 shows MIS values for linked Hispanic and SCHIP oversample records between the 2005 and 2006 ASEC files; note that most successful links do not follow the expected MIS progression. Given that these records match on PERIDNUM and demographic characteristics, Table 2 confirms both that the assignment of MIS to ASEC oversample respondents does not maintain the expected rotation pattern and that, in spite of this, ASEC oversample records can be linked across years. As our subsequent analysis shows, employing our methodology to include oversamples in linked ASEC data leads to a dramatic increase in the size of the linked sample.
Table 2.
Month-in-sample values for linked oversample records in the ASEC file, 2005–2006
2005 MIS value | 2006 MIS 1 | 2006 MIS 2 | 2006 MIS 3 | 2006 MIS 4 | 2006 MIS 5 | 2006 MIS 6 | 2006 MIS 7 | 2006 MIS 8 |
---|---|---|---|---|---|---|---|---|
1 | 275 | 234 | 161 | 190 | 146 | 199 | 218 | 245 |
2 | 255 | 215 | 224 | 220 | 206 | 143 | 210 | 212 |
3 | 243 | 243 | 231 | 158 | 230 | 185 | 173 | 245 |
4 | 161 | 206 | 287 | 207 | 233 | 162 | 179 | 245 |
5 | 1,526 | 193 | 221 | 180 | 145 | 194 | 269 | 208 |
6 | 193 | 1,241 | 172 | 222 | 197 | 184 | 136 | 254 |
7 | 210 | 196 | 1,208 | 254 | 270 | 167 | 225 | 144 |
8 | 199 | 207 | 231 | 1,245 | 183 | 244 | 240 | 191 |
These links are validated on age, sex, and race: only links for which sex and race are identical between years and age in 2006 is between −1 and 3 years greater than in 2005 are retained. After validation, 96.5% of oversample links are retained.
Note that while it is possible to link March BMS records from the ASEC file to other CPS basic monthly files, such as the December BMS file that contains the Food Security Supplement, ASEC oversample records cannot be linked to any of the basic monthly files in which they appear. This unlinkability is the result of Census Bureau procedures to protect the identity of the oversample households and cannot be overcome by linking methodology.
Illustration
By ignoring MIS when linking ASEC oversamples, we recover between 13,000 and 19,000 linked oversample records for each pair of adjacent ASEC files between 2005 and 2020. Figure 2 shows unweighted counts of links across ASEC files broken down by oversample type. For comparison, we implement the Madrian & Lefgren methodology for the same time period. As Table 2 shows, there are some oversample cases in which the expected MIS pattern does hold (the cells in which the 2006 MIS value equals the 2005 MIS value plus 4), and these records therefore link and validate across years under that methodology. However, the increase in sample size using our methodology is substantial.
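As a rough check on how few oversample links an MIS-based rule would retain, the counts in Table 2 can be tallied directly. The sketch below assumes that the cells in which the MIS pattern holds are those where the 2006 MIS value equals the 2005 MIS value plus 4; the variable names are hypothetical and the counts are transcribed from Table 2.

```python
# Rows: 2005 MIS values 1-8; columns: 2006 MIS values 1-8 (Table 2)
TABLE_2 = [
    [275, 234, 161, 190, 146, 199, 218, 245],
    [255, 215, 224, 220, 206, 143, 210, 212],
    [243, 243, 231, 158, 230, 185, 173, 245],
    [161, 206, 287, 207, 233, 162, 179, 245],
    [1526, 193, 221, 180, 145, 194, 269, 208],
    [193, 1241, 172, 222, 197, 184, 136, 254],
    [210, 196, 1208, 254, 270, 167, 225, 144],
    [199, 207, 231, 1245, 183, 244, 240, 191],
]

total_links = sum(sum(row) for row in TABLE_2)
# Cells where the 2006 MIS equals the 2005 MIS plus 4 (2005 MIS 1-4 only)
on_pattern = sum(TABLE_2[m - 1][m + 4 - 1] for m in range(1, 5))
share_on_pattern = on_pattern / total_links
```

Under that assumption, about 4 percent of the 2005–2006 oversample links in Table 2 would satisfy the MIS criterion.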
Fig. 2.
Validated oversample links by type, Madrian & Lefgren v. Pacas & Rodgers, 2005–2020
Table 3 demonstrates that the oversample records in the linked ASEC files are demographically different from those from the March BMS. One obvious implication of this difference is a potential increase in the precision of estimates based on the linked sample when the oversample records are properly included. This increased precision is likely to be most pronounced for estimates of subpopulations targeted by the oversampling, but since these subpopulations are overrepresented among low-income households, the impact on overall precision is greater than it would be if the core ASEC sample were increased by the same number of households. This increased sample size will not compensate for non-random attrition that may result from the fact that the CPS is a household survey that re-visits dwellings and does not follow individuals if they move.
Table 3.
Comparison of demographic characteristics of linked March BMS and ASEC oversample records, 2005–2020
Characteristic | March BMS | ASEC oversamples | Difference* |
---|---|---|---|
Sex (%) | | | |
Female | 51.9 | 51.4 | −0.5 |
Male | 48.1 | 48.6 | 0.5 |
Age (%) | | | |
Less than 18 | 22.5 | 35.2 | 12.8 |
18–64 | 60.8 | 58.3 | −2.5 |
65 + | 16.8 | 6.5 | −10.3 |
Race (%) | | | |
White Non-Hispanic | 71.5 | 45.4 | −26.1 |
Black Non-Hispanic | 8.9 | 10.8 | 1.9 |
Other single race Non-Hispanic | 5.5 | 7.1 | 1.6 |
Other multiple race Non-Hispanic | 1.8 | 2.3 | 0.5 |
Hispanic (any race) | 12.2 | 34.4 | 22.1 |
Percent in poverty | 10.4 | 13.4 | 3.1 |
*Differences are statistically different from zero at the 95 percent confidence level.
Conclusion
Properly including the Hispanic and SCHIP oversample records when linking ASEC files across years results in larger linked sample sizes and will likely result in more accurate population estimates. In this note, we demonstrate a methodology for properly including the ASEC oversamples and highlight how prior methodologies have overlooked this component of the ASEC. We bring attention to the fact that the ASEC oversample respondents are demographically different from the March BMS respondents. Importantly, we do not directly address the issue of weighting with linked CPS ASEC samples. Our paper is narrowly focused on the increased sample size rather than on how to make the particular sample representative of the population. Certainly the choice of weights is extremely important, and weights can be constructed by various techniques (e.g., raking, inverse probability weights), but ultimately the weighting decision depends on a variety of factors: attrition across waves, the validation criteria chosen by researchers (which lead to smaller or larger samples being included), the subsamples of interest, and the particular analysis question. Our methodology allows researchers to fully utilize the ASEC and provides further documentation on an important aspect of using CPS panels.
Acknowledgements
The authors would like to acknowledge the helpful comments of Joe Ritter and Elizabeth Wrigley-Field. Funding was provided by the Minnesota Population Center through a grant (P2C HD041023) from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD).
Footnotes
1. Although the oversample respondents have non-March BMS responses, the Census Bureau does not provide identifiers to link the oversample responses to their BMS responses.
2. Feng (2001) also creates ASEC panels using a Bayesian approach for validating links. However, the methodology cannot be extended after 2002 and so is not covered here.
3. ASECs being administered in February or April likely have no consequential data collection issues since the ASEC primarily asks questions about the prior calendar year. The most important difference between March and February/April ASEC respondents is the sampling methodology and demographics of the respondents.
4. See the IPUMS CPS description of MISH (https://cps.ipums.org/cps-action/variables/mish) for more details. This particularity has no effect on cross-sectional estimates and needs to be corrected only for linking purposes.
5. This process is documented in Flood & Pacas (2017) and available via IPUMS variable ASECOVERP (https://cps.ipums.org/cps-action/variables/ASECOVERP).
Data Availability
All data utilized in this analysis (Flood et al., 2020) are publicly accessible through IPUMS.org: https://cps.ipums.org/cps/.
References
- Elsby MWL, Shin D, & Solon G (2016). Wage adjustment in the great recession and other downturns: Evidence from the United States and Great Britain. Journal of Labor Economics, 34(S1), S249–S291. 10.1086/682407
- Feng S (2001). The longitudinal matching of current population surveys: A proposed algorithm. Journal of Economic and Social Measurement, 27(1–2), 71–91. 10.3233/JEM-2003-0197
- Flood S, King M, Rodgers R, Ruggles S, & Robert Warren J (2020). Integrated public use microdata series, current population survey: Version 8.0. Minneapolis: University of Minnesota. 10.18128/D030.V8.0
- Flood SM, & Pacas JD (2017). Using the annual social and economic supplement as part of a current population survey panel. Journal of Economic and Social Measurement, 42(3–4), 225–248. 10.3233/JEM-180447
- Flood SM, Rodgers R, Pacas JD, Kristiansen D, & Klass B (2021). Extending current population survey linkages: Obstacles and solutions for linking monthly data from 1976 to 1988, version 2. Journal of Economic and Social Measurement. 10.18128/IPUMS2020-02.v2
- Hardy B, Smeeding T, & Ziliak JP (2018). The changing safety net for low-income parents and their children: Structural or cyclical changes in income support policy? Demography, 55(1), 189–221. 10.1007/s13524-017-0642-7
- Hokayem C, & Heggeness M (2014). Factors Influencing Transitions Into and Out of Near Poverty: 2004–2012. (SEHSD Working Paper 2014–05)
- Humensky JL, Jordan N, Stroupe KT, & Hynes DM (2013). How are Iraq/Afghanistan-Era veterans faring in the labor market? Armed Forces & Society, 39(1), 158–183. 10.1177/0095327X12449433
- Katz A, Teuter K, & Sidel P (1984). Comparison of alternative ways of deriving panel data from the annual demographic files of the current population survey. Review of Public Data Use, 12, 35–44.
- Liu R, & Trefler D (2011). A Sorted Tale of Globalization: White Collar Jobs and the Rise of Service Offshoring. National Bureau of Economic Research. https://www.nber.org/system/files/working_papers/w17559/w17559.pdf
- Madrian BC, & Lefgren LJ (2000). An approach to longitudinally matching current population survey (CPS) respondents. Journal of Economic and Social Measurement, 26(1), 31–62. 10.3233/JEM-2000-0165
- Nekarda CJ (2009). A longitudinal analysis of the current population survey: Assessing the cyclical bias of geographic mobility. Federal Reserve Board of Governors.
- Pacas Viscarra J (2017). Innovative Methods for Using Census Data to Study Poverty, Labor Markets, and Policy. Retrieved from the University of Minnesota Digital Conservancy, https://hdl.handle.net/11299/191475
- Pacas J, & Davis E (2018). Moving into and out of rural poverty. IRP Focus, 34(3), 4–12.
- Pitts A (1988). Matching Adjacent Years of the Current Population Survey. Unicon Corporation.
- Riddell WC, & Song X (2011). The impact of education on unemployment incidence and re-employment success: Evidence from the U.S. labour market. Labour Economics, 18(4), 453–463. 10.1016/j.labeco.2011.01.003
- Rivera Drew JA, Flood S, & Warren JR (2014). Making full use of the longitudinal design of the current population survey: methods for linking records across 16 months. Journal of Economic and Social Measurement, 39(3), 121–144. 10.3233/JEM-140388
- U.S. Census Bureau. (2020). Current Population Survey, 2020 Annual Social and Economic (ASEC) Supplement.
- Ziliak JP, Hardy B, & Bollinger C (2011). Earnings volatility in America: Evidence from matched CPS. Labour Economics, 18(6), 742–754. 10.1016/j.labeco.2011.06.015