Interpolating U.S. Decennial Census Tract Data from as Early as 1970 to 2010: A Longitudinal Tract Database

John R Logan; Zengwang Xu; Brian Stults

doi:10.1080/00330124.2014.905156

Author manuscript; available in PMC: 2014 Aug 17.

Published in final edited form as: Prof Geogr. 2014 May 13;66(3):412–420. doi: 10.1080/00330124.2014.905156

PMCID: PMC4134912 NIHMSID: NIHMS522336 PMID: 25140068

Abstract

Differences in the reporting units of data from diverse sources and changes in units over time are common obstacles to analysis of areal data. We compare common approaches to this problem in the context of changes over time in the boundaries of U.S. census tracts. In every decennial census many tracts are split, consolidated, or changed in other ways from the previous boundaries to reflect population growth or decline. We examine two interpolation methods to create a bridge between years, one that relies only on areal weighting and another that also introduces population weights. Results demonstrate that these approaches produce substantially different estimates for variables that involve population counts, but they have a high degree of convergence for variables defined as rates or averages. Finally the paper describes the Longitudinal Tract Data Base (LTDB), through which we are making available public-use tools to implement these methods to create estimates within 2010 tract boundaries for any tract-level data (from the census or other sources) that are available for prior years as early as 1970.

Keywords: 2010 Census, census geography, census tract, areal interpolation, population interpolation

A common situation faced by researchers using areal data is discrepancies in the boundaries of reporting units. For example, population data may be reported in census tracts, while crime data may be reported in police precincts, or election data in voting districts, or school data in school attendance zones. Another example is when there are changes over time in the boundaries of the same units (Martin, Dorling and Mitchell 2002). In either case the general problem is how to harmonize data to the same geographic unit so that information from different sources and times can be analyzed together. Social scientists sometimes avoid the issue of boundary changes by simply comparing the cross-sectional pattern of results in one year with another year. This is not possible where the purpose is to study the changes in the characteristics of specific places (however these are defined); shifting boundaries introduce greater likelihood of drawing the wrong conclusions. At the least one would want to know what changes are due to new boundaries and what changes have occurred within the places as previously bounded, and to be able to identify tracts where the estimates of change are susceptible to greater error.

Dealing with boundary changes using interpolation methods

We deal here with boundary changes, although similar principles should apply to interpolation of data from different sources. To transfer data from a source to a target zonal system, sophisticated areal interpolation methods usually use ancillary information or statistical methods to refine the source data to a more detailed or finer spatial scale and then re-aggregate these data to the target zones. Surface modeling techniques interpolate the source data into an underlying smooth surface that can then be aggregated to target zones (Bracken and Martin 1995). The surface can be estimated by point-based interpolation methods using centroids as the representatives of zones (Martin 1989, Bracken and Martin 1989) or other statistical methods (Kyriakidis 2004, Kyriakidis and Yoo 2005). Data from the surface can be aggregated to any desired areal unit. A criticism of this approach is that population characteristics are not likely to fit a smooth surface. It is common to find discrete boundaries, such as certain major streets or non-residential zones, where a population variable is discontinuous. In the evolution of minority neighborhoods in the U.S., for instance, observed processes of invasion and succession often were associated with specific locations, whose street boundaries were well known but tended to expand over time.

Another current approach is to apply dasymetric (or intelligent) interpolation methods (Maantay, Maroko and Herrmann 2007, Mennis 2003, Wright 1936, Zandbergen and Ignizio 2010, Tapp 2010, Sleeter and Gould 2007, Reibel and Agrawal 2007). The idea is that simple areal interpolation (Goodchild and Lam 1980) can be improved by using other sources of data about the distribution of the population in the source zone. One type of ancillary data is land use information from remote sensing that can identify areas with no population (Eicher and Brewer 2001). Xie (1995) and Reibel and Bufalino (2005) use information about the road network as indirect indicators of population density. Gregory and Ell (2005) discuss the use of parish population records as the ancillary data for historical interpolation in Britain. Ancillary data can also be zero dimensional point data. Zhang and Qiu (2011) use schools to estimate a density surface as ancillary data in areal interpolation of population from census tracts to postal zones in Texas. More generally Goodchild, Anselin and Deichmann (1993) suggest the use of “control zones,” areas that are known based on external information to be internally homogeneous on the attribute in question, to improve areal interpolation.

Applications of interpolation to boundary changes in U.S. census tracts

Prior to every United States census it is the prerogative of state and local officials to identify small areas for which they wish to receive census population totals for electoral redistricting purposes and for other planning and policy functions. As a result, the fundamental units (census blocks and tracts) defined in the previous census could be split or consolidated, and their boundaries could be altered in complex ways. We use areal interpolation to estimate population characteristics of U.S. census tracts from prior years within 2010 boundaries. Depending on what information is available at a very local scale, the interpolation is based on a combination of area and population weighting (2000) or only on area weighting (1970-1990). The former approach is the current standard in the field, but where appropriate small unit population data are not available, areal interpolation is the fallback option (as in Gregory's [2002] harmonization of 19th century British data to contemporary boundaries). Additionally we take advantage of ancillary data, using a water layer to identify locations with no land area (and therefore no population).

The following sections offer an overview of boundary changes between census years in the United States, outline two approaches to interpolating data in order to adjust for these changes and compare them to the methods used in the commercially available Neighborhood Change Data Base (NCDB, Tatian 2003) for 1990-2000, assess the differences in estimates from these different approaches, and introduce our own Longitudinal Tract Data Base tool (LTDB) for researchers who work with census data.

We begin by describing the changes in census geography that need to be considered in any intercensal bridging system. Examples are demonstrated by tract boundary change between 2000 and 2010. There are three main categories of changes: consolidations, splits, and complex changes. These are illustrated in Figure 1 for several tracts in the Kansas City metropolis. Consolidation creates no difficulties for analysis; in this example data for three tracts in 2000 can simply be combined into a single tract as defined in 2010. A split adds difficulty. In this example, some rationale is needed to allocate data from one tract (053106) into three new tracts formed within it.

Figure 1. Three types of boundary changes in the Kansas City metropolis from 2000 (in black) to 2010 (in red)

More complex changes are shown in the right-hand panel of Figure 1. First, the western and southern boundaries of tract 013401 have been adjusted, which means that some population needs to be exchanged between adjacent tracts. Note that some of these changes appear to be very small, and these likely reflect routine technical improvements in the GIS file. The Census Bureau makes many minor corrections to the digitizing of tract boundaries between decennial census years. But the section removed from this tract's southwest corner could be more significant. In addition, what used to be two tracts to the east of 013301 have been reorganized into three, retaining the outer boundaries of the original two but entirely disregarding the prior boundary between them. Nationally for 1990-2000, Tatian (2003) reports that about 80 percent of tract boundary changes were of this latter type (which he describes as “many to many” changes), while splits were most of the remaining cases. Our own estimate (see below) is that these two types were about equally prevalent in 2000-2010, if we remove tiny boundary shifts from consideration.

The distribution of tract changes is reviewed in detail for 1990-2000 and 2000-2010 in Table 1. The table shows the number of tracts that did not change, tracts that were consolidated from many tracts to one, tracts that were split from one into more than one, and complex types of change that involved multiple tracts in both years. For the purpose of this table we treat as “no change” those cases where the difference in boundaries between a tract in year 1 and year 2 involves less than 1 percent of the land area of the year 2 tract. Of the 72,739 tracts with land area in 2010, we classify 50,062 tracts as unchanged in 2010, though about one-third of these (17,898) experienced slight boundary corrections. Of tracts with changes, only a small number of cases (fewer than 1,000) are consolidations, which pose no problem for interpolation. The most common types of change are those where some form of estimation is required. In over 17 percent of cases a single tract in 2000 was split into more than one tract in 2010, and most of these were the result of “1 to 2” splits. A nearly equal number of 2010 tracts fall into the “many to many” category, where multiple tracts in 2000 were reconfigured to produce a different set of tracts in 2010. The distribution is similar in 1990-2000.

Table 1.

Census tract boundaries over time: number of tracts experiencing various types of changes between 1990-2000 and 2000-2010

Type of Change	1990 to 2000 (tracts)	1990 to 2000 (%)	2000 to 2010 (tracts)	2000 to 2010 (%)
No change	43,507	66.6%	50,062	68.8%
Many to one	969	1.5%	999	1.4%
One to two	5,962	9.1%	9,288	12.8%
One to three	1,722	2.6%	2,013	2.8%
One to four +	1,005	1.5%	1,267	1.7%
Many to many	12,144	18.6%	9,110	12.5%
Total	65,309	100.0%	72,739	100.0%
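The classification behind a table of this kind can be sketched in a few lines. The following is only an illustration under stated assumptions, not the procedure actually used to build Table 1: the function name `classify_changes`, the input layout (one record per overlay polygon), and the rule of dropping any overlap smaller than 1 percent of the target tract's land area are all hypothetical simplifications of the "no change" threshold described above.

```python
from collections import defaultdict


def classify_changes(pieces, sliver_share=0.01):
    """Label each target-year tract by its type of boundary change.

    pieces: list of (source_tract, target_tract, land_area) overlay polygons.
    sliver_share: overlaps below this share of the target tract are ignored
                  (an assumed reading of the 1 percent rule).
    """
    # Land area of each target tract, summed over its overlay pieces.
    target_area = defaultdict(float)
    for _, tgt, area in pieces:
        target_area[tgt] += area

    # Keep only links that cover at least `sliver_share` of the target tract.
    sources_of = defaultdict(set)
    targets_of = defaultdict(set)
    for src, tgt, area in pieces:
        if area >= sliver_share * target_area[tgt]:
            sources_of[tgt].add(src)
            targets_of[src].add(tgt)

    labels = {}
    for tgt, sources in sources_of.items():
        if len(sources) == 1:
            (src,) = sources
            siblings = targets_of[src]
            if any(sources_of[t] != {src} for t in siblings):
                # The source also feeds a tract that has other sources.
                labels[tgt] = "many to many"
            elif len(siblings) == 1:
                labels[tgt] = "no change"
            else:
                labels[tgt] = f"one to {len(siblings)}"
        elif all(targets_of[s] == {tgt} for s in sources):
            labels[tgt] = "many to one"
        else:
            labels[tgt] = "many to many"
    return labels


# Hypothetical overlay: tract identifiers and areas are made up.
example_pieces = [
    ("A", "a", 10.0), ("A", "b1", 0.01),   # tiny sliver, ignored
    ("B", "b1", 5.0), ("B", "b2", 5.0),    # split
    ("C", "c", 4.0), ("D", "c", 6.0),      # consolidation
    ("E", "e1", 6.0), ("E", "e2", 2.0),    # complex change
    ("F", "e2", 4.0), ("F", "f", 5.0),
]
example_labels = classify_changes(example_pieces)
```

Counting the labels across all target tracts would give the rows of a table like Table 1; the treatment of slivers is the step most likely to differ from the authors' own implementation.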

We illustrate the extent and location of these changes in Figure 2, which presents an overlay of 2000 and 2010 tract boundaries in the Kansas City metropolis. There were a number of consolidations particularly in central city areas of Kansas City Missouri that were losing population, and a larger number of splits located mainly in outer suburban areas.

Figure 2. Overlay of tract boundaries in 2000 and 2010 in Kansas City, MO-KS MSA.

Combining areal and population interpolation

Bridging between 2000 and 2010 is greatly facilitated by the Topological Faces layer of the TIGER/Line shapefiles created by the Census Bureau (2011), which shows the intersection between blocks and tracts (and many other geographic layers) as defined in the 2000 and 2010 censuses. This file is available to be downloaded (http://www.census.gov/geo/www/tiger/tgrshp2010/documentation.html). U.S. census geography includes several nested scales, of which the most commonly used are the state, county, census tract, block group, and block. The face polygons created by the intersection of these multiple geographic boundaries are in effect the smallest possible sub-block unit in census geography, which we term a “fragment.” Each one is uniquely identified by a topological face ID (TFID), and it includes several useful attributes: total area, an indicator of whether the face polygon is water or land, and all geocodes (from block ID to state FIPS code) in both the 2000 and 2010 census. We work with the fragments from the Faces file, which can be dissolved to the tract and block layers for 2000 and 2010.

The next step is to allocate reported tract level population characteristics from 2000 (such as counts by race and age) to blocks within the tract. Our LTDB bases this allocation on the block's share of the total tract population in 2000. It then estimates what share of the 2000 block population and of every population subgroup (estimated in the previous step) lies in each fragment within that block. It does this through simple areal interpolation based on the fragment's share of the block area. This estimate is refined with ancillary data provided in the Faces file that identifies water fragments with no population that should be disregarded.
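The two-stage allocation just described (tract to block by population share, then block to fragment by land-area share, ignoring water) can be illustrated with a short sketch. This is not the LTDB code; the function name `fragment_weights`, the data layout, and the decision to raise an error for an unpopulated source tract are assumptions made for the example, and all numbers are invented.

```python
from collections import defaultdict


def fragment_weights(blocks):
    """Share of a 2000 tract's counts that falls in each 2010 tract.

    blocks: list of dicts, one per 2000 block in the source tract, each with
      "pop": the block's 2000 population
      "fragments": list of (tract_2010, area, is_water) tuples
    """
    tract_pop = sum(b["pop"] for b in blocks)
    if tract_pop <= 0:
        # How to handle an unpopulated source tract is not covered here.
        raise ValueError("source tract has no population to weight by")

    weights = defaultdict(float)
    for b in blocks:
        block_share = b["pop"] / tract_pop          # stage 1: population
        land = [(t, a) for t, a, is_water in b["fragments"] if not is_water]
        land_area = sum(a for _, a in land)
        for tract_2010, area in land:               # stage 2: land area
            weights[tract_2010] += block_share * area / land_area
    return dict(weights)


# Hypothetical source tract with two blocks; the second block is split
# between 2010 tracts "X" and "Y" and includes a water fragment.
example_blocks = [
    {"pop": 300, "fragments": [("X", 2.0, False)]},
    {"pop": 100, "fragments": [("X", 1.0, False),
                               ("Y", 3.0, False),
                               ("Y", 6.0, True)]},
]
example_weights = fragment_weights(example_blocks)
# A tract-level count of 800 would then be split as 800 * weight.
example_estimates = {t: 800 * w for t, w in example_weights.items()}
```

Because the same weights are applied to every variable, any subgroup count reported for the source tract is divided in the same proportions as the total population.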

It is straightforward to aggregate fragments to the 2010 census tracts. The assumption that all population characteristics have the same distribution as the total population across blocks within a tract, and across fragments within a block, is the main source of error in the estimate. It would be desirable to use additional information sources to refine the allocation of block populations to the fragments of a block that are in different 2010 tracts. This can be important because blocks are often reconfigured between censuses. NCDB used ancillary data from the streets coverage from TIGER/Line 1992 to bridge 1990 data to 2000 tract boundaries. Every 1990 block was linked to census tracts in 2000. When the block was fully within the boundaries of the 2000 tract, its 1990 population was used as the population weight. When the block was located in more than one 2000 tract, the length of streets within each fragment was used to determine what share of the block population to allocate to each tract. The assumption is that population is highly correlated with the extent of local roads, though it was not known whether there were homes on these roads. To the extent that roads indicate population, this procedure is superior to weighting block fragments by their area. NCDB created a 1990-2000 proprietary Block Weighting File (BWF) to represent what share of a given 1990 block's population should be estimated to fall within each 2000 tract. As in the LTDB these same weights were used to estimate all census variables.
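The street-length idea can be shown with a toy calculation. Since the Block Weighting File is proprietary, the sketch below only illustrates the general principle described above and should not be read as NCDB's implementation; the function name `street_length_shares` and the figures are hypothetical.

```python
def street_length_shares(street_length_by_tract):
    """Split a block among target tracts in proportion to street length.

    street_length_by_tract: {target_tract: length of streets in the part
    of the block that falls inside that tract} (any consistent unit).
    """
    total = sum(street_length_by_tract.values())
    if total <= 0:
        # A fallback (for example, area shares) would be needed here;
        # which one to use is left open in this sketch.
        raise ValueError("no street length recorded for this block")
    return {t: length / total for t, length in street_length_by_tract.items()}


# Hypothetical 1990 block of 120 residents straddling two 2000 tracts.
example_shares = street_length_shares({"X": 3.0, "Y": 1.0})
example_split = {t: 120 * s for t, s in example_shares.items()}
```

With three quarters of the street length in tract X, 90 of the 120 residents are assigned to X and 30 to Y, regardless of how the block's area is divided.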

Interpolation with area weights

Areal interpolation requires only that we have an accurate overlay of the tract boundaries in two years. The LTDB estimates for 1970-1990 are based on tract boundaries from the National Historical Geographic Information System (NHGIS). With these we created a tract-level equivalent of a Topological Faces relationship table for 1970-2000. The first step is to overlay the 2000 tract boundary file onto the 1990 boundary file and merge these into a single layer. For each tract that did not change between 1990 and 2000, the result is a single polygon and data record. For tracts that changed, multiple records exist in the new layer. We then merge 1990 census data with this new layer using 1990 state, county, and tract codes, and we apportion the 1990 counts to each fragment of the split tract using the area proportions as weights.
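Once the overlay has been computed in a GIS, the apportionment itself is simple arithmetic. The sketch below assumes the overlay has already been exported as a list of (source tract, target tract, area) records; the function names `area_weights` and `apportion` and the tract labels are hypothetical.

```python
from collections import defaultdict


def area_weights(overlay):
    """Area share of each source tract that lies in each target tract.

    overlay: list of (source_tract, target_tract, area) polygons produced
    by intersecting the two boundary files.
    """
    source_area = defaultdict(float)
    for src, _, area in overlay:
        source_area[src] += area

    weights = defaultdict(float)
    for src, tgt, area in overlay:
        weights[(src, tgt)] += area / source_area[src]
    return dict(weights)


def apportion(counts, weights):
    """Allocate source-tract counts to target tracts using the weights."""
    estimates = defaultdict(float)
    for (src, tgt), w in weights.items():
        estimates[tgt] += w * counts.get(src, 0.0)
    return dict(estimates)


# Hypothetical overlay: tract "1990-A" was split 3:1 by area,
# tract "1990-B" did not change.
example_overlay = [
    ("1990-A", "2000-A1", 3.0),
    ("1990-A", "2000-A2", 1.0),
    ("1990-B", "2000-B", 2.0),
]
example_w = area_weights(example_overlay)
example_2000 = apportion({"1990-A": 4000, "1990-B": 2500}, example_w)
```

The weights for each source tract sum to one, so totals are preserved; what area weighting cannot do is reflect how unevenly the population is spread within the source tract.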

We repeat the same process for 1970 and 1980, again using the 2000 tract file as the overlay. We then use the population and area based interpolation method described previously to adjust the data from 2000 tract boundaries to 2010 tract boundaries.
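Bridging an early year to 2010 in two steps amounts to chaining two sets of weights. A minimal sketch of that composition follows, assuming each step is stored as a dictionary keyed by (from tract, to tract); the function name `compose` and the tract labels are hypothetical, and the authors may well carry out the two steps separately rather than multiplying the weights.

```python
from collections import defaultdict


def compose(first, second):
    """Chain two crosswalks: source -> intermediate -> target.

    first:  {(source, intermediate): weight}, e.g. 1970 to 2000 (area based)
    second: {(intermediate, target): weight}, e.g. 2000 to 2010
    Returns {(source, target): weight}.
    """
    by_intermediate = defaultdict(list)
    for (mid, tgt), w in second.items():
        by_intermediate[mid].append((tgt, w))

    chained = defaultdict(float)
    for (src, mid), w1 in first.items():
        for tgt, w2 in by_intermediate.get(mid, []):
            chained[(src, tgt)] += w1 * w2
    return dict(chained)


# Hypothetical weights for one 1970 tract.
w_1970_2000 = {("70A", "00A"): 0.5, ("70A", "00B"): 0.5}
w_2000_2010 = {("00A", "10A"): 1.0, ("00B", "10A"): 0.25, ("00B", "10B"): 0.75}
w_1970_2010 = compose(w_1970_2000, w_2000_2010)
```

In this example 62.5 percent of the 1970 tract's counts end up in 2010 tract 10A and 37.5 percent in 10B.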

NCDB used a similar approach for 1980, first linking source year tracts to 1990 blocks, and then interpolating from those blocks to 2000 tracts. NCDB used area-weighted interpolation using spatial data from TIGER/Line 1992. A less precise area weighting was used for 1970 that relied on the Census Bureau's tract correspondence file between 1970 and 1980. Every 1970 tract contributing to a 1980 tract was weighted equally. Then 1980 tracts were linked to 1990 blocks, and in a final step to 2000 tracts.

Researchers should be aware of the potential for error in interpolation that is based only on area weights. Figure 3 presents an extreme example of what can happen. Here a single tract in 2000 was split into three tracts in 2010. The block populations in 2000 show that the very large area that became tract 36045980000 was almost unpopulated. The LTDB population estimate for this tract based on area+population weighting is only 11. Yet areal interpolation alone suggests that most population in the source tract should be estimated to be in 36045980000. Note also that some populous blocks in 2000 have been divided in 2010 between two tracts. An area+population weighting would yield reasonable estimates if population within each of these is not greatly skewed to one portion of its area.

Figure 3. Example of a split tract in 2000-2010, showing the block populations (color-coded) in 2000 in each panel. On the left are the 2000 tract boundaries; on the right are 2010 boundaries.

Assessing alternative approaches

Social scientists have used both area- and population-weighted approaches in other similar situations. We presume that inclusion of population weighting yields improved estimates, but it would be useful to have more information on how different the estimates are from alternative methods and for what kinds of variables one would expect to find the largest discrepancies. Researchers often have to use less than optimal data, and in those cases it is helpful to understand better the amount and sources of error.

To assess differences in the results from these estimation procedures, we present a series of comparisons for 2000-2010 (comparing our combined area and population interpolation with an alternative in which we only take into account area).¹

These comparisons involve a selection of variables. Some of these are population counts: total population, non-Hispanic white population, Asian population, college graduates, and home owners. Others are rates or medians: population density, percent non-Hispanic white, percent Asian, percent college graduates, percent home owners, and median household income. It is more difficult to estimate absolute numbers (because these depend on how fully the area of a census tract has been settled) than to estimate compositional characteristics such as percentages and rates (which tend to be similar across adjacent tracts).

Table 2 provides comparisons for split tracts, many-to-many tracts, and (for reference) all tracts including those with no changes. For each variable, the table lists the mean and standard deviation in the initial year, based on the combined area/population interpolation estimates. These values are useful points of reference for evaluating in absolute terms how large the discrepancies are between the two estimation methods. The next column shows the correlation between the two estimates for a given set of tracts. Then four columns show the distribution of cases by how large the discrepancy is between the estimates, from less than 0.1 standard deviation (which we take to be a minor difference) to over 1.0 standard deviation.

Table 2.

Comparison of tract estimates between area+population and area-only interpolation

				Size of discrepancy
Variable	Mean	SD	r	<0.1 SD	0.1-0.5 SD	0.5-1.0 SD	>1.0 SD

*Splits: 2000-2010 (n=12,567)*
Population count	3,489	1,548	0.511	10.7%	34.3%	25.3%	29.7%
Non-Hispanic white count	2,426	1,420	0.681	18.8%	37.5%	22.5%	21.2%
Non-Hispanic Asian count	151	295	0.893	66.6%	25.5%	5.0%	2.9%
College graduate count	600	485	0.783	26.4%	44.3%	18.0%	11.3%
Homeowner count	880	479	0.634	16.3%	37.3%	23.2%	23.2%
Population density	1,689	4,722	0.926	79.2%	18.0%	1.9%	1.0%
Non-Hispanic white %	71.1	26.1	0.996	99.3%	0.4%	0.1%	0.1%
Non-Hispanic Asian %	4.1	7.1	0.987	99.2%	0.6%	0.2%	0.1%
College graduate %	26.6	16.3	0.997	99.0%	0.6%	0.2%	0.1%
Homeowner %	69.4	22.2	0.996	99.0%	0.6%	0.2%	0.2%
Median household income	$48,857	$18,736	0.999	99.1%	0.6%	0.2%	0.1%
*Many to many: 2000-2010 (n=9,106)*
Population count	3,601	1,819	0.788	53.1%	24.8%	10.8%	11.3%
Non-Hispanic white count	2,382	1,680	0.874	61.3%	22.3%	9.4%	7.0%
Non-Hispanic Asian count	201	494	0.918	85.1%	12.5%	1.6%	0.9%
College graduate count	609	577	0.882	65.8%	22.3%	7.4%	4.5%
Homeowner count	850	544	0.860	59.5%	22.8%	10.0%	7.7%
Population density	2,078	4,514	0.949	85.8%	11.6%	1.8%	0.8%
Non-Hispanic white %	66.7	30.4	0.993	96.9%	2.6%	0.3%	0.2%
Non-Hispanic Asian %	5.2	10.4	0.979	97.4%	2.2%	0.2%	0.2%
College graduate %	25.8	17.6	0.991	95.6%	3.4%	0.6%	0.3%
Homeowner %	64.5	24.8	0.992	94.7%	4.3%	0.6%	0.4%
Median household income	$47,500	$21,605	0.996	95.8%	3.4%	0.6%	0.2%
*All tracts: 2000-2010 (n=72,739)*
Population count	3,871	1,602	0.883	78.3%	9.3%	5.8%	6.6%
Non-Hispanic white count	2,676	1,618	0.932	81.3%	9.8%	5.0%	3.9%
Non-Hispanic Asian count	164	375	0.971	92.8%	5.7%	1.0%	0.5%
College graduate count	612	556	0.947	83.4%	10.6%	3.9%	2.1%
Homeowner count	960	514	0.917	80.4%	9.6%	5.2%	4.8%
Population density	1,988	4,549	0.983	94.5%	4.7%	0.6%	0.3%
Non-Hispanic white %	69.4	29.6	0.998	99.5%	0.4%	0.1%	0.0%
Non-Hispanic Asian %	4.0	8.1	0.994	99.5%	0.4%	0.1%	0.0%
College graduate %	23.8	16.9	0.998	99.3%	0.6%	0.1%	0.1%
Homeowner %	66.5	22.7	0.998	99.1%	0.7%	0.1%	0.1%
Median household income	$45,158	$20,492	0.999	99.3%	0.6%	0.1%	0.0%
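The summary statistics reported in a table of this kind (the correlation between two sets of estimates and the share of tracts whose discrepancy falls in each standard-deviation band) can be computed as sketched below. Several details are assumptions rather than the authors' documented choices: the standard deviation is taken as the population SD of the reference (area+population) estimates, band edges are assigned to the lower band only at 1.0, and the names `pearson_r` and `discrepancy_shares` are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def discrepancy_shares(reference, alternative):
    """Share of tracts by size of |reference - alternative| in SD units.

    reference:   estimates from area+population interpolation
    alternative: estimates from area-only interpolation
    """
    sd = statistics.pstdev(reference)  # assumed: population SD of reference
    bands = {"<0.1 SD": 0, "0.1-0.5 SD": 0, "0.5-1.0 SD": 0, ">1.0 SD": 0}
    for ref, alt in zip(reference, alternative):
        gap = abs(ref - alt) / sd
        if gap < 0.1:
            bands["<0.1 SD"] += 1
        elif gap < 0.5:
            bands["0.1-0.5 SD"] += 1
        elif gap <= 1.0:
            bands["0.5-1.0 SD"] += 1
        else:
            bands[">1.0 SD"] += 1
    n = len(reference)
    return {band: count / n for band, count in bands.items()}


# Made-up estimates for five tracts, for illustration only.
example_ref = [0, 10, 20, 30, 40]
example_alt = [0.5, 13, 30, 50, 40]
example_shares_by_band = discrepancy_shares(example_ref, example_alt)
example_r = pearson_r(example_ref, example_alt)
```

Expressing discrepancies in standard-deviation units puts counts and percentages on a common scale, which is what allows the two groups of variables to be compared directly.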

We notice that split tracts yield more disparate estimates than do tracts with many-to-many changes. A careful analysis of change over time should take into account which tracts had no change in boundaries, which had simple consolidations, which were split and which had many-to-many changes. The latter two types of tracts should be inspected separately for unusual patterns of change that may be an artifact of the interpolation method.

We also notice that the estimates of absolute numbers (counts) have much greater discrepancies than the estimates of rates or averages. As an example, consider the estimates of non-Hispanic white residents and non-Hispanic white percentage for split tracts. The correlation for the number of whites in Table 2 is .681, and less than 20 percent of cases have discrepancies of less than 0.1 standard deviation. But estimates of the white percentage have near-perfect correlations, and close to 100 percent of estimates are within 0.1 standard deviation of each other.

As expected, when all tracts are included in the comparison, the correlations are higher and discrepancies are smaller. For example the two estimates of total population are correlated at .88, white count at .93, and Asian count at .97. Hence the potential errors resulting from reliance on area-weighted interpolation are moderated by the many tracts that require no estimation.

Dissemination: The LTDB

Here we describe a new resource that we have created and made freely available for public use. The Longitudinal Tract Data Base (LTDB) provides tools that can be used by scholars who have data reported within census tracts in the period 1970-2000 (regardless of the source) and wish to estimate the same data using 2010 tract geography.

The LTDB (http://www.s4.brown.edu/us2010/Researcher/Bridging.htm) provides estimates using 2010 boundaries for a standard set of variables from 1970 through the 2006-2010 American Community Survey and Census 2010 (the 2006-2010 tract data were reported for 2010 tract boundaries). These data may meet the needs of many users. More versatile is the set of tools that allows users to input their own data. Key to this system are crosswalks for each prior year, similar to the Geographic Conversion Tables developed by Simpson (2002) and made available for public use and the proprietary Block Weighting File developed by NCDB. For every decennial year from 1970 to 2000, a crosswalk file is provided in which every row lists a 2010 tract ID, the ID of a tract in the source year that contributes to it, and the share of the source tract's population attributes that should be allocated to the 2010 tract. In cases where there is an exact correspondence between the source tract and the 2010 tract, there is only one row of data for the 2010 tract. Otherwise there are as many rows as there are contributing tracts. For completeness, the crosswalk file includes every contributing tract, regardless of how small a fraction of its population should be allocated to the 2010 tract.
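For count variables, using a crosswalk of this form reduces to a weighted sum over its rows. The sketch below is not the LTDB's Access or Stata code; the column names `tract_2010`, `source_tract`, and `weight`, the function name `convert_counts`, and the sample rows are assumptions chosen for illustration and may not match the distributed files.

```python
import csv
import io
from collections import defaultdict


def convert_counts(crosswalk_file, source_counts):
    """Estimate counts within 2010 tracts from source-year tract counts.

    crosswalk_file: open text file with columns tract_2010, source_tract,
                    weight (hypothetical column names)
    source_counts:  {source_tract: count in the source year}
    """
    estimates = defaultdict(float)
    for row in csv.DictReader(crosswalk_file):
        src = row["source_tract"]
        if src in source_counts:
            estimates[row["tract_2010"]] += float(row["weight"]) * source_counts[src]
    return dict(estimates)


# In-memory stand-in for a crosswalk file; identifiers are invented.
example_crosswalk = io.StringIO(
    "tract_2010,source_tract,weight\n"
    "T1,S1,1.0\n"   # exact correspondence: one row
    "T2,S2,0.25\n"  # S2 was split between T2 and T3
    "T3,S2,0.75\n"
    "T3,S3,1.0\n"   # T3 also absorbs all of S3
)
example_2010_counts = convert_counts(
    example_crosswalk, {"S1": 1200, "S2": 800, "S3": 500}
)
```

Source tracts missing from the user's data are skipped silently in this sketch; a production workflow would probably want to report them instead.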

Supplementary information includes the 2010 metropolitan area (formally the Core Based Statistical Area or CBSA) code, flags to identify central city tracts in 2010,² and the 2010 population and land area of the tract. For the 2000-2010 crosswalk we provide one additional indicator that we believe will assist users of the interpolated data: whether there was a boundary change involving this tract and if so what type of change occurred between 2000 and 2010.

The LTDB offers code in Access and Stata that can be used in conjunction with the crosswalk file and an input data file prepared by the researcher. Input variable names need to be added to the code. Some variables, such as median income, should be aggregated as a weighted average, and the user must identify the variable (such as number of households) to be used in weighting. The output file from Access or Stata lists all of the 2010 information about the tract from the crosswalk file and values of the input variables converted to 2010 boundaries.
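The weighted-average case can be illustrated as follows. This is a sketch of one plausible reading, not the distributed Access or Stata code: each source tract's value is weighted by the portion of its weighting variable (here, households) that the crosswalk assigns to the 2010 tract. The function name `convert_weighted_average` and all figures are hypothetical.

```python
from collections import defaultdict


def convert_weighted_average(crosswalk_rows, values, base_counts):
    """Convert an average-type variable (e.g. median income) to 2010 tracts.

    crosswalk_rows: list of (tract_2010, source_tract, weight)
    values:         {source_tract: value of the variable, e.g. median income}
    base_counts:    {source_tract: weighting variable, e.g. households}
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for tract_2010, src, weight in crosswalk_rows:
        if src not in values or src not in base_counts:
            continue
        allocated = weight * base_counts[src]  # households sent to this tract
        numerator[tract_2010] += allocated * values[src]
        denominator[tract_2010] += allocated
    return {t: numerator[t] / denominator[t]
            for t in denominator if denominator[t] > 0}


# Hypothetical 2010 tract "T" formed from all of S1 and half of S2.
example_rows = [("T", "S1", 1.0), ("T", "S2", 0.5)]
example_income = convert_weighted_average(
    example_rows,
    values={"S1": 40000, "S2": 60000},
    base_counts={"S1": 100, "S2": 300},
)
```

Here the 2010 tract receives 100 households at 40,000 and 150 households at 60,000, giving an estimate of 52,000. A weighted average of medians is only an approximation to the median of the combined population.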

Summary and discussion

Many 2000 Census tracts are split, consolidated, or otherwise redrawn in Census 2010, and similar changes have occurred in prior years. These changes obstruct longitudinal analysis at the tract level and require the use of estimation procedures to harmonize data over time. We have focused on two approaches to interpolation that are practical at a national scale. The simpler approach is based on area weighting; a more desirable method also takes into account the distribution of population by blocks within the source tracts. We have shown that the differences between these two estimates can be very substantial.

Some kinds of analysis are especially sensitive to the actual counts (counts of the number of people, the number of members of particular population subgroups, and the number of housing units, etc.). A prime example would be a study of population growth at the tract level or a study of a phenomenon like rate of crime or disease that uses a population count in the denominator. For such variables, the correlations between the estimates from the two interpolation methods are mostly in the range of .50-.85 in Table 2. For some counts, particularly for split tracts, the absolute value of discrepancies can be over 0.5 standard deviation for as many as half of these tracts. Of course, these results are for tracts that required interpolation. In the full data set, including the approximately 70 percent of tracts that did not change boundaries or experienced consolidation, we saw that the correlations are much higher.

On the other hand, our results for variables calculated as percentages or averages suggest that area-weighted estimates for such variables can be used with a high degree of confidence when the analysis is based on correlations. Although the absolute values of these variables may diverge somewhat from those that would be estimated with population weighting, these two sorts of estimates are so highly correlated that their relationships with other variables are indistinguishable.

Nevertheless, both types of interpolation introduce error. Although one cannot assess how close estimates from either approach come to the “real” values (which would require access to the original point or block level data), we provide an indicator of whether a tract's data have been interpolated and what kind of boundary changes are involved. Researchers may wish to check whether the same results are found when data for interpolated tracts are excluded or weighted less heavily than other cases.

The LTDB offers researchers a versatile, open-source approach to study census tract data in a longitudinal framework. For 2000-2010 the estimation methods are similar to those that have proved useful in the past, and they can be combined with input data from NCDB from 1970-1990 to update those estimates to 2010 boundaries. For some users it may be preferable to rely on the LTDB's area-only interpolation estimates for these prior years, especially for variables not included in the NCDB standard data set. For all users the supplementary information from LTDB on types of boundary changes experienced in 2000-2010 offers new methods of assessing how errors in estimation affect their research results.

To clarify the contribution made by this research, we review the options it makes available to researchers to study tract-level changes between some earlier time and 2010.

For researchers wishing to harmonize data for pre-2000 census tracts, the LTDB uses areal interpolation to create a bridge. An alternative option is to acquire the NCDB files for 1970-1990 adjusted to 2000 boundaries and then apply the LTDB to bridge these data to 2010.

There are conditions in which utilizing the NCDB in this fashion is a less satisfactory solution, two of which deserve emphasis here. Most important, NCDB does not provide linked files for all census variables, but only for a selection of variables from the sample count files.³ Some researchers will need other census variables. In addition researchers are increasingly working with information aggregated to the tract level from non-census sources, such as criminal justice, public health, and voting records. The LTDB is well suited to these needs.

These harmonized data will facilitate studies of neighborhood change, such as population growth and decline, shifts in racial and ethnic composition, home ownership, and socioeconomic status. The long time series, extending over four decades, may make possible estimation of more complex models, such as reciprocal causation or varying time lags. For researchers working with data from other countries and time periods, the interpolation methods used here may prove to be useful. The comparison of areal only and area+population interpolation may not prove to be the same in other contexts, but the more general finding – that spatial dependence of characteristics measured as rates or percentages tends to minimize errors in interpolation even when actual counts are over or under estimated – may be widely applicable.

Acknowledgments

This research was supported by the Russell Sage Foundation and Brown University through the US2010 Project.

Footnotes

¹ We also compared our area-weighted estimates for 1990-2000 with NCDB's population+area estimates. Results are similar except that we find a much lower correlation of estimated values for median household income than in 2000-2010. Our approach with this variable was to calculate an area-weighted average of the median incomes of source tracts. There is no documentation of NCDB's method, but we find a much higher correlation if we take a simple unweighted average of medians from the source tracts.

² In longitudinal research on metropolitan areas it is desirable to hold constant the boundary between the central city and suburbia. The NCDB provides the place code for the place in which the largest area of the tract is located. We base the location flag on population share for 2000, and on area share for 1970-1990. The central city variable identifies tracts located in a principal city of the CBSA in 2010.

³ NCDB provides sample data (Summary Files 3 and 4 in 2000, and their equivalents in prior years) even for variables that are available from full count tabulations in Summary Files 1 and 2. Not all users are aware that in the files based on sample count data the Census does not adjust population totals to match the full count information that is available at the tract level. The correlations between values reported by the Census Bureau in 2000 Summary File 1 and Summary File 3 for variables like the total population and number and share of white and Asian residents are .98 or higher. In some tracts, however, there are larger discrepancies. For example, the average Asian count was 160 with a standard deviation of 384 in SF1. In about 21 percent of tracts, the SF3 value was different from the SF1 value by more than 0.1 standard deviation (that is, more than 38).

Contributor Information

John R. Logan, Department of Sociology and Director of the Initiative on Spatial Structures in the Social Sciences at Brown University, Providence, RI 02912. john_logan@brown.edu. His research focuses on urban development in the U.S. and China, incorporation of immigrants and minorities, and spatial inequalities.

Zengwang Xu, Department of Geography, University of Wisconsin, Milwaukee, WI 53211. xuz@uwm.edu. His research integrates GIS, complex networks/systems science, and spatial and statistical analyses.

Brian Stults, College of Criminology and Criminal Justice at Florida State University, Tallahassee, FL 32306. bstults@fsu.edu. He studies the relationships between neighborhoods, race, and crime with particular attention to impacts of racial segregation.

References

Bracken I, Martin D. The generation of spatial population distributions from census centroid data. Environment and Planning A. 1989;21:537–543. doi: 10.1068/a210537.
Bracken I, Martin D. Linkage of the 1981 and 1991 UK Censuses using surface modelling concepts. Environment and Planning A. 1995;27:379–390. doi: 10.1068/a270379.
Eicher CL, Brewer CA. Dasymetric mapping and areal interpolation: Implementation and evaluation. Cartography and Geographic Information Science. 2001;28:125–138.
Goodchild MF, Anselin L, Deichmann U. A framework for the areal interpolation of socioeconomic data. Environment and Planning A. 1993;25:383–397.
Goodchild MF, Lam N. Areal interpolation: A variant of the traditional spatial problem. Geo-Processing. 1980;1:297–312.
Gregory IN. The accuracy of areal interpolation techniques: standardising 19th and 20th century census data to allow long-term comparisons. Computers, Environment and Urban Systems. 2002;26:293–314.
Gregory IN, Ell PS. Breaking the boundaries: Geographical approaches to integrating 200 years of the census. Journal of the Royal Statistical Society. Series A. Statistics in Society. 2005;168:419–437.
Kyriakidis PC. A geostatistical framework for area-to-point spatial interpolation. Geographical Analysis. 2004;36:259–289.
Kyriakidis PC, Yoo E-H. Geostatistical prediction and simulation of point values from areal data. Geographical Analysis. 2005;37:124–151.
Maantay JA, Maroko AR, Herrmann C. Mapping population distribution in the urban environment: The cadastral-based expert dasymetric system (CEDS). Cartography and Geographic Information Science. 2007;34:77–102.
Martin D. Mapping population data from zone centroid locations. Transactions of the Institute of British Geographers. 1989;14:90–97.
Martin D, Dorling D, Mitchell R. Linking censuses through time: Problems and solutions. Area. 2002;34:82–91.
Mennis J. Generating surface models of population using dasymetric mapping. The Professional Geographer. 2003;55:31–42.
Reibel M, Agrawal A. Areal interpolation of population counts using pre-classified land cover data. Population Research and Policy Review. 2007;26:619–633.
Reibel M, Bufalino ME. Street-weighted interpolation techniques for demographic count estimation in incompatible zone systems. Environment and Planning A. 2005;37:127–139.
Simpson L. Geography conversion tables: a framework for conversion of data between geographical units. International Journal of Population Geography. 2002;8:69–82.
Sleeter R, Gould M. Techniques and Methods 11-C2. U.S. Department of the Interior; Reston, Virginia: 2007. Geographic information system software to remodel population data using dasymetric mapping methods.
Tapp AF. Areal interpolation and dasymetric mapping methods using local ancillary data sources. Cartography and Geographic Information Science. 2010;37:215–228.
Tatian PA. NCDB. 1970-2000 Tract Data: Data Users Guide. Urban Institute; Washington, DC: 2003. Neighborhood Change Database.
U.S. Census Bureau. G. D. 2010 Census tract relationship file overview. U.S. Census Bureau; 2011.
Wright JK. A Method of Mapping Densities of Population: With Cape Cod as an example. Geographical Review. 1936;26:103–110.
Xie Y. The overlaid network algorithms for areal interpolation problem. Computers, Environment and Urban Systems. 1995;19:287–306.
Zandbergen PA, Ignizio DA. Comparison of dasymetric mapping techniques for small-area population estimates. Cartography and Geographic Information Science. 2010;37:199–214.
Zhang C, Qiu F. A point-based intelligent approach to areal interpolation. The Professional Geographer. 2011;63:262–276.

