Author manuscript; available in PMC: 2020 Jan 1.
Published in final edited form as: GIsci Remote Sens. 2018 Sep 3;56(3):430–461. doi: 10.1080/15481603.2018.1509463

Data-enriched Interpolation for Temporally Consistent Population Compositions

Hamidreza Zoraghein 1,*, Stefan Leyk 1
PMCID: PMC6936759  NIHMSID: NIHMS1515001  PMID: 31889937

Abstract

This research evaluates the performance of areal interpolation coupled with dasymetric refinement to estimate different demographic attributes, namely population sub-groups based on race, age structure and urban residence, within consistent census tract boundaries from 1990 to 2010 in Massachusetts. The creation of such consistent estimates facilitates the study of the nuanced micro-scale evolution of different aspects of population, which is impossible using temporally incompatible small-area census geographies from different points in time. Various unexplored ancillary variables, including the Global Human Settlement Layer (GHSL), the National Land-Cover Database (NLCD), parcels, building footprints and the proprietary ZTRAX® dataset are utilized for dasymetric refinement prior to areal interpolation to examine their effectiveness in improving the accuracy of multi-temporal population estimates. Different areal interpolation methods including Areal Weighting (AW), Target Density Weighting (TDW), Expectation Maximization (EM) and its data-extended approach are coupled with different dasymetric refinement scenarios based on these ancillary variables. The resulting consistent small area estimates of white and black subpopulations, people of age 18–65 and urban population show that dasymetrically refined areal interpolation is particularly effective when the analysis spans a longer time period (1990–2010 instead of 2000–2010) and the enumerated population is sufficiently large (e.g., counts of white vs. black). The results also demonstrate that current census-defined urban areas overestimate the spatial distribution of urban population and dasymetrically refined areal interpolation improves estimates of urban population. Refined TDW using building footprints or the ZTRAX® dataset outperforms all other methods. 
The implementation of areal interpolation enriched by dasymetric refinement represents a promising strategy to create more reliable multi-temporal and consistent estimates of different population subgroups and thus demographic compositions. This methodological foundation has the potential to advance micro-scale modeling of various subpopulations, particularly urban population to inform studies of urbanization and population change over time as well as future population projections.

Keywords: Spatial Analysis, Population Estimation, Dasymetric Modeling, Urban, Census Data

1. Introduction

A persistent difficulty in the analysis of population change is the inconsistency of enumeration units from different census years. Due to such inconsistencies, units cannot be compared to each other without further processing, a typical problem observed in any enumerated data including crime, health, demographic or economic data. Recent work (e.g. Buttenfield, Ruther, and Leyk 2015; Ruther, Leyk, and Buttenfield 2015; Zoraghein et al. 2016; Schroeder 2007, 2017; Schroeder and Van Riper 2013; Logan, Stults, and Xu 2016) has made significant progress on using enhanced areal interpolation methods, leveraging dasymetric modeling (Wright 1936; Eicher and Brewer 2001; Mennis 2003) in the temporal interpolation of population counts as one solution to the problem. The ultimate objective of those works is to create temporally consistent total population estimates across different census years with minimum error. Typically, boundaries of one census year are used as “target zones”, and population counts, recorded for census boundaries in other years, i.e., “source zones”, are then transferred or redistributed to those target zones.

While the above works successfully created temporally consistent estimates with acceptable errors, two repeatedly reported limitations still exist. First, there is an emerging need to investigate promising new data sources that can be incorporated as ancillary variables. The results from such interpolations can vary according to underlying conditions (Zandbergen and Ignizio 2010) and ancillary variables employed (Langford 2013). Second, the above-described efforts are mainly limited to total population counts. However, to make these approaches useful for the research community in demography, economics and the health sciences, their performance needs to be evaluated on other variables. For example, demographers need to analyze population subgroups, including age or income classes to characterize how different compositions have changed in a complex urban environment.

As a response to these limitations, the objectives of this paper are twofold. First, we systematically examine different ancillary variables for dasymetric (spatial) refinement to enhance regular areal interpolation methods, including Areal Weighting (AW) (Goodchild and Lam 1980), Target Density Weighting (TDW) (Schroeder 2007) and Expectation Maximization (EM) (Dempster, Laird, and Rubin 1977; Flowerdew and Green 1994). These methods are used to estimate tract-level demographic variables for the whole state of Massachusetts in 1990 and 2000 (source zones) within target tract boundaries from the 2010 Census. The resulting models create fine-resolution, temporally consistent time series of population estimates across the three census years. Notably, the framework can also be applied in reverse order, meaning that tract-level population in 2010 can be estimated within tract boundaries from the 1990 and 2000 Censuses. We opt for the more challenging forward direction because of its importance for research on future population projections and growth. We use ancillary variables comprising both freely available datasets, such as the developed classes (namely, 21, 22, 23 and 24) of the National Land-Cover Database (NLCD) and the binary built-up presence layer from the Global Human Settlement Layer (GHSL), and datasets that are either proprietary or not readily available, such as tax parcels of the state, its building footprints, and Zillow’s ZTRAX® records (Zillow 2017). The ZTRAX® records, used here for the first time, contain approximate address locations of encompassing parcels together with a wide range of property- and housing-related attributes such as the built year and housing type. Our analytical comparison evaluates how these different ancillary variables affect the accuracy of areal interpolation methods in a multi-temporal context.

Second, the paper extends the establishment of consistent time series of population estimates to other census-based population sub-groups related to age, race and urban residence. This extension will expose more nuanced past changes in the population composition, which is important for tackling emerging research questions about neighborhood change, environmental injustice and urbanization in past decades, and it will create a more robust foundation for future population projections. In particular, modeling temporally consistent, small-area urban population estimates, a task further complicated by changing definitions of urban, would allow urban land and population to be identified more reliably and thus support a better understanding of key historical demographic processes such as urbanization.

Thus, this paper describes an areal interpolation framework that integrates a comprehensive set of different ancillary variables with distinct characteristics for dasymetric refinement to improve temporally consistent small area estimates of various subpopulations at the tract level. The potential and limitations of this framework are critically discussed.

2. Background: Dasymetric Refinement for Improved Areal Interpolation

Researchers have used various ancillary variables for downscaling population data in dasymetric modeling efforts. Land-cover/land-use is still the most widely used ancillary variable (Wright 1936; Mennis 2003, 2009; Reibel and Agrawal 2007; Linard, Gilbert, and Tatem 2011; Buttenfield, Ruther, and Leyk 2015; Ruther, Leyk, and Buttenfield 2015; Dmowska and Stepinski 2017). High resolution satellite images constitute another type of ancillary data source to disaggregate population estimates (Lu et al. 2010; Ural, Hussain, and Shan 2011; Lung et al. 2013). Other ancillary variables used in the literature include LiDAR data (Dong, Ramesh, and Nepali 2010; Qiu, Sridharan, and Chun 2010; Sridharan and Qiu 2013; Xie, Weng, and Weng 2015), tax parcel data (Maantay, Maroko, and Herrmann 2007; Kar and Hodgson 2012; Mitsova, Esnard, and Li 2012; Jia, Qiu, and Gaughan 2014; Jia and Gaughan 2016; Zoraghein et al. 2016; Zoraghein and Leyk 2018a), street networks (Reibel and Bufalino 2005; Su et al. 2010), impervious surfaces (Zandbergen and Ignizio 2010; Schroeder 2017), address points (Tapp 2010; Zandbergen 2011), buildings (Wu, Wang, and Qiu 2008; Calka, Bielecka, and Zdunkiewicz 2016; Zoraghein and Leyk 2018a) and Volunteered Geographic Information (VGI) (Bakillah et al. 2014; Geiß et al. 2016).

AW, TDW, and EM are common areal interpolation methods used for creating consistent multi-temporal enumerated population estimates. These methods calculate weights for atoms, i.e. intersections of source and target zones, to determine their shares of the population of encompassing source zones and then aggregate their population estimates to target zones. AW allocates population weights proportional to overlapping areas between atoms and source zones. TDW uses ratios of population densities of atoms to source zones over time to determine weights of atoms. EM, on the other hand, calculates weights according to the dominant development intensity in atoms achieved from additional ancillary data. For example, if a proxy for different built-up intensity levels exists for the study area (e.g. housing characteristics of parcels or land-use types), EM allocates higher population values to atoms with dominant concentrations of high built-up intensity than those with low intensity levels. To do so, it identifies areas with similar built-up intensity levels as distinct control zones and assigns an individual population density weight to each zone in an iterative process. Control zones with more intensive development are assigned higher weights.
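As a rough illustration of how atom weights drive AW and TDW, the following Python sketch redistributes source-zone counts to target zones. It assumes a common TDW formulation in which an atom's raw weight is its area multiplied by the target-year population density of its target zone; the formulation in the Appendix takes precedence where the two differ. The data structures (`atoms`, `source_pop`, `target_density`) and function names are hypothetical and are not taken from the authors' implementation.

```python
from collections import defaultdict


def interpolate(atoms, source_pop, weight_of):
    """Redistribute source-zone counts to target zones through atoms.

    atoms: list of dicts with keys 'source', 'target', 'area' (hypothetical layout).
    source_pop: dict mapping source zone id -> enumerated count.
    weight_of: function returning a non-negative raw weight for an atom.
    """
    # Sum of raw weights of all atoms inside each source zone.
    totals = defaultdict(float)
    for atom in atoms:
        totals[atom["source"]] += weight_of(atom)

    # Each atom receives its share of the source count; shares are summed per target zone.
    estimates = defaultdict(float)
    for atom in atoms:
        total = totals[atom["source"]]
        share = weight_of(atom) / total if total > 0 else 0.0
        estimates[atom["target"]] += source_pop[atom["source"]] * share
    return dict(estimates)


def areal_weighting(atoms, source_pop):
    # AW: weight proportional to atom area.
    return interpolate(atoms, source_pop, lambda a: a["area"])


def target_density_weighting(atoms, source_pop, target_density):
    # TDW (assumed form): area times target-year density of the atom's target zone.
    return interpolate(
        atoms, source_pop, lambda a: a["area"] * target_density[a["target"]]
    )
```

With one source zone split evenly by area between two target zones, AW assigns half of the count to each, whereas TDW shifts the count toward the target zone that is denser in the target year.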

Enhancement using dasymetric refinement entails that the areal interpolation methods be applied only to inhabitable “sub-areas” of source and target zones, which are formed by overlaying ancillary variables on them; the assumptions of the unrefined methods therefore have to be revised. In addition to AW and TDW, with and without dasymetric refinement, and EM, this paper utilizes “the extended refinement approach for EM”, described in Zoraghein and Leyk (2018a). The data extension improves EM by dividing its initial control zones (e.g. buildings or parcels with the same housing type) into more homogeneous sub-control zones according to area and unit-density quantiles.
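The effect of refinement on atom weights can be sketched as follows. This minimal Python example computes refined AW shares from inhabited area only, so that an atom without any inhabited sub-area receives no population. The dictionary layout keyed by `(source, target)` and the function name are assumptions made for illustration.

```python
def refined_aw_shares(inhabited_area_by_atom):
    """Return each atom's share of its source-zone count under refined AW.

    inhabited_area_by_atom: dict mapping (source_id, target_id) -> inhabited
    area of that atom, i.e. the part covered by the ancillary mask.
    """
    # Total inhabited area per source zone.
    totals = {}
    for (source, _target), area in inhabited_area_by_atom.items():
        totals[source] = totals.get(source, 0.0) + area

    # Share = inhabited area of the atom / inhabited area of its source zone.
    return {
        key: (area / totals[key[0]] if totals[key[0]] > 0 else 0.0)
        for key, area in inhabited_area_by_atom.items()
    }
```

In this sketch, a source zone with no inhabited area at all yields zero shares; how such zones are handled in practice (for example, by falling back to unrefined weights) is a design choice that the sketch leaves open.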

Figure 1 summarizes the characteristics of refined AW, refined TDW, EM and the extended refinement approach for EM. These methods can be applied to any enumerated demographic variable such as total population or population divided by age and race categories. Moreover, the Appendix outlines the assumptions and mathematical foundations of all these methods as described in detail in recent research (Ruther, Leyk, and Buttenfield 2015; Zoraghein et al. 2016; Schroeder and Van Riper 2013; Zoraghein and Leyk 2018a).

Figure 1.

Major characteristics of refined AW, refined TDW, EM and the extended refinement approach for EM.

3. Study Area and Data

3.1. Study Area

This paper conducts its analyses across Massachusetts for several reasons. First, different state-wide datasets applicable as ancillary variables are publicly available, including parcels and building footprints. Second, the proprietary ZTRAX® database (Zillow 2017) is available for this research, and unlike in some other states, its completeness is very high in Massachusetts. Third, although Massachusetts is a small state (27,363 square kilometers, ranking 44th by area), its population is relatively large (6.9 million, ranking 15th), with highly variable population densities ranging from the densely populated Boston area (around 5344 per square kilometer) in the east to sparsely populated south-eastern parts (around 13 per square kilometer).

3.2. Data

3.2.1. Census Data

This study focuses on census tracts in 1990, 2000 and 2010, along with their enumerated population subgroups based on race (white/black), age (under 18, above 18 and under 65, above 65) and urban residence (count of urban people). The tabular summary files and geometric boundaries in 1990 were extracted from the National Historical Geographic Information System (NHGIS) data portal (Manson et al. 2017). Census tract boundaries in 2000 and 2010 were accessed as TIGER/Line® (U.S. Census Bureau 2017b) and their demographic attributes were retrieved from the American FactFinder download center (U.S. Census Bureau 2017a). Census block boundaries in 1990 and 2000, as well as their demographic attributes were extracted from the same sources as their encompassing tracts and used to validate interpolated tract-level demographic estimates for 1990 and 2000, respectively.

3.2.2. Publicly Available National or Global Ancillary Variables

The NLCD land-cover product, which is derived from Landsat imagery and published at a resolution of 30 m, provides nationally complete, current, consistent, and publicly available information on the nation’s land-cover. This study employs the three NLCD vintages that are temporally closest to the census years of interest: NLCD 1992 (Vogelmann et al. 2001), NLCD 2001 (Homer et al. 2007) and NLCD 2011 (Homer et al. 2015). NLCD includes multiple classes, representing different land-cover and land-use types. Classes 21, 22 and 23 in NLCD 1992 and classes 21, 22, 23 and 24 in NLCD 2001 and 2011 label developed land with varying intensity levels. Dasymetric refinement leverages development masks based on different combinations of the developed classes in different years to determine the optimum combination for each period.
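A development mask of this kind can be sketched with NumPy as shown below. The example assumes the NLCD layer has already been read into a 2-D integer array of class codes; the function name and the toy grid are illustrative only.

```python
import numpy as np


def developed_mask(nlcd, classes=(21, 22, 23, 24)):
    """Return a boolean mask that is True where the cell belongs to one of `classes`.

    nlcd: 2-D integer array of NLCD class codes (assumed to be loaded already).
    classes: developed class codes to include, e.g. (21, 22, 23) for NLCD 1992.
    """
    return np.isin(nlcd, classes)


# Toy 2 x 2 grid; code 11 stands for a non-developed class.
grid = np.array([[11, 21], [23, 24]])
mask_1992_style = developed_mask(grid, classes=(21, 22, 23))
```

Passing a different tuple of class codes yields the mask for another combination of developed classes.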

The GHSL dataset represents global spatial information about the human presence over time. This study uses the Landsat-based fine resolution (38m) version of GHSL, which classifies built-up land from before 1975 to 2014 (Pesaresi et al. 2016). GHSL possesses high potential as a convenient ancillary variable because it maps built-up areas that serve as a proxy for global human presence at a relatively fine resolution, allowing the framework to be implemented in data-poor regions in the future. Therefore, GHSL built-up layers that are approximately coincident with the three census years were used (GHSL epochs of 1990, 2000 and 2014). Some drawbacks in using GHSL are the temporal mismatch between the latest GHSL epoch and the latest census year, the low levels of classification accuracy in rural settings (Leyk et al. 2018), as well as the assumption that developed areas cannot become undeveloped. Regardless of these caveats, the dataset provides a unique and newly available global depiction of human settlement as an appropriate input for dasymetric refinement.

3.2.3. Local Cadastral and Housing Ancillary Variables

Tax parcels of Massachusetts except those in Boston are available per township (MassGIS 2017b). Therefore, parcels of all townships were downloaded and merged together, then combined with the City of Boston’s parcel data created by its Assessing department (BostonGIS 2017) to form a complete polygonal statewide dataset. Land-use classes in parcel records typically indicate the existence and type of developed areas at the lot level, thereby making them a promising ancillary variable (Zoraghein et al. 2016). However, their areal extents range widely, stretching from very small lots in highly urbanized areas to extremely large units in rural locations (Leyk et al. 2014).

Building footprints represent the smallest achievable unit of development for dasymetric refinement. While such data are still only sporadically available, a statewide spatial layer of building footprints is publicly available for Massachusetts and was utilized for dasymetric refinement (MassGIS 2017a).

This research also employs the nation-wide proprietary ZTRAX® dataset for the first time as another ancillary variable for dasymetric refinement, courtesy of the Zillow Company (Zillow 2017). This point-based dataset contains a multitude of housing and property-related information, as well as coordinates of interpolated approximate address locations of encompassing parcels.

To obtain a consistent set of attributes for parcels, buildings and ZTRAX® housing records, the standardized set of attributes of ZTRAX® records was assigned to encompassing parcels and buildings using a spatial join operation. This approach offers several advantages. First, it eliminates the inconsistency between attributes of parcels in Boston and those in other areas of the state. Second, it creates a high level of consistency in land-use attributes between the three datasets. Third, it increases the flexibility of using parcels and buildings in other areas, as long as their geometric footprints exist.

The built-year attribute, which indicates when the main structure within a parcel was built, and the land-use class attribute, which defines the category of the building (e.g. single-family residence, apartment, condominium, etc.), were extracted from the ZTRAX® data. The first attribute was used to temporally match the three ancillary variables with the different census years, while the second attribute determined the relevant records for creating ancillary masks indicative of human settlement.
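The two attributes can be combined into a simple record filter, sketched below in Python. The sketch assumes that a record is matched to a census year when its built year is not later than that year and that records without a built year are excluded; both rules, as well as the field names `built_year` and `land_use`, are assumptions for illustration and are not taken from the ZTRAX® schema.

```python
def records_for_census_year(records, census_year, settlement_classes):
    """Select records that represent settlement existing in `census_year`.

    records: iterable of dicts with hypothetical keys 'built_year' and 'land_use'.
    census_year: e.g. 1990, 2000 or 2010.
    settlement_classes: set of land-use classes treated as inhabited.
    """
    return [
        r
        for r in records
        if r["built_year"] is not None          # assumed rule: drop undated records
        and r["built_year"] <= census_year      # assumed rule: built by the census year
        and r["land_use"] in settlement_classes
    ]
```

The selected records would then feed the creation of the ancillary mask for the corresponding census year.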

4. Methods

The proposed framework utilizes AW and TDW, each with and without dasymetric refinement, as well as EM and its data-extended approach. The extended refinement approach for EM requires computationally complex simulations using different numbers of control zones and subcategories of residential units. Given the size of the study area and the required processing time, we chose not to run these simulations and instead used a configuration (i.e. candidate control zones and their number of subcategories) that was likely to be effective in improving the accuracy of EM. Thus, based on a prior report on testing this method (Zoraghein and Leyk 2018a), the five most frequent control zones and the control zone “condominium” were each further sub-divided into seven (presumably more homogeneous) sub-zones, using quantiles of area for the former and quantiles of unit density for the latter. Specifically, seven sub-zones were created for each of the control zones “single-family residence”, “two-family residence”, “three-family residence”, “condominium”, “mixed”, and “residential multiple houses.” These sub-zones, together with all other control zones that were not sub-divided, constituted the extended set of control zones.
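The quantile-based subdivision of a control zone can be illustrated with a short NumPy sketch. It assumes equal-probability quantile breaks with NumPy's default linear interpolation, which may differ from the exact procedure in Zoraghein and Leyk (2018a); the function name is hypothetical.

```python
import numpy as np


def quantile_subzones(values, n_groups=7):
    """Assign each unit of one control zone to a quantile-based sub-zone.

    values: per-unit attribute (e.g. parcel area or unit density).
    Returns integer labels in 0 .. n_groups - 1, lowest values first.
    """
    values = np.asarray(values, dtype=float)
    # Interior quantile breaks, e.g. 1/7, 2/7, ..., 6/7 for seven groups.
    probs = np.linspace(0.0, 1.0, n_groups + 1)[1:-1]
    edges = np.quantile(values, probs)
    # A value equal to a break falls into the higher group.
    return np.searchsorted(edges, values, side="right")
```

Each label then identifies one sub-control zone, so a control zone such as “single-family residence” contributes seven entries to the extended set of control zones.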

The methods tested in this areal interpolation framework employ different ancillary variables that are presumably related to population distribution, and we compare their effectiveness in generating accurate demographic estimates within temporally consistent census tracts. Figure 2 outlines those ancillary variables and the major steps taken to preprocess them.

Figure 2.

Utilized ancillary variables and their preprocessing steps.

4.1. Dasymetric Refinement of Census Enumeration Units

This paper employs dasymetric refinement based on the above-mentioned ancillary variables and their different combinations. For example, it does not confine the selection of the NLCD developed classes to those suggested in Ruther, Leyk, and Buttenfield (2015), but instead explores their different combinations at different points in time for creating spatial masks of populated areas. Moreover, it combines the NLCD developed classes with built-up land depictions derived from GHSL to create composite spatial masks.
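One way to enumerate candidate combinations of developed classes is sketched below. Whether every non-empty subset was actually tested is not stated here, so the exhaustive enumeration is an assumption; the function name is illustrative.

```python
from itertools import combinations


def class_combinations(classes):
    """List every non-empty combination of the given developed class codes."""
    return [
        combo
        for size in range(1, len(classes) + 1)
        for combo in combinations(classes, size)
    ]


# Four developed classes (NLCD 2001 and 2011) give 15 candidate masks;
# three developed classes (NLCD 1992) give 7.
candidates_2001 = class_combinations((21, 22, 23, 24))
candidates_1992 = class_combinations((21, 22, 23))
```

Each combination would define one spatial mask whose interpolation accuracy can then be compared for a given period.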

ZTRAX® records in Massachusetts are already categorized into 245 land-cover/land-use types, of which 41 classes that indicate inhabited or populated land are extracted. Point locations of all these selected records are spatially joined with encompassing parcel boundaries and buildings. Spatial masks are also created solely based on selected ZTRAX® point locations. To do so, these point features are rasterized using a target resolution of 30m to make the ancillary dataset comparable to NLCD.
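Rasterizing point locations to a 30 m grid can be sketched as follows. The example assumes projected coordinates in meters, a north-up grid defined by its upper-left corner, and a presence/absence encoding of cells; the function and parameter names are hypothetical.

```python
import numpy as np


def rasterize_points(xs, ys, x_min, y_max, n_rows, n_cols, cell=30.0):
    """Mark every grid cell that contains at least one point.

    xs, ys: point coordinates in meters (projected CRS assumed).
    x_min, y_max: coordinates of the grid's upper-left corner.
    """
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    cols = np.floor((np.asarray(xs, dtype=float) - x_min) / cell).astype(int)
    rows = np.floor((y_max - np.asarray(ys, dtype=float)) / cell).astype(int)
    # Ignore points that fall outside the grid extent.
    inside = (rows >= 0) & (rows < n_rows) & (cols >= 0) & (cols < n_cols)
    mask[rows[inside], cols[inside]] = True
    return mask
```

Using the same cell size and grid origin as NLCD keeps the resulting mask directly comparable to the land-cover based masks.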

The different dasymetrically refined areal interpolation methods described in the Appendix leverage these various ancillary variables to derive temporally consistent estimates of population sub-groups based on race and age in 1990 and 2000 within 2010 census tract boundaries. The fundamental principle underlying these methods is to refine census units to those portions that are likely populated at each point in time, and then to run areal interpolation techniques with the revised areas and population density estimates as input to create consistent enumerated estimates.

4.2. Dasymetric Refinement for Creating Consistent Estimates of Urban Land and Urban Population

Defining urban lands and urban population is one of the most persistent challenges in demography and urban geography, and such definitions typically rest on complex procedures. The U.S. Census Bureau uses various criteria based on population, population density, identification of designated places, land-use and road segments, among others, to identify urban lands, which are delineated by layers of Urbanized Areas (UAs) and Urban Clusters (UCs) (Department of Commerce 2011; U.S. Census Bureau 2011). One well-known problem in using these layers is the change in the underlying definition of what and who is urban (Zoraghein and Leyk 2018b). Thus, in addition to the temporal incompatibility in small-area enumeration units, the concepts of urban lands and consequently urban population change over time. This makes studying the evolution of these two complex phenomena extremely challenging. Nevertheless, given the ubiquitous and growing trends of urbanization and the limited knowledge about such processes, research efforts to model urban lands and population reliably and consistently over time and across regions are essential and have important implications in domains such as urban and regional planning, policy making and resource allocation.

Census-defined urban areas of Massachusetts in 1990, 2000, and 2010 were accessed from the NHGIS portal. Figure 3 shows these areas, following Zoraghein and Leyk (2018b) but improving upon the 1990 depiction by aggregating urban census blocks. The goal of this urban analysis is twofold. First, by using official urban land layers for dasymetric refinement, it assesses how reliably they represent areas inhabited by the designated urban population. Second, it investigates the performance of other global ancillary variables as possible surrogates for census-defined urban areas, which would enable similar analyses in data-poor regions.

Figure 3.

Census-defined urban areas in Massachusetts in 1990, 2000 and 2010.

The first part initially treats census-defined urban areas as another ancillary variable for dasymetric refinement to estimate temporally consistent urban population values from 1990 to 2010 and from 2000 to 2010 at the census tract level. Then, it further refines urban area layers in each year by the aforementioned ancillary variables, and for each composite refinement, it evaluates the performance of the different areal interpolation methods. Notably, a larger number of land-use classes from the ZTRAX® data should be selected to cover land-use types that are not residential but indicate other types of urban lands (e.g., commercial or industrial). Therefore, we explored all land-use types manually and extracted those that could characterize urban features. We also cross-referenced them with the census-defined urban areas to ensure that no rural land-use type was selected.

The second part employs all the ancillary variables for dasymetric refinement but does not restrict them to census-defined urban areas. This step evaluates how much urban area could be detected outside designated urban lands and indicates the implications of estimating urban population without official urban area delineations. The urban analysis concludes by presenting the outcomes of these models and assessing how reliably each combination of the ancillary variables mimics urban areas and thus estimates urban population.

4.3. Validation

This study validates the estimated tract-level results for each census year using census block statistics, as is often done in tract-level analysis and dasymetric modeling (Buttenfield, Ruther, and Leyk 2015; Ruther, Leyk, and Buttenfield 2015; Schroeder 2007). After transferring population estimates from source zones to target zones, each 2010 census tract can be linked with its estimated population counts in 1990 and 2000. These estimates are compared to population counts of census blocks in 1990 and 2000, aggregated to target zone boundaries. Notably, the determination of urban population for blocks and tracts is not temporally consistent due to changing definitions. Thus, while accuracy can be compared across all methods within one time period, it cannot be compared reliably across the two periods. Furthermore, blocks that are designated urban might also contain non-urban population, which can be neither accounted for nor evaluated. In this study, we therefore assume that the population living in an urban block is entirely urban.
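The construction of the validation reference can be sketched in a few lines of Python. The sketch assumes that every census block is assigned wholly to one 2010 target tract through a lookup table; how blocks are matched to tracts (for example, by containment or centroid) is not specified here, and all names are hypothetical.

```python
def aggregate_blocks(block_counts, block_to_tract):
    """Sum block-level counts within each target tract to form the reference.

    block_counts: dict mapping block id -> enumerated count in the source year.
    block_to_tract: dict mapping block id -> 2010 target tract id (assumed lookup).
    """
    reference = {}
    for block_id, count in block_counts.items():
        tract = block_to_tract[block_id]
        reference[tract] = reference.get(tract, 0) + count
    return reference


def tract_errors(estimates, reference):
    """Signed error (estimate minus reference) for tracts present in both inputs."""
    return {t: estimates[t] - reference[t] for t in estimates if t in reference}
```

The signed errors per tract are the input to the summary measures discussed next.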

Different error measures are calculated, such as the mean absolute error (MAE), the median absolute error, the root mean square error (RMSE) and the 90th percentile of absolute errors. These error measures and their distributions are compared across methods to characterize and evaluate the performance of the described methods. For example, the MAE and RMSE illustrate the overall behavior of estimation errors and are sensitive to outliers, whereas the median absolute error describes the typical error and the 90th percentile of absolute errors describes the upper end of the error distribution and the placement of extreme absolute error values (Zoraghein and Leyk 2018a; Zoraghein et al. 2016).
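These four measures can be computed as in the following NumPy sketch. The 90th percentile uses NumPy's default linear interpolation, which is an assumption and may differ slightly from the percentile definition used for the reported tables; the function name is illustrative.

```python
import numpy as np


def absolute_error_summary(estimates, reference):
    """Summarize tract-level errors with the four absolute error measures."""
    err = np.asarray(estimates, dtype=float) - np.asarray(reference, dtype=float)
    abs_err = np.abs(err)
    return {
        "mae": float(abs_err.mean()),
        "median_abs_error": float(np.median(abs_err)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "p90_abs_error": float(np.percentile(abs_err, 90)),
    }
```

A single large error raises the MAE and especially the RMSE while leaving the median unchanged, which is why the measures are read together.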

Interpretations in this paper are mainly based on the absolute error measures described above to follow previous research (Buttenfield, Ruther, and Leyk 2015; Ruther, Leyk, and Buttenfield 2015; Zoraghein et al. 2016). However, mean percent error and mean absolute percent error measures are also summarized in Tables 1–5 to provide a more complete picture of the error behavior when errors are divided by a baseline population (block-aggregated population from the source year) and converted to percentages. These two measures offer overall insights into tract-level under- or over-estimation and proportional errors independent of baseline population values.
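The two percentage measures can be sketched as follows. The sign convention (estimate minus baseline, so that positive values indicate over-estimation) and the exclusion of tracts with a zero baseline are assumptions made for this example; the function name is hypothetical.

```python
import numpy as np


def percent_error_summary(estimates, baseline):
    """Mean percent error and mean absolute percent error over tracts.

    baseline: block-aggregated counts from the source year.
    Tracts with a zero baseline are skipped (assumed rule) to avoid division by zero.
    """
    est = np.asarray(estimates, dtype=float)
    base = np.asarray(baseline, dtype=float)
    valid = base > 0
    pct = 100.0 * (est[valid] - base[valid]) / base[valid]
    return {
        "mean_percent_error": float(pct.mean()),
        "mean_abs_percent_error": float(np.abs(pct).mean()),
    }
```

Because signed errors can cancel out, a mean percent error near zero together with a clearly positive mean absolute percent error indicates errors in both directions rather than accurate estimates.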

Table 1.

Selected error measures pertaining to white population estimates.

Method       Mean Abs Percent Error  Mean Percent Error  MAE  Median Abs Error  RMSE  90th Percentile Abs Error  Ancillary Variable
Time Period: 1990–2010
AW           32.33                   22.49               343  55                772   1184                       None
TDW          11.14                   5.62                146  63                292   359                        None
RefTDW       9.85                    4.67                127  56                230   327                        NLCD
RefTDW       10.69                   5.85                120  58                214   301                        GHSL
RefTDW       10.06                   4.92                128  66                223   321                        NLCD-GHSL
RefTDW       10.81                   5.85                117  50                233   284                        Parcels
RefTDW       8.36                    4.05                87   42                178   204                        Buildings
RefTDW       8.62                    4.18                91   37                216   214                        ZTRAX
EM           22.10                   15.81               203  54                417   649                        NLCD
EM           39.11                   21.54               304  65                667   990                        Parcels
EM           7.84                    3.01                127  41                271   330                        Buildings
EM           5.87                    0.81                130  39                308   344                        ZTRAX
Extended EM  8.37                    3.24                132  43                303   332                        Parcels
Extended EM  7.76                    3.12                120  43                256   308                        Buildings
Time Period: 2000–2010
AW           33.79                   23.90               313  53                1281  837                        None
TDW          4.35                    0.79                77   44                131   180                        None
RefTDW       4.23                    0.89                71   38                126   175                        NLCD
RefTDW       4.5                     0.96                78   41                142   194                        GHSL
RefTDW       4.38                    0.91                74   38                136   178                        NLCD-GHSL
RefTDW       4.38                    0.89                75   39                157   172                        Parcels
RefTDW       4.33                    0.77                68   36                138   156                        Buildings
RefTDW       4.07                    0.68                66   36                133   152                        ZTRAX
EM           23.10                   18.28               143  42                346   345                        NLCD
EM           17.78                   11.91               182  41                479   464                        Parcels
EM           6.6                     2.88                95   37                209   232                        Buildings
EM           4.15                    0.42                94   37                211   243                        ZTRAX
Extended EM  5.8                     2.17                101  39                246   224                        Parcels
Extended EM  5.4                     1.87                90   36                191   212                        Buildings

Table 5.

Selected error measures pertaining to urban population estimates in 2000 within 2010 tract boundaries.


Method       Mean Abs Percent Error  Mean Percent Error  MAE  Median Abs Error  RMSE  90th Percentile Abs Error  Ancillary Variable
Limited to Urban Areas
AW           144.4                   135.37              322  11                1393  1006                       None
TDW          3.59                    0.33                60   10                152   164                        None
RefTDW       3.51                    0.37                76   8                 237   170                        Urban Areas
RefTDW       2.71                    0.11                49   4                 147   126                        NLCD
RefTDW       3.08                    0.2                 59   6                 184   148                        GHSL
RefTDW       2.99                    0.08                57   5                 171   148                        NLCD-GHSL
RefTDW       15.81                   10.58               438  6                 3438  610                        Parcels
RefTDW       2.60                    −0.03               46   4                 138   119                        Buildings
RefTDW       2.55                    0.13                40   4                 122   98                         ZTRAX
EM           66.8                    63.2                128  5                 428   322                        NLCD
EM           60.89                   56.89               138  5                 446   402                        NLCD-GHSL
EM           114.37                  110.32              142  4                 546   317                        Parcels
EM           6.6                     4.14                77   2                 281   194                        Buildings
EM           21.46                   18.46               84   3                 302   202                        ZTRAX
Extended EM  21.04                   18.4                88   3                 349   189                        Parcels
Extended EM  6.18                    3.79                74   3                 269   184                        Buildings
Not Limited to Urban Areas
RefTDW       3.36                    0.46                51   6                 145   143                        NLCD
RefTDW       3.57                    0.52                57   7                 164   166                        GHSL
RefTDW       3.46                    0.44                55   6                 157   159                        NLCD-GHSL
RefTDW       16.69                   11.35               457  8                 3576  636                        Parcels
RefTDW       3.23                    0.34                48   4                 138   137                        Buildings
RefTDW       3.11                    0.35                45   4                 134   122                        ZTRAX
EM           64.57                   60.49               139  6                 442   390                        NLCD
EM           60.07                   55.86               142  6                 450   436                        NLCD-GHSL
EM           104.61                  99.16               192  5                 633   514                        Parcels
EM           9.9                     6.62                101  4                 340   244                        Buildings
EM           19.87                   16.24               99   3                 343   284                        ZTRAX
Extended EM  19.11                   16.07               95   5                 353   255                        Parcels
Extended EM  9.04                    5.94                93   4                 310   270                        Buildings

5. Results

This section presents some selected results to reflect the most relevant outcomes among a large number of model runs for different demographic attributes. To reduce complexity, this section excludes different implementations of refined AW as it is typically found to be the least effective approach (Schroeder 2007; Schroeder and Van Riper 2013; Ruther, Leyk, and Buttenfield 2015; Zoraghein et al. 2016). This paper focuses on the estimation of different population sub-groups. Nonetheless, Section 2 of the Appendix also includes the maps and results pertaining to the estimation of total population.

5.1. Dasymetric Refinement of Population Sub-groups Based on Race and Age

Figures 4 and 6 show maps of absolute errors of population estimates based on TDW, which is the most accurate unrefined areal interpolation method, and the best-performing refined method in 1990 and 2000. They illustrate the spatial distribution of target tract level absolute errors associated with estimates of white population and population aged 18–65, respectively, as two examples of race- and age-related attributes. The maps in Figures 5 and 7, on the other hand, compare estimates of the two demographic attributes in 1990 and 2000 to block-aggregated counts, as references.

Figure 4.

Absolute error maps of white population at the target zone level: (a) in 1990 based on TDW, (b) in 1990 based on TDW refined by buildings, (c) in 2000 based on TDW, and (d) in 2000 based on TDW refined by ZTRAX®.

Figure 6.

Absolute error maps of population aged 18–65 at the target zone level: (a) in 1990 based on TDW, (b) in 1990 based on TDW refined by buildings, (c) in 2000 based on TDW and (d) in 2000 based on TDW refined by ZTRAX®.

Figure 5.

Maps of white population at the target zone level: (a) in 1990 based on TDW, (b) in 1990 based on TDW refined by buildings, (c) in 1990 based on block aggregation, (d) in 2000 based on TDW, (e) in 2000 based on TDW refined by ZTRAX®, and (f) in 2000 based on block aggregation.

Figure 7.


Maps of population aged 18–65 at the target zone level: (a) in 1990 based on TDW, (b) in 1990 based on TDW refined by buildings, (c) in 1990 based on block aggregation, (d) in 2000 based on TDW, (e) in 2000 based on TDW refined by ZTRAX® and (f) in 2000 based on block aggregation.

The maps in Figures 8 and 9 show the target tract level spatial distribution of normalized absolute errors pertaining to the above-mentioned demographic attributes for the 1990–2010 and 2000–2010 time periods, respectively. For normalization, absolute errors are divided by block-aggregated values of the respective year. Normalized errors are generally between 0 and 1, but can exceed 1 if the absolute error of a target tract is higher than its reference value, which can occasionally be observed in sparsely-populated tracts. Normalized error distributions provide an objective comparison between target tracts across study areas and between different time periods.
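The normalization described above can be illustrated with a minimal sketch. The function name and the tract values are hypothetical and are not taken from the study; returning NaN for a zero reference value is an assumption, since the text does not say how such tracts are handled.

```python
def normalized_abs_error(estimate, reference):
    """Absolute error divided by the block-aggregated reference value."""
    if reference == 0:
        # Assumption: the ratio is treated as undefined for an empty tract.
        return float("nan")
    return abs(estimate - reference) / reference

# Hypothetical tract values (not taken from the study).
print(normalized_abs_error(950, 1000))  # 0.05
print(normalized_abs_error(30, 10))     # 2.0
```

The second call shows how the ratio can exceed 1 in a sparsely populated tract, where the absolute error is larger than the reference count itself.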

Figure 8.


Normalized absolute error maps in 1990 for: (a) population aged 18–65 using TDW, (b) population aged 18–65 using TDW refined by buildings, (c) white population using TDW, and (d) white population using TDW refined by buildings.

Figure 9.


Normalized absolute error maps in 2000 for: (a) population aged 18–65 using TDW, (b) population aged 18–65 using TDW refined by ZTRAX® (footnote 2), (c) white population using TDW, and (d) white population using TDW refined by ZTRAX®.

Tables 1 to 3 provide an extensive summary of the performance of different areal interpolation methods. These tables include six error metrics associated with estimates of population by race (white and black) and age (18–65), respectively, and thus allow a numerical comparison between different methods according to the two time periods and estimated demographic attributes.
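As a rough illustration of how the six error metrics could be computed from tract-level estimates and block-aggregated references, consider the sketch below. It uses conventional definitions, which are assumptions here: percent errors are taken as (estimate − reference) / reference × 100, so positive values indicate over-estimation, and the 90th percentile uses the nearest-rank rule. The function name and the sample values are hypothetical.

```python
import math
import statistics

def error_metrics(estimates, references):
    """Summarize estimation error; assumes every reference value is positive."""
    errors = [est - ref for est, ref in zip(estimates, references)]
    abs_errors = [abs(err) for err in errors]
    pct_errors = [100.0 * err / ref for err, ref in zip(errors, references)]
    ranked = sorted(abs_errors)
    # Nearest-rank 90th percentile: an assumed convention, not stated in the text.
    p90 = ranked[math.ceil(0.9 * len(ranked)) - 1]
    return {
        "mean_abs_pct_error": statistics.fmean([abs(p) for p in pct_errors]),
        "mean_pct_error": statistics.fmean(pct_errors),
        "mae": statistics.fmean(abs_errors),
        "median_abs_error": statistics.median(abs_errors),
        "rmse": math.sqrt(statistics.fmean([err ** 2 for err in errors])),
        "p90_abs_error": p90,
    }

# Hypothetical tract estimates and block-aggregated reference counts.
metrics = error_metrics([110, 90, 205, 400], [100, 100, 200, 400])
print(metrics)
```

Reporting the signed mean percent error next to the absolute measures separates systematic bias from overall error magnitude.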

Table 3.

Selected error measures pertaining to estimates of population aged 18–65.

Method | Mean Abs Percent Error | Mean Percent Error | MAE | Median Abs Error | RMSE | 90th Percentile Abs Error | Ancillary Variable
Time Period: 1990–2010
AW | 28.38 | 18.43 | 248 | 42 | 578 | 797 | None
TDW | 12.12 | 6.35 | 110 | 49 | 210 | 267 | None
RefTDW | 10.63 | 5.26 | 99 | 45 | 182 | 248 | NLCD
RefTDW | 11.93 | 6.71 | 97 | 46 | 180 | 246 | GHSL
RefTDW | 11.02 | 5.6 | 101 | 50 | 184 | 252 | NLCD-GHSL
RefTDW | 9.4 | 4.23 | 94 | 39 | 196 | 224 | Parcels
RefTDW | 5.92 | 1.27 | 76 | 33 | 164 | 180 | Buildings
RefTDW | 8.72 | 4.03 | 78 | 31 | 198 | 182 | ZTRAX (footnote 10)
EM | 19.63 | 13.35 | 150 | 41 | 319 | 456 | NLCD
EM | 17.7 | 8.17 | 231 | 48 | 536 | 712 | Parcels
EM | 7.8 | 2.47 | 112 | 33 | 279 | 281 | Buildings
EM | 6.18 | 0.68 | 116 | 30 | 313 | 295 | ZTRAX
Extended EM | 8.72 | 3.3 | 116 | 34 | 313 | 287 | Parcels
Extended EM | 8.14 | 2.89 | 110 | 33 | 275 | 292 | Buildings
Time Period: 2000–2010
AW | 28.6 | 18.63 | 245 | 44 | 934 | 735 | None
TDW | 3.78 | 0.14 | 63 | 33 | 114 | 151 | None
RefTDW | 3.68 | 0.22 | 60 | 31 | 114 | 140 | NLCD
RefTDW | 3.94 | 0.28 | 65 | 33 | 127 | 154 | GHSL
RefTDW | 3.83 | 0.23 | 62 | 32 | 123 | 144 | NLCD-GHSL
RefTDW | 3.77 | 0.19 | 62 | 32 | 134 | 136 | Parcels
RefTDW | 3.59 | 0.1 | 59 | 30 | 125 | 130 | Buildings
RefTDW | 3.54 | 0.05 | 57 | 29 | 120 | 128 | ZTRAX
EM | 18.64 | 13.75 | 121 | 34 | 304 | 294 | NLCD
EM | 15.6 | 9.53 | 151 | 35 | 434 | 373 | Parcels
EM | 7.34 | 3.13 | 90 | 33 | 236 | 215 | Buildings
EM | 4.57 | 0.3 | 92 | 32 | 259 | 214 | ZTRAX
Extended EM | 6.23 | 2.2 | 97 | 32 | 290 | 222 | Parcels
Extended EM | 6.91 | 2.83 | 87 | 32 | 229 | 215 | Buildings

5.2. Dasymetric Refinement of Urban Population Estimates

Urban population estimates are calculated for both 1990 and 2000 within 2010 target units under two separate assumptions. The first assumes that no official representation of urban area footprints exists. The second uses the encompassing census-defined urban areas in 1990 and 2000, respectively, and refines them further using additional ancillary variables. Consequently, maps of absolute errors in estimating urban population (Figure 10) cover three implementations, namely TDW, the best-performing method refined by ancillary variables inside and outside census-defined urban areas, and the best-performing method using ancillary variables limited to urban areas. Moreover, Figure 11 shows the relevant maps of consistent urban population estimates.

Figure 10.


Absolute error maps of urban population at the target zone level: (a) in 1990 based on TDW, (b) in 1990 based on TDW refined by ZTRAX® (footnote 3; not limited to urban areas), (c) in 1990 based on TDW refined by ZTRAX® (limited to urban areas), (d) in 2000 based on TDW, (e) in 2000 based on TDW refined by ZTRAX® (not limited to urban areas), and (f) in 2000 based on TDW refined by ZTRAX® (limited to urban areas).

Figure 11.


Maps of urban population at the target zone level: (a) in 1990 based on TDW, (b) in 1990 based on TDW refined by ZTRAX® (not limited to urban areas), (c) in 1990 based on TDW refined by ZTRAX® (limited to urban areas), (d) in 1990 based on block aggregation, as well as, (e) in 2000 based on TDW, (f) in 2000 based on TDW refined by ZTRAX® (not limited to urban areas), (g) in 2000 based on TDW refined by ZTRAX® (limited to urban areas), (h) in 2000 based on block aggregation.

Tables 4 and 5 summarize error (or disagreement) metrics of different areal interpolation methods for generating temporally consistent urban population estimates over the two time periods. Each table consists of two parts: the upper part shows error metrics when the additional refinement is employed within census-defined urban areas, whereas the lower part displays those when refinement is employed inside and outside (and thus independent of) urban areas. According to Tables 4 and 5, the most effective method is TDW refined by the composite of census-defined urban areas and ZTRAX® data, in both 1990 and 2000. Notably, this comparison is not a block-level validation but rather an evaluation of disagreement, because the count of non-urban people in officially designated urban blocks is unknown.

Table 4.

Selected error measures pertaining to urban population estimates in 1990 within 2010 tract boundaries.

Method | Mean Abs Percent Error | Mean Percent Error | MAE | Median Abs Error | RMSE | 90th Percentile Abs Error | Ancillary Variable
Limited to Urban Areas
AW | 113.55 | 102.79 | 353 | 58 | 835 | 1146 | None
TDW | 41.53 | 31.68 | 166 | 62 | 337 | 437 | None
RefTDW | 11.68 | 3.78 | 187 | 50 | 523 | 444 | Urban Areas
RefTDW | 8.86 | 1.81 | 146 | 44 | 365 | 387 | NLCD
RefTDW | 12.88 | 4.97 | 142 | 51 | 370 | 322 | GHSL
RefTDW | 10.62 | 2.31 | 140 | 50 | 347 | 359 | NLCD-GHSL
RefTDW | 12.63 | 1.21 | 258 | 59 | 1402 | 486 | Parcels
RefTDW | 8.59 | −1.43 | 107 | 43 | 281 | 257 | Buildings
RefTDW | 7.12 | −1.62 | 95 | 38 | 282 | 226 | ZTRAX (footnote 11)
EM | 8.6 | 3.17 | 132 | 31 | 313 | 390 | NLCD
EM | 9.96 | 4.03 | 132 | 33 | 307 | 371 | NLCD-GHSL
EM | 12.2 | 2.96 | 199 | 47 | 477 | 585 | Parcels
EM | 7.22 | 0.63 | 127 | 40 | 320 | 317 | Buildings
EM | 7.23 | −1.31 | 132 | 37 | 332 | 347 | ZTRAX
Extended EM | 6.74 | −1.17 | 121 | 41 | 318 | 267 | Parcels
Extended EM | 7.03 | 0.47 | 124 | 41 | 315 | 304 | Buildings
Not Limited to Urban Areas
RefTDW | 36.79 | 28.38 | 152 | 50 | 298 | 427 | NLCD
RefTDW | 40.01 | 31.67 | 143 | 58 | 276 | 363 | GHSL
RefTDW | 36.81 | 28.17 | 142 | 53 | 278 | 407 | NLCD-GHSL
RefTDW | 39.39 | 28.59 | 176 | 61 | 363 | 459 | Parcels
RefTDW | 36.7 | 26.99 | 121 | 47 | 245 | 305 | Buildings
RefTDW | 37.52 | 28.22 | 119 | 42 | 268 | 295 | ZTRAX
EM | 49.29 | 41.77 | 188 | 48 | 405 | 599 | NLCD
EM | 43.57 | 35.35 | 169 | 43 | 363 | 513 | NLCD-GHSL
EM | 73.34 | 58.92 | 346 | 69 | 793 | 1184 | Parcels
EM | 53.86 | 42.16 | 196 | 50 | 442 | 593 | Buildings
EM | 46.66 | 35.7 | 195 | 44 | 464 | 626 | ZTRAX
Extended EM | 39.69 | 29.1 | 172 | 48 | 412 | 480 | Parcels
Extended EM | 51.87 | 40.65 | 185 | 49 | 419 | 538 | Buildings

6. Discussion

6.1. Multi-temporal Estimates of Population Sub-groups by Race and Age

The results show that dasymetrically refined interpolation methods produce improved estimates (i.e. with lower error measures) of total population (Appendix) as well as other population sub-groups, in most cases, especially when precise ancillary datasets such as building footprints and ZTRAX® are used. Nonetheless, the comprehensive set of results allows a critical reflection on the current refinement effect that merits attention and provides insights for further enhancements in future research.

As discussed in Schroeder (2007), the uncertainty level in estimating population from source zones in one census year within target zones from another year has a positive statistical relationship with initial population counts of source zones and the level of dissimilarity between boundaries of source and target zones. Thus, when interpolating attributes of small sub-groups such as black population (Table 2) or transferring demographic attributes from boundaries in 2000 to those in 2010, which often do not change drastically (the lower parts of Tables 1 to 3), the inherent uncertainty of the process is expected to be low, thus questioning the visible potential gain of dasymetric refinement, irrespective of the ancillary variable employed. Therefore, one would argue that using dasymetric refinement for this temporal analysis is justified for applications involving demographic attributes with high enumerated values and long time periods.

Table 2.

Selected error measures pertaining to black population estimates.

Method | Mean Abs Percent Error | Mean Percent Error | MAE | Median Abs Error | RMSE | 90th Percentile Abs Error | Ancillary Variable
Time Period: 1990–2010
AW | 32.32 | 20.05 | 17 | 3 | 65 | 33 | None
TDW | 16.22 | 3.5 | 11 | 3 | 30 | 24 | None
RefTDW | 15.63 | 3.23 | 11 | 3 | 32 | 26 | NLCD
RefTDW | 15.93 | 3.74 | 11 | 3 | 31 | 24 | GHSL
RefTDW | 15.73 | 3.44 | 11 | 3 | 33 | 26 | NLCD-GHSL
RefTDW | 16.38 | 4.48 | 11 | 2 | 35 | 27 | Parcels
RefTDW | 15.55 | 3.94 | 11 | 2 | 39 | 24 | Buildings
RefTDW | 15.6 | 3.48 | 11 | 2 | 36 | 26 | ZTRAX (footnote 9)
EM | 22.25 | 12.98 | 14 | 2 | 51 | 27 | NLCD
EM | 22.21 | 9.63 | 17 | 2 | 62 | 34 | Parcels
EM | 19.71 | 7.84 | 15 | 2 | 50 | 29 | Buildings
EM | 20.24 | 8.43 | 16 | 2 | 53 | 31 | ZTRAX
Extended EM | 19.67 | 8.15 | 15 | 2 | 52 | 29 | Parcels
Extended EM | 20.27 | 7.69 | 15 | 2 | 51 | 32 | Buildings
Time Period: 2000–2010
AW | 46.17 | 34.7 | 23 | 2 | 94 | 41 | None
TDW | 7.43 | −0.01 | 10 | 2 | 32 | 25 | None
RefTDW | 7.38 | 0.28 | 9 | 2 | 31 | 25 | NLCD
RefTDW | 7.45 | 0.4 | 10 | 2 | 33 | 24 | GHSL
RefTDW | 7.41 | 0.4 | 10 | 2 | 32 | 25 | NLCD-GHSL
RefTDW | 7.49 | 0.5 | 9 | 2 | 30 | 24 | Parcels
RefTDW | 6.95 | 0.15 | 9 | 1 | 32 | 24 | Buildings
RefTDW | 7.12 | 0.13 | 10 | 2 | 31 | 25 | ZTRAX
EM | 20.5 | 13.41 | 15 | 2 | 50 | 30 | NLCD
EM | 12.48 | 4.1 | 17 | 2 | 73 | 31 | Parcels
EM | 11.13 | 2.77 | 15 | 2 | 56 | 28 | Buildings
EM | 11.41 | 3.18 | 15 | 2 | 56 | 30 | ZTRAX
Extended EM | 11.33 | 3.1 | 15 | 2 | 57 | 30 | Parcels
Extended EM | 10.9 | 2.26 | 15 | 2 | 55 | 31 | Buildings

This study assumes that total population and population sub-groups have similar associations with ancillary variables. This is a limitation by design, as we were interested in understanding how the same framework performs for different demographic attributes. We acknowledge that, ideally, either different ancillary variables would be needed or established associations would have to be modified for certain population sub-groups to better reflect their spatial distribution after dasymetric refinement. For example, residential areas in close proximity to business centers and schools would have to be weighted higher for younger age groups. Alternatively, additional attributes could be tested as ancillary data such as lot areas and the number of bedrooms when estimating population in certain income classes.

Moreover, exogenous factors might influence the estimation accuracy regardless of the effectiveness of dasymetric refinement. For example, if the current framework includes 2020 in future applications, the existence of baby boomers, as a large demographic cohort in some areas, may lead to inflated distributions of the population group aged above 65 in 2020, and thus result in biased temporal estimations of the population group. This is particularly true for TDW, which assumes ratios of population densities remain the same over time. We did not encounter this issue in our analysis because we based our age-specific findings on the age group 18–65 and time periods 1990–2010 and 2000–2010. Nonetheless, it is imperative to consider such implications for future analyses.

The quality of the ancillary dataset also influences the accuracy gain of dasymetric refinement (Langford 2013). For example, the developed classes of NLCD for refinement were initially selected following Ruther, Leyk, and Buttenfield (2015), but resulted in relatively high absolute error measures. Therefore, different combinations of the developed classes among the different releases were tested to find the optimum solution. In this study, the classes 21–23 from the NLCD 1992, and 21–24 from the NLCD versions of 2001 and 2011, respectively, result in the highest accuracy. The main reason for the inclusion of high intensity development may be urban centers such as Boston, in which dense populations can be modeled reliably in highly impervious areas. Landsat-based GHSL has lower levels of classification accuracy in rural settings, a direct consequence of mixed pixel and other problems in classifying remote sensing data, leading to the under-estimation of rural developments (Leyk et al. 2018), and assumes built-up areas to be cumulative. These two factors can affect the dasymetric refinement results using this dataset. In contrast, parcels over-estimate rural developments, manifested by high ratios of parcel areas to extents of building footprints within them (Leyk et al. 2014, 2013; Sahar, Muthukumar, and French 2010), disrupting the overall effectiveness of their dasymetric refinement.
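The NLCD class selection reported above can be expressed as a small lookup that turns a land-cover grid into a binary refinement mask. This is only a sketch: the dictionary and function names are hypothetical, the grid is a toy example, and a real workflow would operate on raster data rather than nested lists.

```python
# Developed NLCD class codes per release, as reported in the text.
DEVELOPED_CLASSES = {
    1992: {21, 22, 23},
    2001: {21, 22, 23, 24},
    2011: {21, 22, 23, 24},
}

def developed_mask(grid, release):
    """Return a 0/1 grid marking cells whose class is treated as developed."""
    classes = DEVELOPED_CLASSES[release]
    return [[1 if cell in classes else 0 for cell in row] for row in grid]

# Hypothetical 2 x 3 land-cover grid (class codes chosen only for illustration).
grid = [[11, 21, 24], [41, 23, 82]]
print(developed_mask(grid, 1992))  # [[0, 1, 0], [0, 1, 0]]
print(developed_mask(grid, 2011))  # [[0, 1, 1], [0, 1, 0]]
```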

Although EM leverages other attributes in the form of the related ancillary variable such as land-use classes in addition to their geometric footprints (Schroeder and Van Riper 2013), the resulting accuracy level in this study is low compared to different refined TDW approaches (Tables 1 to 3). This finding may relate to the assumption that each control zone (e.g., land use class) is associated with only one population density weight regardless of the possible underlying variability. The extended refinement approach for EM, on the other hand, addresses the issue to some degree, particularly when using parcels as the ancillary variable, but this improvement is not substantial when population counts are small (Table 2). Note that EM cannot incorporate the binary GHSL; nor can its data-extended approach implement refinements based on the grid-based NLCD and ZTRAX® datasets. While each implementation of the extended refinement approach for EM is outperformed by its counterpart refined TDW version, there is a trend indicating that the parcel-based extended approach might perform better over longer time periods (Zoraghein and Leyk 2018a). This trend may become even more pronounced if the issue of spatial non-stationarity in the assigned population density weights can be addressed effectively.

Refined TDW using building footprints and the ZTRAX® data performs generally well across different demographic attributes and time periods (Tables 1 to 3). The performance of ZTRAX® is promising for future studies because the dataset is available (with limitations) and relatively consistent nationally and over long time periods. Accordingly, the first derived data products have recently been created (Leyk and Uhl 2018). However, the gains from using ZTRAX® rather than publicly available national (NLCD) and global (GHSL) data layers are less significant for small enumerated counts and over short time periods, suggesting higher benefits when the analysis covers longer time periods.

The mean percent errors in Tables 1 to 3 indicate that the areal interpolation methods tend to overestimate population at the target tract level for the longer time period. However, TDW and its refined variants are nearly unbiased over the shorter time period, indicating that the underlying assumptions are robust. According to Tables 1 and 3, the best-performing methods based on the absolute error measures also rank relatively high with respect to the mean percent error and mean absolute percent error measures, documenting the overall good performance of those methods for both heavily and sparsely populated tracts.

Absolute error maps of the selected demographic attributes (Figures 4 and 6) visualize the improvement effect of the most accurate refined methods over TDW. In addition, Figures 8 and 9 show the same patterns based on normalized absolute errors, with the advantage that error estimates can be compared over time and across regions.

Figures 5 and 7 provide a comparison between estimated demographic attributes using the best-performing methods at the target zone level in 1990 and 2000 to those from unrefined TDW and those from block aggregation (considered the evaluation baseline). These maps are unique fine-resolution depictions of changes in demographic attributes within consistent units over 10 and 20 years, respectively. This time-series of demographic evolution represents the major outcome of this multi-temporal estimation framework, improving the estimation accuracy of recent efforts (e.g. Schroeder 2007; Logan, Xu, and Stults 2014). The validation of modeling alternative demographic attributes provides insights into the potential of extending this framework to other attributes at the tract level that are not available at the block level and thus for the analysis of an extensive range of demographic sub-populations.

6.2. Multi-temporal Estimates of Urban Population

While the problem of modeling urban population is in itself extremely complex, the majority of the conclusions drawn above also apply here. However, additional findings specific to urban population estimation merit attention. They provide important insights into the potential for a more general conceptual framing of the term urban and the understanding of urban population and urban lands, pointing toward open research frontiers in urban geography and demography.

As indicated by Tables 4 and 5, refinement using census-defined urban areas in 1990, 2000 and 2010 does not generally reduce the selected error measures at the tract level in comparison to unrefined TDW. It does, however, substantially reduce mean percent error and mean absolute percent error for 1990–2010, which may indicate that it leads to less biased estimates and is more effective for sparsely populated target tracts and the longer time period. This suggests that the underlying statistical surface of urban population is not well represented by these areas, justifying the need for their further dasymetric refinement (Zoraghein and Leyk 2018b). This observation may be due to the dichotomy between the two concepts of urban population and urban land. The U.S. Census, especially in 2000 and 2010, employs land-use types that have urban characteristics, such as commercial and industrial districts that may have no residential population at all, for delineating urban lands, whereas urban population is a subset of total residential population. Thus, using census-defined urban areas as a spatial delimiter in areal interpolation overestimates the spatial distribution of urban population in some areas.

When census-defined urban areas are further refined by another ancillary variable (i.e., a composite dasymetric refinement), associated error measures are generally lower, indicating that further refinement results in a more representative spatial distribution of urban population as a subset of total residential population. Even if no official urban areas existed, the employment of individual ancillary variables would result in improved urban population estimates in comparison to the regular methods, illustrated by lower absolute error measures in the lower parts of Tables 4 and 5. However, the mean percent error and mean absolute percent error measures for the 1990–2010 time period in Table 4 are high when individual ancillary variables are used without a spatial delimiter. One possible explanation is that census-defined urban areas in 1990 are not as inclusive as those in 2000 and 2010. Not constraining individual ancillary variables to be within census-defined urban areas can cause greater deviances in urban population estimates resulting from the areal interpolation. This leads to a general over-estimation of urban population at the target tract level, particularly for sparsely-populated tracts.

According to Tables 4 and 5, absolute error measures for estimating urban population using TDW and EM refined by parcels are exceedingly high, probably because parcel units include additional land-use types, which in turn aggravates the overestimation issue. However, the extended refinement approach mitigates this issue for EM to some degree.

Figure 10 illustrates gradual absolute error reductions (as shown in Tables 4 and 5) within target zones in 1990 and 2000. Figure 11 illustrates how urban population estimates and block-aggregated urban population values for 1990 and 2000 compare to each other and provides a unique depiction of urbanization trends from 1990 to 2000 at the target zone level, which can be extended to 2010. Such trends and multi-temporal patterns will be of great use to better understand key demographic processes such as urban sprawl. However, a remaining limitation is that these estimates assume compatibility of underlying urban definitions used in different census years and thus may be biased.

Nonetheless, these interpretations and limitations represent a promising starting point for in-depth analysis of the spatial and contextual measures (connectivity, adjacency or distance to urban centers) of urban land and urban population optimized for each point in time based on the ancillary variables to further improve models of urban change. These insights will benefit future analytical efforts to better model, understand and forecast urbanization processes over long time periods that can be extended to data-poor regions of the world.

7. Conclusions

This study shows how dasymetric refinement can improve the accuracy of regular areal interpolation methods in constructing time-series of demographic estimates within consistent small area census units, a persistent problem in different disciplines dealing with enumerated data. The ability to model different demographic attributes with acceptable accuracy allows the analyst to generate micro-scale compositional patterns of demographic changes more broadly. The analyses described here can be replicated for applications involving longer time periods and data-poor regions, where the availability of fine-resolution census units such as blocks is limited, and provide expectations of the inherent estimation error where no data are available for validation.

This study provides a critical comparison between regular and refined interpolation methods and evaluates the results using different ancillary variables for dasymetric refinement over different time periods. If repeated over more heterogeneous demographic attributes and longer time periods, experimental rules may be established related to the necessity of dasymetric refinement and types of ancillary variables that have the greatest potential under certain conditions. Current results show that integrating dasymetric refinement into areal interpolation, given the data and analysis demands, has the greatest potential for applications involving demographic attributes with high enumerated counts and long time periods. Otherwise, the inclusion of dasymetric refinement or the type of the ancillary variable does not lead to noticeable accuracy gains.

Future research will expand the extent of the analyses both temporally and spatially to attain robust and conclusive insights into the applicability and performance of various methods. Moreover, dasymetric refinement approaches will be further improved by tailoring them to specific demographic attributes. Finally, the successful interpolation of urban population will be further extended to develop data-driven approaches to establish metrics and measures that can be used to build consistent definitions of the concepts of urban land and population through time. This will be an important milestone for critical study of urbanization dynamics in the U.S. and in data-poor regions of the world.

Acknowledgements

Research reported in this publication was, in part, supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Award Number P2CHD066613. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The work was also funded, in part, by the US National Science Foundation award #1416860 to the City University of New York, the Population Council, the National Center for Atmospheric Research and the University of Colorado at Boulder as well as through support by Earth Lab and the CU Boulder Grand Challenge. Finally, the authors were provided access to the Zillow Transaction and Assessment Dataset (ZTRAX) through a data use agreement between the University of Colorado Boulder and Zillow Inc. Support by Zillow Inc. is gratefully acknowledged.

Appendix

1. Methods

1.1. Areal Weighting (AW)

AW estimates population in the source year for target zones based on overlapping areas between source and target zones (i.e., intersections or “atoms”). An underlying assumption is that population is uniformly distributed within a source zone (Equation 1):

$$\mathrm{pop}_{st} = \left(\frac{\mathrm{Area}_{st}}{\mathrm{Area}_{s}}\right) \times \mathrm{pop}_{s} \qquad (1)$$

$\mathrm{Area}_{st}$ is the overlapping area between source zone $s$ and target zone $t$, $\mathrm{Area}_{s}$ is the source zone area, $\mathrm{pop}_{s}$ is the source zone population, and $\mathrm{pop}_{st}$ is the population assigned to atom $st$. The population of target zone $t$ is then simply calculated by summing population counts of all atoms within it.
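A minimal sketch of Equation 1 follows, assuming the atoms have already been derived by intersecting source and target zones. The function name, the tuple layout and the sample zones are hypothetical.

```python
from collections import defaultdict

def areal_weighting(atoms, source_area, source_pop):
    """Apply Equation 1 to each atom and sum the results per target zone.

    atoms: list of (source_id, target_id, overlap_area) tuples.
    """
    target_pop = defaultdict(float)
    for s, t, area_st in atoms:
        # pop_st = (Area_st / Area_s) * pop_s
        target_pop[t] += (area_st / source_area[s]) * source_pop[s]
    return dict(target_pop)

# Hypothetical zones: source A (area 8, 100 people), source B (area 16, 300 people).
atoms = [("A", "T1", 2.0), ("A", "T2", 6.0), ("B", "T2", 4.0), ("B", "T3", 12.0)]
result = areal_weighting(atoms, {"A": 8.0, "B": 16.0}, {"A": 100, "B": 300})
print(result)  # {'T1': 25.0, 'T2': 150.0, 'T3': 225.0}
```

Because the atom areas of each source zone sum to the source zone area, the total population is preserved (400 in this example).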

1.2. Refined AW

Dasymetrically refining source zones prior to areal interpolation requires that the assumption of AW be modified. Accordingly, population is assumed to be homogeneously distributed within the refined areas of a source zone, and no population is assigned elsewhere. This assumption is expected to be more realistic and allows a more precise reapportionment of population.

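$$\mathrm{pop}_{st} = \left(\frac{\mathrm{Ref\_Area}_{st}}{\mathrm{Ref\_Area}_{s}}\right) \times \mathrm{pop}_{s} \qquad (2)$$

$\mathrm{Ref\_Area}_{st}$ is the refined area of atom $st$, and $\mathrm{Ref\_Area}_{s}$ is the refined area of source zone $s$.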
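Equation 2 can be sketched in the same way. Two assumptions are made here that the text does not state: the refined area of a source zone is taken as the sum of the refined areas of its atoms (that is, the atoms tile the source zone), and a source zone with no refined area at all is skipped. All names and values are hypothetical.

```python
from collections import defaultdict

def refined_areal_weighting(atoms, source_pop):
    """Apply Equation 2 using refined (e.g., built-up) areas only.

    atoms: list of (source_id, target_id, refined_area_of_atom) tuples.
    """
    # Assumption: Ref_Area_s is the sum of refined atom areas within source s.
    refined_source_area = defaultdict(float)
    for s, _, ref_area in atoms:
        refined_source_area[s] += ref_area

    target_pop = defaultdict(float)
    for s, t, ref_area in atoms:
        if refined_source_area[s] > 0:  # assumption: skip sources with no refined area
            target_pop[t] += (ref_area / refined_source_area[s]) * source_pop[s]
    return dict(target_pop)

# Hypothetical atoms: the part of source B inside T2 contains no refined area.
atoms = [("A", "T1", 1.0), ("A", "T2", 3.0), ("B", "T2", 0.0), ("B", "T3", 5.0)]
result = refined_areal_weighting(atoms, {"A": 100, "B": 300})
print(result)  # {'T1': 25.0, 'T2': 75.0, 'T3': 300.0}
```

In this example, all of source B's population goes to T3 because its overlap with T2 contains no refined area.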

1.3. Target Density Weighting (TDW)

TDW makes two assumptions. First, within a source zone, the population distribution in the source year among atoms is assumed to be proportionally the same as its distribution in the target year. The second assumption states that the population density of any atom in the target year equals the density of the corresponding target zone:

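$$\frac{z_{st}}{\mathrm{Area}_{st}} = \frac{z_{t}}{\mathrm{Area}_{t}} \qquad (3)$$

where $z_{st}$ and $z_{t}$ indicate the population values of atom $st$ and target zone $t$ in the target year, respectively. Then, the population of target zone $t$ in the source year is estimated as follows:

$$y_{t} = \sum_{s} y_{st} = \sum_{s} \frac{(\mathrm{Area}_{st}/\mathrm{Area}_{t}) \times z_{t}}{\sum_{\tau} (\mathrm{Area}_{s\tau}/\mathrm{Area}_{\tau}) \times z_{\tau}} \times y_{s} \qquad (4)$$

where $y_{st}$, $y_{s}$ and $y_{t}$ are the population values of atom $st$, source zone $s$ and target zone $t$ in the source year. The term $\tau$ is a target zone index, independent of $t$, which is defined for each target zone intersecting source zone $s$.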
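Equations 3 and 4 can be sketched as follows. The function and variable names are hypothetical, the zones are toy values, and skipping a source zone whose atom weights sum to zero is an assumption, since the text does not address that case.

```python
from collections import defaultdict

def target_density_weighting(atoms, target_area, z_target, y_source):
    """Estimate source-year population of target zones with Equation 4.

    atoms: list of (source_id, target_id, overlap_area) tuples.
    z_target: target-year population of each target zone (z_t).
    y_source: source-year population of each source zone (y_s).
    """
    weights = {}
    weight_sum = defaultdict(float)
    for s, t, area_st in atoms:
        # Equation 3 implies z_st = (Area_st / Area_t) * z_t.
        w = (area_st / target_area[t]) * z_target[t]
        weights[(s, t)] = w
        weight_sum[s] += w

    y_target = defaultdict(float)
    for (s, t), w in weights.items():
        if weight_sum[s] > 0:  # assumption: skip sources with zero total weight
            y_target[t] += (w / weight_sum[s]) * y_source[s]
    return dict(y_target)

# Hypothetical zones: T1 is six times as dense as T2 in the target year.
atoms = [("A", "T1", 2.0), ("A", "T2", 4.0), ("B", "T2", 4.0)]
result = target_density_weighting(
    atoms, {"T1": 4.0, "T2": 8.0}, {"T1": 600, "T2": 200}, {"A": 100, "B": 80}
)
print(result)  # {'T1': 75.0, 'T2': 105.0}
```

Although T1 covers only a third of source A's area, it receives three quarters of A's population because of its higher target-year density.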

1.4. Refined TDW

Refined TDW uses the same equations as TDW but applies them to refined areas of both source and target zones. This refinement necessitates that the underlying assumptions of unrefined TDW be modified. First, refined areas within atoms in both years, as well as those within source and target zones are derived. Then, it is assumed that ratios of refined population densities of atoms to refined population densities of source zones remain the same in both years, and the refined population density in the target year of any atom equals the refined population density of the corresponding target zone. Finally, population of target zones in the source year can be derived by aggregating population values of atoms within them.

1.5. Expectation Maximization (EM)

EM provides a robust framework for model fitting and maximum likelihood estimation in settings of incomplete data. First, the expectation (E) step “completes” the data by computing the conditional expectation for missing data, given a set of observed data and estimated model parameters. The maximization (M) step then fits the model, estimating model parameters by maximum likelihood given the “complete” data from the E step. A feedback loop between E and M steps is established and repeated until convergence.

In the E step, the algorithm estimates $\hat{y}_{sc}$, i.e., the population count of the intersection area between source zone $s$ and control zone $c$:

$$\hat{y}_{sc} = y_{s} \left( \frac{\hat{\lambda}_{c} A_{sc}}{\sum_{k} \hat{\lambda}_{k} A_{sk}} \right) \qquad (5)$$

Control zones are defined by the related ancillary variable, and are the ones that share the same attribute. For example, if parcels with housing characteristics are used, each control zone represents all parcels that have the same housing characteristic such as single-family residential or apartment. In Equation 5, $y_{s}$ is the population count of source zone $s$, $\hat{\lambda}_{c}$ is the estimated density of control zone $c$, $A_{sc}$ is the intersection area between $s$ and $c$, and $k$ is a second control zone index, independent of $c$, to reflect all control zones intersecting $s$. The first E step is essentially similar to AW and assumes equal weights for all control zones. Then, the M step re-estimates all $\lambda_{c}$ values using the equation below:

$$\hat{\lambda}_{c} = \frac{\sum_{s} \hat{y}_{sc}}{A_{c}} \qquad (6)$$

Estimates of $\hat{\lambda}_{c}$ from the M step are used to re-estimate $\hat{y}_{sc}$ in the next E step, which is followed by another M step, and so on until the system converges. The algorithm stops when the maximum absolute difference between current population density weights of control zones and those calculated from the previous run is less than 0.001. Finally, $\hat{y}_{sc}$ values are used to calculate the population count of target zone $t$ ($\hat{y}_{t}$):

$$\hat{y}_{t} = \sum_{s} \sum_{c} \frac{A_{tsc}\, \hat{y}_{sc}}{A_{sc}} \qquad (7)$$

where $A_{tsc}$ is the intersection area between target zone $t$, source zone $s$, and control zone $c$.
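The E and M steps in Equations 5 to 7 can be sketched as a short loop. This is an illustration under stated assumptions rather than the authors' implementation: $A_{c}$ is taken as the total area of control zone $c$ over all source zones, the initial density weights are all set to 1, and a `max_iter` safeguard is added. The names and the toy zones are hypothetical.

```python
from collections import defaultdict

def em_interpolate(y_s, a_sc, a_tsc, tol=0.001, max_iter=10000):
    """Sketch of EM areal interpolation following Equations 5 to 7.

    y_s: population per source zone.
    a_sc: {(source, control): area}, positive areas only.
    a_tsc: {(target, source, control): area}.
    """
    # Assumption: A_c is the total area of control zone c over all source zones.
    control_area = defaultdict(float)
    for (s, c), area in a_sc.items():
        control_area[c] += area

    lam = {c: 1.0 for c in control_area}  # equal initial weights (AW-like first E step)
    y_sc = {}
    for _ in range(max_iter):
        # E step (Equation 5): split each source population among its control zones.
        denom = defaultdict(float)
        for (s, c), area in a_sc.items():
            denom[s] += lam[c] * area
        y_sc = {(s, c): y_s[s] * lam[c] * area / denom[s]
                for (s, c), area in a_sc.items()}

        # M step (Equation 6): re-estimate the density of each control zone.
        totals = defaultdict(float)
        for (s, c), value in y_sc.items():
            totals[c] += value
        new_lam = {c: totals[c] / control_area[c] for c in control_area}

        change = max(abs(new_lam[c] - lam[c]) for c in lam)
        lam = new_lam
        if change < tol:  # stopping rule described in the text
            break

    # Equation 7: reapportion y_sc to target zones in proportion to area.
    y_t = defaultdict(float)
    for (t, s, c), area in a_tsc.items():
        y_t[t] += area * y_sc[(s, c)] / a_sc[(s, c)]
    return dict(y_t), lam

# Hypothetical zones with two control zones: residential ("res") and other ("non").
y_s = {1: 100, 2: 300}
a_sc = {(1, "res"): 2.0, (1, "non"): 8.0, (2, "res"): 8.0, (2, "non"): 2.0}
a_tsc = {("T1", 1, "res"): 1.0, ("T1", 1, "non"): 8.0,
         ("T2", 1, "res"): 1.0, ("T2", 2, "res"): 8.0, ("T2", 2, "non"): 2.0}
estimates, densities = em_interpolate(y_s, a_sc, a_tsc)
print(estimates, densities)
```

In this toy case the estimated residential density ends up well above the non-residential one, and the target estimates still sum to the total source population.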

1.6. Extended Refinement Approach for EM

EM assumes that population density is constant within each control zone. However, this assumption can become problematic if underlying records of control zones such as individual parcels are very different in an attribute such as area (e.g., single-family residential) or unit density (e.g., condominiums). To tackle this shortcoming, the extended refinement approach for EM enhances the dasymetric refinement step in EM (Zoraghein and Leyk 2018a). It first selects the most frequent control zones whose records altogether constitute more than 90% of entire records and then categorizes them into more homogeneous sub-control zones based on similarities in a given attribute such as area or unit density (the number of categories is selected from 5 to 7). The algorithm does not alter the other less frequent control zones. Finally, it carries out EM on the new set of control zones formed by newly-defined more homogeneous sub-control zones and the ones that remained unchanged. The extended refinement approach for EM uses simulation to find the optimum number of selected control zones for categorization and the number of categories per control zone (Zoraghein and Leyk 2018a).
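The selection and categorization step described above can be sketched as follows. Several choices here are assumptions: the text only says that records are grouped by similarity in an attribute, so rank-based (quantile) bins are used for illustration, and the coverage threshold and number of categories are fixed instead of being tuned by simulation as in the extended approach. All names and sample records are hypothetical.

```python
from collections import Counter

def extend_control_zones(records, coverage=0.9, n_categories=5):
    """Split the most frequent control zones into more homogeneous sub-zones.

    records: list of (control_zone, attribute_value), e.g., parcel type and lot area.
    Returns one control zone label per record.
    """
    counts = Counter(zone for zone, _ in records)
    selected, covered = set(), 0
    for zone, n in counts.most_common():
        if covered > coverage * len(records):
            break  # the selected zones already hold more than 90% of records
        selected.add(zone)
        covered += n

    labels = [zone for zone, _ in records]  # less frequent zones stay unchanged
    for zone in selected:
        members = sorted(
            (i for i, (z, _) in enumerate(records) if z == zone),
            key=lambda i: records[i][1],
        )
        for rank, i in enumerate(members):
            # Assumption: equal-count bins on the attribute ranking.
            category = rank * n_categories // len(members)
            labels[i] = f"{zone}_{category}"
    return labels

# Hypothetical parcels: ten single-family lots of varying area and one condominium.
records = [("single_family", a) for a in (5, 9, 1, 7, 3, 10, 2, 8, 4, 6)]
records.append(("condo", 50))
labels = extend_control_zones(records)
print(labels)
```

The resulting labels could then serve as the control zones in a standard EM run, with each sub-zone receiving its own density weight.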

2. Estimation Results for the Temporal Interpolation of Total Population

Figure 1.


Absolute error maps of total population estimates at the target zone level: (a) in 1990 based on TDW, (b) in 1990 based on TDW refined by buildings, (c) in 2000 based on TDW and (d) in 2000 based on TDW refined by ZTRAX® (footnote 4).

Figure 2.


Normalized absolute error maps of total population estimates at the target zone level: (a) in 1990 based on TDW, (b) in 1990 based on TDW refined by buildings, (c) in 2000 based on TDW and (d) in 2000 based on TDW refined by ZTRAX® (footnote 5).

Figure 3.


Total population maps at the target zone level: (a) in 1990 based on TDW, (b) in 1990 based on TDW refined by buildings, (c) in 1990 based on block aggregation, (d) in 2000 based on TDW, (e) in 2000 based on TDW refined by ZTRAX® (footnote 6), and (f) in 2000 based on block aggregation.

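Table A1.

Absolute error measures pertaining to total population estimates.

Time Period: 1990–2010

| Method | MAE | Median Abs Error | RMSE | 90th Percentile Abs Error | Ancillary Variable |
|---|---|---|---|---|---|
| AW | 376 | 66 | 842 | 1252 | None |
| TDW | 162 | 75 | 312 | 388 | None |
| RefTDW | 168 | 64 | 327 | 454 | NLCD |
| RefTDW | 136 | 67 | 239 | 336 | GHSL |
| RefTDW | 143 | 75 | 247 | 359 | NLCD-GHSL |
| RefTDW | 131 | 57 | 260 | 313 | Parcels |
| RefTDW | 101 | 49 | 207 | 232 | Buildings |
| RefTDW | 104 | 44 | 246 | 240 | ZTRAX |
| EM | 174 | 45 | 377 | 531 | NLCD |
| EM | 353 | 77 | 773 | 1141 | Parcels |
| EM | 154 | 50 | 346 | 409 | Buildings |
| EM | 162 | 49 | 391 | 438 | ZTRAX |
| ExtendedEM | 158 | 51 | 386 | 387 | Parcels |
| ExtendedEM | 147 | 48 | 334 | 368 | Buildings |

Time Period: 2000–2010

| Method | MAE | Median Abs Error | RMSE | 90th Percentile Abs Error | Ancillary Variable |
|---|---|---|---|---|---|
| AW | 340 | 11 | 1449 | 1084 | None |
| TDW | 55 | 12 | 135 | 150 | None |
| RefTDW | 47 | 7 | 129 | 127 | NLCD |
| RefTDW | 55 | 8 | 151 | 168 | GHSL |
| RefTDW | 51 | 7 | 143 | 154 | NLCD-GHSL |
| RefTDW | 51 | 6 | 171 | 129 | Parcels |
| RefTDW | 43 | 5 | 150 | 97 | Buildings |
| RefTDW | 41 | 5 | 144 | 89 | ZTRAX |
| EM | 142 | 7 | 447 | 366 | NLCD |
| EM | 191 | 6 | 621 | 569 | Parcels |
| EM | 87 | 5 | 300 | 229 | Buildings |
| EM | 92 | 4 | 320 | 238 | ZTRAX |
| ExtendedEM | 96 | 5 | 355 | 247 | Parcels |
| ExtendedEM | 82 | 5 | 281 | 218 | Buildings |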
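
The four error measures reported in Table A1 can be computed from paired estimated and reference counts per target zone. The following Python sketch shows one way to do so. The function name and the input values are hypothetical, and the nearest-rank rule for the 90th percentile is an assumption, since the percentile convention used for the table is not stated here.

```python
import math
import statistics


def error_summary(estimates, references):
    """Summarize absolute errors between estimated and reference
    population counts, one pair per target zone."""
    abs_err = sorted(abs(e - r) for e, r in zip(estimates, references))
    n = len(abs_err)
    mae = sum(abs_err) / n
    median_ae = statistics.median(abs_err)
    rmse = math.sqrt(sum(x * x for x in abs_err) / n)
    # Nearest-rank 90th percentile (assumed convention).
    p90 = abs_err[max(0, math.ceil(0.9 * n) - 1)]
    return {"MAE": mae, "MedianAE": median_ae, "RMSE": rmse, "P90AE": p90}


# Hypothetical counts for five target zones (not taken from the study).
estimates = [100, 210, 290, 405, 500]
references = [110, 200, 300, 400, 520]
summary = error_summary(estimates, references)
```

Because RMSE squares each error before averaging, a few zones with very large errors raise it far more than they raise MAE or the median, which is one reason the table reports several measures side by side.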

Footnotes

1

The dataset was provided by Zillow Inc. (https://www.zillow.com/research/data/), and the source code for areal interpolation using it can be accessed online at Zoraghein (2018) or upon request from the corresponding author.

References

  1. Bakillah Mohamed, Liang Steve, Mobasheri Amin, Jamal Jokar Arsanjani, and Alexander Zipf. 2014. “Fine-Resolution Population Mapping Using OpenStreetMap Points-of-Interest.” International Journal of Geographical Information Science 28 (9): 1940–63. doi: 10.1080/13658816.2014.909045.
  2. BostonGIS. 2017. “PARCELS 2016 DATA FULL.” Accessed February 10 http://bostonopendata-boston.opendata.arcgis.com/datasets/f3d274161b4a47aa9acf48d0d04cd5d4_0.
  3. Buttenfield BP, Ruther M, and Leyk S. 2015. “Exploring the Impact of Dasymetric Refinement on Spatiotemporal Small Area Estimates.” Cartography and Geographic Information Science 42 (5): 449–59. doi: 10.1080/15230406.2015.1065206.
  4. Calka Beata, Bielecka Elzbieta, and Zdunkiewicz Katarzyna. 2016. “Redistribution Population Data across a Regular Spatial Grid according to Buildings Characteristics.” Geodesy and Cartography 65 (2): 149–62.
  5. Dempster AP, Laird NM, and Rubin DB. 1977. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B (Methodological) 39 (1): 1–38. http://www.jstor.org/stable/2984875.
  6. Department of Commerce. 2011. “Federal Register.” Federal Register 76 (164): 53030–43.
  7. Dmowska Anna, and Stepinski Tomasz F. 2017. “A High Resolution Population Grid for the Conterminous United States: The 2010 Edition.” Computers, Environment and Urban Systems 61. Elsevier: 13–23.
  8. Dong Pinliang, Ramesh Sathya, and Nepali Anjeev. 2010. “Evaluation of Small-Area Population Estimation Using LiDAR, Landsat TM and Parcel Data.” International Journal of Remote Sensing 31 (21): 5571–86. doi: 10.1080/01431161.2010.496804.
  9. Eicher Cory L., and Brewer Cynthia A. 2001. “Dasymetric Mapping and Areal Interpolation: Implementation and Evaluation.” Cartography and Geographic Information Science 28 (2): 125–38. doi: 10.1559/152304001782173727.
  10. Flowerdew Robin, and Green Mick. 1994. “Areal Interpolation and Types of Data.” In Spatial Analysis and GIS, edited by Fotheringham Stewart and Rogerson Peter, 121–45. London: Taylor & Francis.
  11. Geiß Christian, Anne Schauß, Torsten Riedlinger, Dech Stefan, Zelaya Cecilia, Nicolás Guzmán, Hube Mathías A, Jamal Jokar Arsanjani, and Hannes Taubenböck. 2016. “Joint Use of Remote Sensing Data and Volunteered Geographic Information for Exposure Estimation: Evidence from Valparaíso, Chile.” Natural Hazards. Springer, 1–25.
  12. Goodchild MF, and Lam NSN. 1980. “Areal Interpolation: A Variant of the Traditional Spatial Problem.” Geo-Processing 1: 297–312.
  13. Homer C, Dewitz J, Fry J, Coan M, Hossain N, Larson C, Herold N, McKerrow A, VanDriel JN, and Wickham J. 2007. “Completion of the 2001 National Land Cover Database for the Conterminous United States.” Photogrammetric Engineering and Remote Sensing 73 (4): 337–41.
  14. Homer CG, Dewitz JA, Yang L, Jin S, Danielson P, Xian G, Coulston J, Herold ND, Wickham JD, and Megown K. 2015. “Completion of the 2011 National Land Cover Database for the Conterminous United States-Representing a Decade of Land Cover Change Information.” Photogrammetric Engineering and Remote Sensing 81 (5): 345–54. doi: 10.14358/PERS.81.5.345.
  15. Jia Peng, and Gaughan Andrea E. 2016. “Dasymetric Modeling: A Hybrid Approach Using Land Cover and Tax Parcel Data for Mapping Population in Alachua County, Florida.” Applied Geography 66 (January): 100–108. doi: 10.1016/j.apgeog.2015.11.006.
  16. Jia Peng, Qiu Youliang, and Gaughan Andrea E. 2014. “A Fine-Scale Spatial Population Distribution on the High-Resolution Gridded Population Surface and Application in Alachua County, Florida.” Applied Geography 50: 99–107. doi: 10.1016/j.apgeog.2014.02.009.
  17. Kar Bandana, and Hodgson Michael E. 2012. “A Process Oriented Areal Interpolation Technique: A Coastal County Example.” Cartography and Geographic Information Science 39 (1): 3–16. doi: 10.1559/152304063913.
  18. Langford Mitchel. 2013. “An Evaluation of Small Area Population Estimation Techniques Using Open Access Ancillary Data.” Geographical Analysis 45 (3): 324–44. doi: 10.1111/gean.12012.
  19. Leyk S, Buttenfield Barbara P., Nagle Nicholas N., and Stum Alexander K. 2013. “Establishing Relationships between Parcel Data and Land Cover for Demographic Small Area Estimation.” Cartography and Geographic Information Science 40 (4): 305–15. doi: 10.1080/15230406.2013.782682.
  20. Leyk S, Ruther M, Buttenfield BP, Nagle NN, and Stum AK. 2014. “Modeling Residential Developed Land in Rural Areas: A Size-Restricted Approach Using Parcel Data.” Applied Geography 47 (February): 33–45. doi: 10.1016/j.apgeog.2013.11.013.
  21. Leyk S, and Johannes H Uhl. 2018. “Historical Fine-Grained Settlement Layers for the Conterminous United States over 200 Years.” Under Review.
  22. Leyk S, Uhl Johannes H., Deborah Balk, and Bryan Jones. 2018. “Assessing the Accuracy of Multi-Temporal Built-up Land Layers across Rural-Urban Trajectories in the United States.” Remote Sensing of Environment 204 (January): 898–917. doi: 10.1016/j.rse.2017.08.035.
  23. Linard Catherine, Gilbert Marius, and Tatem Andrew J. 2011. “Assessing the Use of Global Land Cover Data for Guiding Large Area Population Distribution Modelling.” GeoJournal 76 (5): 525–38. doi: 10.1007/s10708-010-9364-8.
  24. Logan John R., Stults Brian J., and Xu Zengwang. 2016. “Validating Population Estimates for Harmonized Census Tract Data, 2000–2010.” Annals of the American Association of Geographers 106 (5). Taylor & Francis: 1013–29.
  25. Logan John R., Xu Zengwang, and Stults Brian J. 2014. “Interpolating U.S. Decennial Census Tract Data from as Early as 1970 to 2010: A Longitudinal Tract Database.” The Professional Geographer 66 (3): 412–20. doi: 10.1080/00330124.2014.905156.
  26. Lu Zhenyu, Im Jungho, Quackenbush Lindi, and Halligan Kerry. 2010. “Population Estimation Based on Multi-Sensor Data Fusion.” International Journal of Remote Sensing 31 (21): 5587–5604. doi: 10.1080/01431161.2010.496801.
  27. Lung Tobias, Tillmann Lübker, James K. Ngochoch, and Schaab Gertrud. 2013. “Human Population Distribution Modelling at Regional Level Using Very High Resolution Satellite Imagery.” Applied Geography 41: 36–45. doi: 10.1016/j.apgeog.2013.03.002.
  28. Maantay JA, Maroko Andrew R., and Christopher Herrmann. 2007. “Mapping Population Distribution in the Urban Environment: The Cadastral-Based Expert Dasymetric System (CEDS).” Cartography and Geographic Information Science 34 (2): 77–102.
  29. Manson Steven, Schroeder Jonathan P., Van Riper David C., and Steven Ruggles. 2017. “IPUMS National Historical Geographic Information System: Version 12.0 [Database].” Minneapolis: University of Minnesota. doi: 10.18128/D050.V12.0.
  30. MassGIS. 2017a. “MassGIS Data - Building Structures.” Office of Geographic Information (MassGIS), Commonwealth of Massachusetts, MassIT. Accessed February 10 http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/structures.html.
  31. MassGIS. 2017b. “MassGIS Data Download - Level 3 Assessor’s Parcels.” Office of Geographic Information (MassGIS), Commonwealth of Massachusetts, MassIT. Accessed February 10 http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/download-level3-parcels.html.
  32. Mennis J. 2003. “Generating Surface Models of Population Using Dasymetric Mapping.” The Professional Geographer 55 (1): 31–42.
  33. Mennis J. 2009. “Dasymetric Mapping for Estimating Population in Small Areas.” Geography Compass 3 (2): 727–45. doi: 10.1111/j.1749-8198.2009.00220.x.
  34. Mitsova Diana, Esnard Ann-Margaret, and Li Yanmei. 2012. “Using Enhanced Dasymetric Mapping Techniques to Improve the Spatial Accuracy of Sea Level Rise Vulnerability Assessments.” Journal of Coastal Conservation 16 (3). Springer: 355–72.
  35. Pesaresi Martino, Ehrlich Daniele, Ferri Stefano, Florczyk Aneta, Freire Sergio, Halkia Matina, Julea Andreea, Kemper Thomas, Soille Pierre, and Syrris Vasileios. 2016. “Operating Procedure for the Production of the Global Human Settlement Layer from Landsat Data of the Epochs 1975, 1990, 2000, and 2014.” JRC Technical Report; European Commission, Joint Research Centre, Institute for the Protection and Security of the Citizen: Ispra, Italy.
  36. Qiu Fang, Sridharan Harini, and Chun Yongwan. 2010. “Spatial Autoregressive Model for Population Estimation at the Census Block Level Using LIDAR-Derived Building Volume Information.” Cartography and Geographic Information Science 37 (3). Taylor & Francis: 239–57.
  37. Reibel Michael, and Agrawal Aditya. 2007. “Areal Interpolation of Population Counts Using Pre-Classified Land Cover Data.” Population Research and Policy Review 26 (5–6): 619–33. doi: 10.1007/s11113-007-9050-9.
  38. Reibel Michael, and Bufalino Michael E. 2005. “Street-Weighted Interpolation Techniques for Demographic Count Estimation in Incompatible Zone Systems.” Environment and Planning A 37 (1): 127–39. doi: 10.1068/a36202.
  39. Ruther M, Leyk S, and Buttenfield BP. 2015. “Comparing the Effects of an NLCD-Derived Dasymetric Refinement on Estimation Accuracies for Multiple Areal Interpolation Methods.” GIScience & Remote Sensing 52 (2): 158–78.
  40. Sahar Liora, Muthukumar Subrahmanyam, and French Steven P. 2010. “Using Aerial Imagery and Gis in Automated Building Footprint Extraction and Shape Recognition for Earthquake Risk Assessment of Urban Inventories.” IEEE Transactions on Geoscience and Remote Sensing 48 (9): 3511–20. doi: 10.1109/TGRS.2010.2047260.
  41. Schroeder Jonathan P. 2017. “Hybrid Areal Interpolation of Census Counts from 2000 Blocks to 2010 Geographies.” Computers, Environment and Urban Systems 62. Elsevier: 53–63.
  42. Schroeder Jonathan P. 2007. “Target Density Weighting Interpolation and Uncertainty Evaluation for Temporal Analysis of Census Data.” Geographical Analysis 39 (3): 311–35. doi: 10.1111/j.1538-4632.2007.00706.x.
  43. Schroeder Jonathan P., and Van Riper David C. 2013. “Because Muncie’s Densities Are Not Manhattan’s: Using Geographical Weighting in the EM Algorithm for Areal Interpolation.” Geographical Analysis 45 (3): 216–37. doi: 10.1111/gean.12014.
  44. Sridharan Harini, and Qiu Fang. 2013. “A Spatially Disaggregated Areal Interpolation Model Using Light Detection and Ranging-Derived Building Volumes.” Geographical Analysis 45 (3): 238–58. doi: 10.1111/gean.12010.
  45. Su Ming Dawa, Mei Chun Lin, Hsieh Hsin I., Bor Wen Tsai, and Chun Hung Lin. 2010. “Multi-Layer Multi-Class Dasymetric Mapping to Estimate Population Distribution.” Science of the Total Environment 408 (20): 4807–16. doi: 10.1016/j.scitotenv.2010.06.032.
  46. Tapp Anna F. 2010. “Areal Interpolation and Dasymetric Mapping Methods Using Local Ancillary Data Sources.” Cartography and Geographic Information Science 37 (3): 215–28. doi: 10.1559/152304010792194976.
  47. U.S. Census Bureau. 2017a. “American FactFinder - Download Center.” Accessed February 10 https://factfinder.census.gov.
  48. U.S. Census Bureau. 2017b. “TIGER/Line Shapefiles and TIGER/Line Files.” Census.gov, Maps and Data, TIGER Products. Accessed February 10 https://www.census.gov/geo/maps-data/data/tiger-line.html.
  49. U.S. Census Bureau. 2011. “Differences Between the Census 2000 and 2010 Census Urban Area Criteria.” http://www2.census.gov/geo/pdfs/reference/ua/2000_2010uadif.pdf.
  50. Ural Serkan, Hussain Ejaz, and Shan Jie. 2011. “Building Population Mapping with Aerial Imagery and GIS Data.” International Journal of Applied Earth Observation and Geoinformation 13 (6): 841–52. doi: 10.1016/j.jag.2011.06.004.
  51. Vogelmann James E, Howard Stephen M, Limin Yang, Larson Charles R, Wylie Bruce K, and Van Driel Nick. 2001. “Completion of the 1990s National Land Cover Data Set for the Conterminous United States from Landsat Thematic Mapper Data and Ancillary Data Sources.” Photogrammetric Engineering and Remote Sensing 67 (6): 650–62.
  52. Wright JK. 1936. “A Method of Mapping Densities of Population: With Cape Cod as an Example.” Geographical Review 26 (1): 103–10.
  53. Wu Shuo-sheng, Wang Le, and Qiu Xiaomin. 2008. “Incorporating GIS Building Data and Census Housing Statistics for Sub-Block-Level Population Estimation.” The Professional Geographer 60 (1). Taylor & Francis: 121–35.
  54. Xie Yanhua, Weng Anthea, and Weng Qihao. 2015. “Population Estimation of Urban Residential Communities Using Remotely Sensed Morphologic Data.” IEEE Geoscience and Remote Sensing Letters 12 (5). IEEE: 1111–15.
  55. Zandbergen Paul A. 2011. “Dasymetric Mapping Using High Resolution Address Point Datasets.” Transactions in GIS 15 (SUPPL. 1): 5–27. doi: 10.1111/j.1467-9671.2011.01270.x.
  56. Zandbergen Paul A, and Ignizio Drew A. 2010. “Comparison of Dasymetric Mapping Techniques for Small Area Population Estimates.” Cartography and Geographic Information Science 37 (3): 199–214.
  57. Zillow. 2017. “Zillow Data.” https://www.zillow.com/research/data/.
  58. Zoraghein H. 2018. “Areal Interpolation Using Zillow Code.” Harvard Dataverse. doi: 10.7910/DVN/KEQQP1.
  59. Zoraghein H, and Leyk S. 2018a. “Enhancing Areal Interpolation Frameworks through Dasymetric Refinement to Create Consistent Population Estimates across Censuses.” International Journal of Geographical Information Science, 1–29. doi: 10.1080/13658816.2018.1472267.
  60. Zoraghein H, and Leyk S. 2018b. “Estimating Changes in Urban Land and Urban Population Using Refined Areal Interpolation Techniques.” In Proceedings of the ICA, 1:130. doi: 10.5194/ica-proc-1-130-2018.
  61. Zoraghein H, Leyk S, Ruther M, and Buttenfield BP. 2016. “Exploiting Temporal Information in Parcel Data to Refine Small Area Population Estimates.” Computers, Environment and Urban Systems 58: 19–28. doi: 10.1016/j.compenvurbsys.2016.03.004.
