The North Atlantic Population Project: Progress and Prospects

STEVEN RUGGLES; EVAN ROBERTS; SULA SARKAR; MATTHEW SOBEK

doi:10.1080/01615440.2010.515377

Author manuscript; available in PMC: 2011 Dec 22.

Published in final edited form as: Hist Methods. 2011 Jan 1;44(1):1–6. doi: 10.1080/01615440.2010.515377

PMCID: PMC3244724 NIHMSID: NIHMS334007 PMID: 22199411

Abstract

The North Atlantic Population Project (NAPP) is a massive database of historical census microdata from European and North American countries. The backbone of the project is the unique collection of completely digitized censuses providing information on the entire enumerated populations of each country. In addition, for some countries, the NAPP includes sample data from surrounding census years. In this article, the authors provide a brief history of the project, describe their progress to date and plans for the future, and discuss some potential implications of this unique data resource for social and economic research.

Keywords: census, database, demography, international, microdata

The North Atlantic Population Project (NAPP) emerged from a meeting of the International Microdata Access group hosted by Lisa Dillon at the University of Ottawa in April 1999. Steven Ruggles was excited to tell the participants about an agreement he had just reached with the Church of Jesus Christ of Latter-Day Saints (LDS) to gain access to their genealogical database of the entire census of the United States in 1880. Over 1,000 Mormon volunteers had devoted some 2,000,000 hours over 18 years to digitally transcribe the entire U.S. census, comprising more than 50 million records. The LDS was interested in creating a genealogical product on CD-ROM and sought academic collaborators to improve the data and speed up their release schedule. When Ruggles offered to assist the LDS in cleaning the data in exchange for permission to distribute the database freely for academic use, the LDS readily agreed.

As soon as Ruggles related this triumphant news, Dillon revealed that she, too, had been in touch with the LDS and that she anticipated an agreement that would give access to the 1881 census of Canada. Another participant, Matthew Woollard of the UK Data Archive, was already working on the LDS transcription of the 1881 census of Britain. Gunnar Thorvaldsen of the Norwegian Historical Data Centre at the University of Tromsø then announced that his digital transcription of the 1865 and 1900 censuses of Norway—a project he had pursued for many years—was nearly complete.

The group quickly realized that they had an extraordinary opportunity to combine these massive data sets and create a cross-national database for the late nineteenth century with almost 90 million observations. In June 2000, we met at the newly established Minnesota Population Center in Minneapolis to define the goals of the project and develop a plan of work. The participants agreed that we should not simply create compatible data sets, but rather should develop a single fully integrated database with common coding systems, constructed variables, documentation, and a single dissemination system.

The concept of NAPP is that a comparatively small investment in collaboration across countries and in shared processing and dissemination can leverage the resources in each country and ensure that the full potential of the data is realized. The project reduces barriers to comparative research by preserving data sets and making them freely available, converting them into a uniform format with standardized machine-actionable metadata, providing comprehensive documentation, and implementing powerful Web-based tools for dissemination and analysis. This is a rare success story of large-scale international collaboration. Producing a single coherent database with staff and funding scattered across a dozen institutions on two continents requires continuous intensive communication and negotiation.

Much of our initial work involved developing compatible coding systems, especially for occupations. We obtained a modest grant to fund our meetings, which we held in suitable North Atlantic sites in Iceland, Norway, England, and Canada. After considerable debate and some controversy, we settled on a modified version of the Historical International Standard Classification of Occupations (also known as HISCO), and developed procedures to maximize consistency across countries (Roberts et al. 2003). The merged NAPP coding dictionaries are of unprecedented scale, since they include all alphabetic strings from all participating countries. The occupation dictionary, for example, so far includes 2,605,301 unique strings.

We released early versions of the 1880 U.S. and 1881 Canadian data sets in 2003 and data for 1881 England and Wales and 1900 Norway the following year. In 2006, we added complete data for 1865 Norway, 1881 Scotland, and samples of the Canadian censuses of 1871 and 1901. Since then, we have released a complete census of Sweden and samples of Norway in 1875, Britain in 1851, Mecklenburg, Germany, in 1819, and several U.S. data sets from between 1850 and 1910. We also plan to release additional data sets from Iceland and Canada later this year.

Table 1 summarizes the NAPP data sets in the current project phase and our future plans. The complete enumerations are shown in bold; the other data sets are nationally representative samples. The lower panel of table 1 shows our ambitious planned expansion, which will dramatically increase both the chronological and geographic scopes of the project. Over the coming seven years, we propose to add more than 250 million new person records to the database, more than tripling its size. This massive expansion capitalizes on an explosion of new genealogical data collection projects. The complete-count data from Canada and the United States derive from new LDS projects; the data for England, Wales, and Scotland are being created by the commercial genealogical firm Findmypast.com. In Denmark, Ireland, and Sweden, the data derive from genealogical projects carried out by each country’s national archives.

TABLE 1.

Major Components of the North Atlantic Population Project

Year	Country	Cases (in thousands)
Current database
1852	Canada	170¹
1871	Canada	62²
1881	Canada	4,278
1891	Canada	350¹
1901	Canada	265¹
1851	England and Wales	376²
1881	England and Wales	26,125
1703	Iceland	50
1801	Iceland	47
1880	Iceland	72
1901	Iceland	78
1801	Norway	879
1865	Norway	1,702
1875	Norway	639¹
1900	Norway	2,294
1851	Scotland	22²
1881	Scotland	3,728
1900	Sweden	4,576
1850	United States	198²
1860	United States	354²
1870	United States	428²
1880	United States	50,486
1900	United States	5,220¹
1910	United States	1,271²
Total cases		103,670
Proposed additions
1918	Albania	140¹
1852	Canada	2,436
1861	Canada	3,230
1871	Canada	3,689
1891	Canada	4,833³
1901	Canada	5,371³
1911	Canada	372¹
1787	Denmark	842
1801	Denmark	929
1845	Denmark	1,357
1880	Denmark	1,969
1890	Denmark	2,172
1848	Egypt	463¹
1868	Egypt	618¹
1851	England and Wales	17,298
1861	England and Wales	20,066
1871	England and Wales	22,712
1891	England and Wales	29,003
1901	England and Wales	32,528
1911	England and Wales	36,070
1835	Iceland	56
1845	Iceland	57
1870	Iceland	60
1901	Ireland	4,459
1911	Ireland	4,390
1819	Germany	78⁴
1867	Germany	112⁴
1930	Mexico	1,655¹
1875	Norway	1,813
1910	Norway	2,294
1851	Scotland	2,889
1861	Scotland	3,062
1871	Scotland	3,360
1891	Scotland	4,026
1901	Scotland	4,472
1911	Scotland	4,761
1880	Sweden	4,566
1890	Sweden	4,785
1910	Sweden	5,522
1850	United States	23,192
Total cases		261,707

¹ Sample: 5% or greater.

² Sample: under 5%.

³ LDS approval pending.

⁴ Duchy of Mecklenburg-Schwerin only, 20% sample.

In the current phase of the project, we have multiple complete enumerations from just two countries, Iceland and Norway. When the NAPP expansion is complete, we will have multiple complete enumerations from nine countries: Canada, Denmark, England and Wales, Iceland, Ireland, Norway, Scotland, Sweden, and the United States. For seven of these countries, we will have four or more complete enumerations.

In addition to the genealogical projects, the expansion will incorporate sample data that are being digitized for purely scientific purposes in Albania, Egypt, Germany, Norway, and Mexico. The new partners located on the periphery of the North Atlantic region are especially exciting because they offer different perspectives on the past. The observations from Albania and Egypt represent the first representative national historical microdata from the Islamic world. The sample from Mexico will be the most recent in the NAPP collection; it was taken in 1930, preceding the beginning of mass immigration of Mexicans into the United States.

Linked Data Sets

As soon as the complete-count census data became available, Joseph P. Ferrie and Jason Long got the idea of linking existing census samples to the complete enumerations to study migration and economic mobility. Their strategy was simple: they searched the complete-count data sets of England and Wales in 1881 and the United States in 1880 for men recorded in nearby census samples, matching on the characteristics that would not be expected to change over time: name, birth year, birthplace, and (in the United States) race. When more than a single match for a sampled person was found in the complete enumeration for 1880 or 1881, that individual was dropped from the linked set.
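The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure Long and Ferrie actually ran: the field names (`first_name`, `last_name`, `birth_year`, `birthplace`) and the toy records are hypothetical, and the sketch uses exact agreement on every field after simple normalization.

```python
from collections import defaultdict

# Hypothetical field names; the text names the characteristics but not a file layout.
KEY_FIELDS = ("first_name", "last_name", "birth_year", "birthplace")


def link_key(record, fields=KEY_FIELDS):
    """Build a comparison key from characteristics assumed stable over time."""
    return tuple(str(record[f]).strip().upper() for f in fields)


def link_unique(sample, complete, fields=KEY_FIELDS):
    """Return (sample_record, complete_record) pairs that have exactly one match.

    Sampled persons with no candidate, or with more than one candidate,
    in the complete enumeration are left out of the linked set.
    """
    index = defaultdict(list)
    for rec in complete:
        index[link_key(rec, fields)].append(rec)

    links = []
    for rec in sample:
        candidates = index.get(link_key(rec, fields), [])
        if len(candidates) == 1:
            links.append((rec, candidates[0]))
    return links


# Toy records, invented for illustration only.
sample_records = [
    {"id": "s1", "first_name": "John", "last_name": "Olson", "birth_year": 1845, "birthplace": "Norway"},
    {"id": "s2", "first_name": "John", "last_name": "Smith", "birth_year": 1850, "birthplace": "Ohio"},
    {"id": "s3", "first_name": "Amos", "last_name": "Reed", "birth_year": 1838, "birthplace": "Maine"},
]
complete_records = [
    {"id": "c1", "first_name": "John", "last_name": "Olson", "birth_year": 1845, "birthplace": "Norway"},
    {"id": "c2", "first_name": "John", "last_name": "Smith", "birth_year": 1850, "birthplace": "Ohio"},
    {"id": "c3", "first_name": "John", "last_name": "Smith", "birth_year": 1850, "birthplace": "Ohio"},
]

linked = link_unique(sample_records, complete_records)
linked_ids = [(s["id"], c["id"]) for s, c in linked]
```

In the toy data, only the first sampled person is linked: the second has two indistinguishable candidates and the third has none. Dropping ambiguous cases keeps false links down, at the cost of losing people with common names.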

Analyses of these linked data sets have produced striking new findings that challenge established theories of social and economic change. Throughout the past century, theorists have argued that rising residential mobility has had a variety of adverse consequences, such as the loss of family cohesion, social dislocation, disrupted schooling, and health impairment (e.g., Parsons and Bales 1955; Litwak 1960; Astone and McLanahan 1994; Nicolopoulou-Stamati 2005). Increasing residential mobility is widely cited in policy debates, including debates on child immunization law (Centers for Disease Control and Prevention 2001), state inheritance taxes (Cooper 2006), child support enforcement (Department of Health and Human Services 2002), and grandparent visitation (Piekarski 2004). We now know that residential mobility in the United States has actually been declining for well over a century and was far higher in the nineteenth century than it is today (Ferrie 2005; Goeken and Hall forthcoming; Hall and Ruggles 2004; Long 2005).¹ Accordingly, literature that explains social change as a consequence of rising mobility is simply based on a false premise.

The findings on occupational mobility are even more compelling. Social scientists on both sides of the Atlantic have long debated trends and differences in opportunities for economic mobility. Nineteenth-century observers such as Alexis de Tocqueville (1862) frequently remarked on the high levels of economic opportunity in the United States compared with Europe. Social historians writing in the 1960s and 1970s disagreed, pointing to rigid class divisions and limited potential for upward mobility (Thernstrom 1964, 1973; Katz, Doucet, and Stern 1982). The linked data sets made it possible to assess long-run trends in intergenerational mobility. It turns out that in the nineteenth century, the United States was a far more fluid society than was Britain, and rapid upward mobility was common. Over the course of the past 150 years, however, occupational mobility in the United States has declined dramatically while it has increased in Britain; today, there is little difference in economic mobility between the two countries (Long and Ferrie 2007).

During the past five years, we have developed new record-linkage and data-mining technology to expand and improve on the simple linked samples created by Long and Ferrie (Ruggles 2006; Pamarthy 2007; Goeken, Huynh, Lynch, and Vick 2011). Our strategies build on new research in probabilistic record linkage and machine-learning technology. Like the Long and Ferrie samples, the NAPP linked samples rely exclusively on matching characteristics that should not change over time. The NAPP samples, however, do not require exact matches of characteristics; we link records on a probabilistic basis, allowing for imperfect correspondence of names and ages. Thus, for example, to compare names we use a string comparison algorithm that computes a similarity measure between 0.0 and 1.0 based on the number of common characters in two strings, the lengths of both strings, and the number of transpositions, accounting for the increased probability of typographical errors toward the end of words (Porter and Winkler 1997). We also use the New York State Identification and Intelligence System and Double-Metaphone phonetic name coding, which provide multiple encoded strings corresponding to variant pronunciations (Philips 2000; Lait and Randell 1993).
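The string comparator described above has the shape of the standard Jaro-Winkler similarity. The following Python sketch implements that textbook measure as an illustration; it is an assumption that this basic form is close to what NAPP uses, and the comparator in Porter and Winkler (1997) includes further adjustments that are not reproduced here. The function names and the default `prefix_scale` and `max_prefix` values are the conventional ones, not values taken from the article.

```python
def jaro(s1, s2):
    """Jaro similarity in [0.0, 1.0] from common characters, lengths, and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as "common" only if they agree within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: common characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Boost the Jaro score when the strings share a prefix.

    Agreement at the start of a name is rewarded because errors are
    assumed to be more likely toward the end of a word.
    """
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


# Example: two spellings of a surname that an exact comparison would reject.
surname_similarity = jaro_winkler("OLSEN", "OLSON")
```

A score like `surname_similarity` would be one input among several (age difference, phonetic code agreement, and so on) to the classifier, not a link decision on its own.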

We then implement a machine-learning tool known as a Support Vector Machine (SVM) to classify each possible link (Vapnik 1998; Cristianini and Shawe-Taylor 2000; Abe 2005). Based on the hand-linked training data, the SVM calculates a confidence score for every potential match; when one and no more than one potential match exceeds the threshold, we establish a link. Once we have established the full set of links, we weight the cases to represent the potentially linkable population with respect to age, sex, birthplace, whether related to head, occupational group, and size of place in the terminal year.
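The "one and no more than one" rule can be illustrated with a short Python sketch. It takes the confidence scores as given, so the SVM itself is not shown. The function name `select_links`, the data layout, the scores, and the threshold of 0.8 are all hypothetical.

```python
def select_links(candidate_scores, threshold):
    """Keep a link only when exactly one candidate scores above the threshold.

    candidate_scores maps each sample record id to a list of
    (candidate_id, confidence_score) pairs from the classifier.
    """
    links = {}
    for sample_id, candidates in candidate_scores.items():
        above = [cand_id for cand_id, score in candidates if score > threshold]
        if len(above) == 1:
            links[sample_id] = above[0]
    return links


# Invented scores standing in for classifier output.
scores = {
    "s1": [("c1", 0.92), ("c2", 0.31)],   # one clear match: linked
    "s2": [("c3", 0.88), ("c4", 0.85)],   # two plausible matches: not linked
    "s3": [("c5", 0.40)],                 # no match above threshold: not linked
}

links = select_links(scores, threshold=0.8)
```

Raising the threshold lowers the rate of false links but also shrinks the linked sample, which is one reason the linked cases are then weighted to represent the linkable population.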

The linked NAPP data sets have several advantages over the linked sets produced by Long and Ferrie. First, our technology has allowed us to substantially reduce the rate of false links, which has significant implications for the rate of geographic and social mobility (Goeken and Hall forthcoming). Second, the NAPP weighted samples are also significantly more representative of the linkable population than were the early linked samples. Finally, unlike Long and Ferrie (2007), we link women (as long as they do not marry and change their surname in the interval), and we provide users with full information on all members of the linked individuals’ households.

We have released 28 sets of linked individuals and households for the United States and Norway, and we expect to complete linked sets for Britain and Iceland soon. The linked sets we have constructed to date match a census sample to a complete census enumeration. As we obtain multiple complete digital censuses for each country in coming years, we plan to link individuals through multiple censuses, thereby constructing longitudinal panels with multiple observations of each individual and family. To accomplish this, we will link successive pairs of censuses using the same general strategies we developed to link census pairs.

Linking multiple complete census enumerations will require significant innovations to accommodate the massive increase in the scale of processing. The complete enumerations contain 50–100 times as many records as the samples we have been working with so far, and instead of just linking pairs of censuses, we will be linking up to seven censuses per country. This means the number of comparisons we must make could rise by a factor of approximately 500,000. Our current procedures already press at the limits of available computing capacity. We process the data on a high-performance computer at the Minnesota Supercomputing Institute for Advanced Computational Research (MSI). MSI has provided us with substantial amounts of supercomputing time at no cost to the project and will continue to do so in the next project phase. It would be impossible, however, to ramp up our supercomputer usage 500,000-fold. We will therefore pursue a cluster of strategies to improve the computational efficiency of our procedures.

Spatiotemporal Analysis

NAPP is making a strategic contribution to demographic infrastructure by providing a baseline for study of changes in the demography and health of European and North American populations. In each country, NAPP provides the earliest national census microdata available. The NAPP database should not be regarded as a description of the population and the economy before it was affected by modern transformations. It is impossible to pinpoint a static time that precedes the modern world. In the period in which these censuses were taken, Europe and North America were already undergoing accelerating change. Indeed, the development of sophisticated statistical data in the eighteenth and nineteenth centuries can be seen as a byproduct of changes that were already underway. Nonetheless, these materials provide the earliest detailed statistical picture of Western society ever available. In many cases, that picture was taken before most people’s lives were transformed by urbanization, industrialization, and the shift to wage-labor employment.

The landscape of scientific research on the human population is shifting. It is no longer sufficient just to study the relationships among variables at a particular moment in time. Researchers around the world now recognize that to understand the large-scale processes that are transforming society, we must investigate long-term change. The goal of this project is to provide the kind of data that make such analysis possible.

A hint of the possibilities for comparative spatiotemporal analysis is provided by Ruggles’s (2009, 2010) work on the Northwest European family system. For the past four decades, demographers have argued that historic Northwest Europe and North America had a unique weak family system characterized by neolocal marriage and nuclear family structure (e.g., Laslett 1972; Hajnal 1982; Hareven 1994). This interpretation has become a central component of theories of economic development and gender relations, and analysts argue that the distinctive northwest European family has important implications for present-day demographic behavior (Macfarlane 1978, 1986; Cain and McNicoll 1988; Reher 1998; Hartman 2004; Thornton 2005). NAPP data allow us for the first time to place historical Northwest Europe and North America in the context of comparative data from around the world. The results provide no support for the theory of a distinctive northwest-European preference for nuclear families: with respect to residence with kin and in intergenerational families, nineteenth-century Europe and North America closely resemble other places with similar demographic composition (Ruggles 2009). When it comes to joint families in which an elder resides with multiple married offspring, however, there are systematic regional and temporal patterns that cannot be so easily explained by compositional factors (Ruggles 2010).

The nation-level Ruggles studies only suggest the power of these data; they do not exploit their full potential. NAPP will allow a new class of studies of the spatial organization of human activity, new measures of residential segregation, and innovative analyses of the effects of local context at multiple scales on individual behavior. Most historical studies of the impact of local conditions on individual behavior have focused on a single community or on a small number of communities (e.g., Anderson 1972; Hareven 1978; Katz 1975; Foster 1974; Modell 1978; Ruggles 1987; Janssens 1993). Because they are limited to single communities, such studies cannot generalize about the effects of context on behavior. NAPP provides data on the entire populations of tens of thousands of communities. That contextual data may be combined with the linked longitudinal panels to assess the ways in which individual and family transitions were conditioned by local, regional, and national opportunities and constraints.

Complete-count data are vital for spatial and multilevel analysis. Economic development across Europe and North America in the long nineteenth century was highly uneven; in some places, daily life changed little from the preceding centuries, and in others, it was transformed beyond recognition. This great variation allows us to assess the impact of local economic and demographic characteristics on individual behavior, thereby offering the potential for understanding the consequences of early industrial and commercial development.

The existing NAPP data are invaluable, but because they usually provide just a single detailed snapshot, they lack the power to reveal processes of change. The future availability of multiple complete-count cross-sections for the population of the North Atlantic world in the nineteenth and early twentieth centuries will open up vast new terrain in the study of industrialization, urbanization, demographic transition, and mass international population movements. NAPP will be a vital resource for studying the transformation of nuptiality, female labor-force participation, life-cycle service, mass education, indigenous populations, oldest-old populations, international migrations, and life-course transitions to adulthood. For each of these topics, the collection will provide information on the changing interrelationships among variables that cannot be obtained from any other source. The new data will permit much more subtle and comprehensive analysis that will help us to understand the sources of past geographic and economic mobility and to explain why mobility declined in the twentieth century. The database will enable us to begin to disentangle period and cohort changes in life-course processes, and will open exciting new opportunities for multilevel multivariate analyses in a key period of social and economic transition.

NAPP Articles in this Issue

The two articles that immediately follow offer a detailed explication of the techniques used for record linkage in NAPP. Ron Goeken, Lap Huynh, T. A. Lynch, and Rebecca Vick describe the process of creating the linked representative samples for the United States in the late nineteenth and early twentieth centuries. Rebecca Vick and Lap Huynh describe protocols for name standardization in record linkage and quantify the effects of standardization on linkage rates in the United States and Norway.

NAPP microdata offer unprecedented opportunities for spatial analysis. Sula Sarkar and Patricia Kelly Hall demonstrate the use of Geographic Information Systems (GIS) to explore the internal and international migration of adult male workers in nineteenth-century Europe and North America, offering guidance to researchers who would like to incorporate NAPP maps into their analysis.

The final two articles discuss major new initiatives that are building on NAPP to make additional resources available to researchers. Gunnar Thorvaldsen describes an exciting new initiative, the Longitudinal Population Register for Norway. The Norwegian NAPP data will serve as the base for building a longitudinal population register that can be linked with the existing population register for the period from 1964 to the present. This ambitious project will eventually link individuals across all available church and vital statistics records, providing the longest-running longitudinal demographic database ever constructed, and opening extraordinary opportunities for studying the demographic transformation of Norway over the past two centuries. John Logan, Jason Jindrich, Hyoungjin Shin, and Weiwei Zhang describe the Urban Transition Historical GIS Project, a path-breaking initiative to map the enumeration districts of 39 major cities in 1880. Used in conjunction with the NAPP 1880 census microdata, the project will allow unprecedented comparative study of the structure of nineteenth-century cities, and will permit the development of new measures of residential and socioeconomic segregation.

As the articles in this issue suggest, old census data are not of purely historical interest; they are essential tools for basic social research and policy analysis. Models and descriptions based on historical experience underlie both theories of past change and projections into the future. The NAPP data provide a unique laboratory for the study of economic and demographic processes. This kind of empirical foundation is essential for testing social and economic theory. The massive structural shifts of the long nineteenth century still resonate today. Revealing the causal mechanisms underlying these shifts is crucial for understanding the forces that are now shaping twenty-first century society.

Footnotes

¹ We can see the recent decline using survey data (e.g., Wolf and Longino 2005), but surveys cannot reveal the long-run trends.

References

Abe S. Support vector machines for pattern classification. London: Springer-Verlag; 2005.
Anderson M. Family structure in nineteenth century Lancashire. Cambridge: Cambridge University Press; 1972.
Astone NM, McLanahan SS. Family structure, residential mobility, and school dropout: A research note. Demography. 1994;31:575–584.
Cain M, McNicoll G. Population growth and agrarian outcomes. In: Lee RD, Arthur WB, Kelley AC, Rodgers G, Srinivasan TN, editors. Population, food and rural development. Oxford, England: Clarendon; 1988. pp. 101–117.
Centers for Disease Control and Prevention. Morbidity and Mortality Weekly Report. 2001;50(RR17):1–17.
Cristianini N, Shawe-Taylor J. An introduction to support vector machines. Cambridge: Cambridge University Press; 2000.
Cooper JA. Interstate competition and state death taxes: A modern crisis in historical perspective. Pepperdine Law Review. 2006;3:835–81.
de Tocqueville A. Democracy in America. Reeve H, translator. Cambridge, MA: Sever and Francis; 1862.
Department of Health and Human Services. Essentials for attorneys in child support enforcement. Washington, DC: Administration for Children and Families, Office of Child Support Enforcement; 2002.
Ferrie J. The end of American exceptionalism? Mobility in the U.S. since 1850. Journal of Economic Perspectives. 2005;19:199–215.
Foster JO. Class struggle and the industrial revolution: Early industrial capitalism in three English towns. London: Weidenfeld and Nicolson; 1974.
Goeken R, Hall PK. New findings on internal migration using linked records. Historical Methods. Forthcoming.
Goeken R, Huynh L, Lynch TA, Vick R. New methods of census linking. Historical Methods. 2011;44:7–14. doi: 10.1080/01615440.2010.517152.
Hajnal J. Two kinds of preindustrial household formation system. Population and Development Review. 1982;8:449–94.
Hall PK, Ruggles S. “Restless in the midst of their prosperity”: New evidence of the internal migration patterns of Americans, 1850–1990. Journal of American History. 2004;91:829–46.
Hareven T. The dynamics of kin in an industrial community. American Journal of Sociology. 1978;84:S151–S182.
Hartman MS. The household and the making of history: A subversive view of the Western past. Cambridge: Cambridge University Press; 2004.
Janssens A. Family and social change: The household as a process in an industrializing community. Cambridge: Cambridge University Press; 1993.
Katz MB. The people of Hamilton, Canada West: Family and class in a mid-nineteenth-century city. Cambridge, MA: Harvard University Press; 1975.
Katz MB, Doucet MJ, Stern MJ. The social organization of early industrial capitalism. Cambridge, MA: Harvard University Press; 1982.
Lait AJ, Randell B. An assessment of name matching algorithms. Department Technical Report Series No. 550. Department of Computing Science, University of Newcastle upon Tyne, England; 1993. http://homepages.cs.ncl.ac.uk/brian.randell/home.informal/Genealogy/NameMatching.pdf.
Laslett P. Introduction: The history of the family. In: Laslett P, Wall R, editors. Household and family in past time. Cambridge: Cambridge University Press; 1972. pp. 1–73.
Litwak E. Geographic mobility and extended family cohesion. American Sociological Review. 1960;25:385–94.
Long J. A tale of two labor markets: Intergenerational occupational mobility in Britain and the U.S. since 1850. Working Paper #W11253. National Bureau of Economic Research; 2005. http://www.nber.org/papers/w11253.
Long J, Ferrie JP. The path to convergence: Intergenerational occupational mobility in Britain and the U.S. in three eras. Economic Journal. 2007;117:C61–C71. http://www3.interscience.wiley.com/cgi-bin/fulltext/117984644/PDFSTART.
Macfarlane A. The origins of English individualism: The family, property and social transition. New York: Cambridge University Press; 1978.
Macfarlane A. Marriage and love in England: Modes of reproduction 1300–1800. Oxford: Blackwell; 1986.
Modell J. Patterns of consumption, acculturation, and family income strategies in late nineteenth-century America. In: Hareven TK, Vinovskis MA, editors. Family and Population in Nineteenth-Century America. Princeton, NJ: Princeton University Press; 1978. pp. 206–40.
Nicolopoulou-Stamati P. Effects of mobility on health. In: Nicolopoulou-Stamati P, Hens L, Howard CV, editors. Environmental health impacts of transport and mobility. Dordrecht, the Netherlands: Springer; 2005. pp. 1–7.
Pamarthy K. A machine learning framework for record linkage in census data. Master of Science Report. Minneapolis, MN: University of Minnesota; May 2007.
Parsons T, Bales RF. Family, socialization and interaction process. Glencoe, IL: Free Press; 1955.
Philips L. The double-metaphone search algorithm. C/C++ User’s Journal. 2000;18:38–43.
Piekarski C. The effect of an increasingly mobile society on Kentucky’s grandparent visitation statute: The ability of courts to enforce their orders. Brandeis Law Journal. 2004;42:693–710.
Porter EH, Winkler WE. Approximate string comparison and its effect on an advanced record linkage system. Census Bureau Research Report RR97/02. Washington, DC: U.S. Bureau of the Census; 1997. http://www.fcsm.gov/working-papers/porter-winkler.pdf.
Reher DS. Family ties in Western Europe: Persistent contrasts. Population and Development Review. 1998;24:203–34.
Roberts E, Ruggles S, Dillon L, Gardarsdottir O, Oldervoll J, Thorvaldsen G, et al. The North Atlantic Population Project: An overview. Historical Methods. 2003;36:80–88.
Ruggles S. Prolonged connections: The rise of the extended family in nineteenth-century England and America. Madison: University of Wisconsin Press; 1987.
Ruggles S. Linking historical censuses: A new approach. History and Computing. 2006;14:213–24. http://www.hist.umn.edu/~ruggles/Articles/linking.pdf.
Ruggles S. Reconsidering the northwest European family system: Living arrangements of the aged in comparative historical perspective. Population and Development Review. 2009;35:249–73. doi: 10.1111/j.1728-4457.2009.00275.x. http://www3rd.interscience.wiley.com/cgi-bin/fulltext/122456480/PDFSTART.
Ruggles S. Stem families and joint families in comparative historical perspective. Population and Development Review. 2010;36:563–77. doi: 10.1111/j.1728-4457.2010.00346.x.
Thernstrom S. Poverty and progress: Social mobility in a nineteenth century city. Cambridge, MA: Harvard University Press; 1964.
Thernstrom S. The other Bostonians: Poverty and progress in the American metropolis, 1880–1970. Cambridge, MA: Harvard University Press; 1973.
Thornton A. Reading history sideways: The fallacy and enduring impact of the developmental paradigm on family life. Chicago: University of Chicago Press; 2005.
Vapnik VN. Statistical learning theory. New York: Wiley-Interscience; 1998.
Wolf DA, Longino CF. Our “increasingly mobile society?” The curious persistence of a false belief. Gerontologist. 2005;45:5–11. doi: 10.1093/geront/45.1.5.
