Skip to main content
Nature Communications logoLink to Nature Communications
. 2021 Jan 12;12:323. doi: 10.1038/s41467-020-20544-y

Inferring high-resolution human mixing patterns for disease modeling

Dina Mistry 1,2, Maria Litvinova 2,3,4, Ana Pastore y Piontti 2, Matteo Chinazzi 2, Laura Fumanelli 5, Marcelo F C Gomes 6, Syed A Haque 2, Quan-Hui Liu 7, Kunpeng Mu 2, Xinyue Xiong 2, M Elizabeth Halloran 8,9, Ira M Longini Jr 10, Stefano Merler 5, Marco Ajelli 2,4,, Alessandro Vespignani 2,3,
PMCID: PMC7803761  PMID: 33436609

Abstract

Mathematical and computational modeling approaches are increasingly used as quantitative tools in the analysis and forecasting of infectious disease epidemics. The growing need for realism in addressing complex public health questions is, however, calling for accurate models of the human contact patterns that govern the disease transmission processes. Here we present a data-driven approach to generate effective population-level contact matrices by using highly detailed macro (census) and micro (survey) data on key socio-demographic features. We produce age-stratified contact matrices for 35 countries, including 277 sub-national administratvie regions of 8 of those countries, covering approximately 3.5 billion people and reflecting the high degree of cultural and societal diversity of the focus countries. We use the derived contact matrices to model the spread of airborne infectious diseases and show that sub-national heterogeneities in human mixing patterns have a marked impact on epidemic indicators such as the reproduction number and overall attack rate of epidemics of the same etiology. The contact patterns derived here are made publicly available as a modeling tool to study the impact of socio-economic differences and demographic heterogeneities across populations on the epidemiology of infectious diseases.

Subject terms: Computational models, Epidemiology


The growing need for realism in addressing complex public health questions calls for accurate models of the human contact patterns that govern disease transmission. Here, the authors generate effective population-level contact matrices by using highly detailed macro (census) and micro (survey) data on key socio-demographic features.

Introduction

Mathematical and computational models of infectious disease transmission are increasingly used to provide scenario analysis and forecasts during epidemic outbreaks and quantitative answers to complex public health questions such as devising the effectiveness of control strategies (vaccination, school closure, etc.) during health threat emergencies1. Modeling approaches have thus moved away from the classic homogeneous and stylized framework2,3, progressively incorporating heterogeneities that depend on between- and within-country population variability, disease timescale, transmission settings, as well as specific pathogen characteristics. For instance, geographically structured models allow evaluation of spatially heterogeneous interventions in both animal and human diseases4,5, while individual-based models lay down the possibility of simulating all micro-details of the transmission process and tracking in time and space each individual of the simulated population68. When data-driven, these approaches have highlighted the importance of the social, demographic, and economic characteristics of the population in determining the actual mesh of contacts underlying disease spreading among individuals. For this reason, a broad range of methodologies have been used to study human-mixing patterns, including surveys912, contact diaries1319, wearable sensors20,21, analysis of time-use data22, development of synthetic populations2325, and mixed approaches for instance integrating diary-based contact data with time-use data26,27 or combining contact data with modeling techniques26,28,29. However, each methodology has different limitations and assumptions because contact patterns among individuals vary according to the geographical scale (from census blocks to the national level), the disease under consideration, and the detailed socio-economic and demographic characteristics of the population.

Here, we present a data-driven approach to generate effective descriptions of complex contact patterns that can be used to inform infectious disease modeling approaches, including the widely adopted compartmental modeling framework. We make use of highly detailed macro (census) and micro (survey) data from publicly available sources on key socio-demographic features (e.g., age structure, household composition and members’ age gaps, employment rates, school structure) to construct synthetic populations of interacting agents, each one representing a hypothetical individual in the real population. The proposed method relies on both macro- and micro-level data for multiple socioeconomic characteristics and can be adapted to different geographical contexts and diseases; something that is not possible in a “one-model-fits-all” approach.

We provide synthetic contact matrices for nations around the world with substantially large and diverse populations. Specifically, we report contact patterns at the subnational level in the following countries: Australia, Canada, China, India, Israel, Japan, Russia, South Africa, and the United States of America. These populations account for 277 subnational administrative regions (such as states, provinces, prefectures, territories, etc. depending on the considered country), cover ~38% of the world’s surface area, and account for ~3.5 billion people of the world’s 7.6 billion population. The resulting synthetic populations are used to generate age-stratified contact matrices for the most common social settings, in which individuals spend their time interacting with each other (i.e., households, schools, workplaces, and the general community). The resulting contact matrices capture differences at the subnational level that reflect the high degree of cultural and societal diversity of the focus countries. This approach allows us to provide a mesoscopic description of the human contact patterns that can be used in the mathematical and computational analysis of infectious disease spread (see Fig. 1).

Fig. 1. Modeling framework.

Fig. 1

Schematic representation of the workflow for modeling human-mixing patterns and infection transmission dynamics.

To illustrate the importance of considering national and subnational heterogeneities in the analysis of infectious disease epidemiology, we construct the contact matrix relevant for airborne infectious diseases by calibrating the combination of setting-specific (household, workplace, school) contact matrices using as ground truth seven diary-based contact matrices (six European countries14 and Russia18). The resulting matrices are validated against out-of-sample contact data collected in France30, Japan31, and China32. These contact matrices are then used in the modeling of influenza transmission patterns at the national and subnational levels. The influenza modeling simulations, although considering identical disease etiology, highlights considerable heterogeneities in reproduction number and attack rates across regions of the world included in this study, reflecting differences in key demographic properties such as average age and student population.

As a service to the community, a database containing the inferred setting-specific matrices as well as the overall contact matrices for all locations (and countries) is available on the dedicated online repository: https://github.com/mobs-lab/mixing-patterns. Python codes to work with the contact matrices and examples of how to use them in age-structured compartmental models are available on the same website as well. This presented work can be easily generalized to other countries and settings, and arm the community with a general framework that can be used to make inference on important epidemiological parameters in the modeling of infectious diseases.

Results

We use a data-driven computational approach to infer the contact networks in the social settings where people interact and spend most of their time. In particular, we focus on four social settings (household, school, workplace, and the general community), which are particularly relevant for influenza transmission7,33. To reconstruct the synthetic population in each context we use a wide variety of national and subnational micro-level, census, and demographic data that provide the separate characteristics of the population, and the association of multiple characteristics. Micro-level data drawn from socio-demographic surveys are especially useful as no assumptions on the rules of disaggregation are required (the data are already on the required level of disaggregation).

Contacts between individuals in the real-world populations are inferred by analysis of the generated data-driven synthetic networks by measuring the frequency of links between individuals (living, going to school, or working together) in the synthetic contact networks of the different social settings. Then we compare summary statistics derived from the generated synthetic population for each geographical area to those reported in official (macro) statistics (e.g., census data). Examples of the summary statistics used in the approach are the age structure of the population, distributions of household size, type, number of children by household size, and so on, depending on the summary statistics available from official sources. The generated data is compared to the distributions of summary statistics by using the goodness-of-fit tests at the desired level of significance (generally 5%). We use a non-parametric bootstrap procedure to test the uncertainty level of our sampling. This procedure is iterated until a satisfactory fit is reached. In the case of inadequate microdata (e.g., sub-optimal sample size), we use the microdata to extrapolate rules on the age gaps between household members conditioned on the age of the household head, household size, and the relation between the members (e.g., age gap between spouses, age gap between siblings). Note that the same arguments are extended to other settings (e.g., schools, workplaces, hospitals) and can be extended to further stratifications relevant for other diseases (e.g., easy access to health care facilities). An illustration of the matrices construction workflow is reported in Fig. 1, while the full technical description is reported in “Methods” and Supplementary Information.

Setting-specific contact matrices

We report here the results for populations of 277 subnational administrative regions of Australia, Canada, China, India, Israel, Japan, Russia, South Africa, and the United States of America, characterizing contact patterns for about 3.5 billion individuals. We also include data at the national level for 26 European countries23. The inferred age-specific contact matrices reveal strong patterns, of which many are common to the diverse locations under study. Figure 2 shows the age-mixing patterns Fijk defined as the per capita frequency of contact of an individual of age i with an individual of age j in setting k.

Fig. 2. Age-mixing patterns by setting.

Fig. 2

Each heatmap represents the average frequency of contact between an individual of a given age (x axis) and all of their possible contacts (y axis). a Matrices of household contacts by age at the national level for China, the United States, and India. The six smaller panels in the center and on the left show household contact matrices at the subnational level in two provinces of China (Beijing, Guizhou), two locations of the United States (the state of New York and the District of Columbia), and two states of India (Maharashtra, Meghalaya). b Matrices of school contacts by age at the national level (from top to bottom: China, the United States, India). c Matrices of work contacts by age at the national level (from top to bottom: China, the United States, India).

Starting with the household setting in Fig. 2a, we observe that contacts between individuals can largely be characterized as that between couples living together, and parents and their children in the same household14,23. The increased frequency of contact between adults of similar ages along the main diagonal of the household contact matrix represents couples of similar ages living together, while the bands of high frequency above and below the main diagonal indicate contact between parents and children. While most locations share these overall features, the contact matrices show different age-mixing patterns. For instance, in China (Fig. 2a), the lower frequency of contact between children within households is the reflection of the country’s so-called “One-child policy”. The policy, enacted in 1979 up to 2016, has resulted in over a generation of many Chinese youths growing up without siblings, and hence having less contact on average with other children in this setting. This is a stark contrast with the United States and India (Fig. 2a), where the presence of multiple children born to a family results in an increased frequency of contact between this age group in the household matrix. The presence of multigenerational families in countries like India is also evident from the increased frequency of contact between all age groups, notably between the elderly (60 years and older) and young children. The same feature was observed in ref. 19 for Zimbabwe. Even within the same country, contact patterns may be markedly different. Figure 2a shows the age-mixing patterns within households for two different provinces of China: Beijing and Guizhou. While the household contact patterns in Beijing show a clear signal of the “One-child policy”, Guizhou shows the presence of multigenerational families, as well as an increased presence of multiple children living in the same household. This can be traced back to the fact that the Guizhou Province is characterized by a large frequency of minority groups and the “One-child policy” was less strictly applied for minorities.

Figure 2b, c shows the inferred contact matrices in the school and workplace settings for China, the United States, and India. In both settings, the age-mixing patterns vary strongly, reflecting differences in the educational systems, and economic conditions unique to each location. For all locations in our study, the school setting consistently exhibits the highest frequencies of contact between children and young adults attending school together. Interaction with older adults in this setting reflects the contact students have with instructors and other staff members in school. The variability of age-mixing patterns between children in India (Fig. 2b) also reflects the many different kinds of schools that children can attend throughout the country and the different age groups found in those schools. In the workplace environment, most interaction takes place between individuals in the range of 20–65 years of age, with the age range depending on local retirement, employments regulations, and culture. For instance, in many parts of the world it is common for teenagers to be fully or partially employed (see the work contact matrix for the US—Fig. 2c); in India, census records for employment list even children among the population of workers.

Statistical validation of the contact matrices against summary statistics of a large set of socio-demographic indicators has been performed to validate our results (see Supplementary Information).

Human-mixing patterns for influenza transmission

The contact matrices obtained in each setting acquire epidemiological relevance when combined together to generate the descriptions of human-mixing patterns relevant to the spreading of a specific disease. Here, we define the matrix of effective contacts relevant to influenza transmission based on the relative contribution of the household, school, and workplace. Here, by “effective”, it is indicated a contact that can lead to the disease transmission. In addition to these three social settings, we consider also the contribution of less structured casual encounters in the population34, by considering a community contact matrix that assumes individuals as potentially fully mixed23. To combine the different matrices, we propose a weighted linear combination of the derived matrices for the four considered social settings, and compute the overall matrix of contacts between individuals of age i and individuals of age j, M (whose elements are denoted as Mij), as a weighted linear combination of setting-specific contact matrices:

Mij=kωkFijk 1

where the element Mij represents the average number of contacts with individuals of age j for an individual of age i per day, and each ωk ≥ 0 is indicating the number of contacts in each setting k.

Generally, the ωk are unknown disease-specific weights accounting for the relative importance of the different social settings in the transmission of a specific infectious disease. In the case of airborne infectious diseases, we leverage on diary-based survey contact matrices reported in14,18 for Finland, Germany, Italy, Luxembourg, The Netherlands, the United Kingdom, and the Tomsk Oblast of Russia. For European countries, we relied on data and the setting-specific contact matrices developed in Fumanelli et al.23 that covers 26 countries. Unfortunately, Poland, and Belgium, which are included in the POLYMOD study14 used to calibrate the overall contact matrix are not included in ref. 23. We perform a multiple linear regression analysis to find the values of ωk such that the resulting Mij best fits the empirical data. Note that the empirical matrices derived in refs. 14,18 describe the average number of contacts of age j for an individual of age i, and in “Methods” we show how ωk is related to an average number of contacts 〈c〉 per individual. The regression yields 4.11 contacts (standard error, SE 0.41) in the household setting, 11.41 contacts (SE 0.27) in schools, 8.07 contacts (SE 0.52) in workplaces, and 2.79 contacts (SE 0.48) for the general community setting. It is worth remarking that the estimated weight for household contacts is larger than the average household size. This likely reflects the definition of contacts at home (rather than with household members) used in the POLYMOD study14 that has been used to calibrate the weights. The rationale for using the POLYMOD and the Russian studies14,18 in estimating the weights used to assemble the setting-specific synthetic matrices lies in the extensive validation of those contact patterns in epidemiological studies of a set of airborne infectious diseases, including influenza29,3539.

Our approach provides overall best matching ωk and that, in principle, some of the differences in the social behavior of specific countries may not be captured by this approach. For this reason, as a validation of this calibration method, in Fig. 3a we report the correlation between the resulting synthetic matrices for France, Japan, and the Shanghai Province of China and the available empirical matrices for these additional locations3032. We find significant (P value < 0.001) Pearson correlations of 0.92, 0.9, and 0.8 for France, Japan, and Shanghai Province, respectively. Moreover, we use the Canberra distance as a measure of the similarity between two contact matrices23 (see “Methods” for the definition of the Canberra distance). We estimate the distance between the seven survey-based matrices used in the calibration phase and their respective synthetic matrices to be 0.21 on average (range: 0.17–0.28). (Note that the resulting Canberra distance is normalized by the square of the number of elements of the contact matrix to account for the different number of age groups considered by the different diary-based contact surveys). When considering the three locations used as out-of-sample validation, we estimate a slightly larger average distance of 0.29 (range: 0.21–0.37), suggesting the adequacy of the employed methodology. Finally, Fig. 3b shows a visual comparison between the synthetic and survey matrices, which highlights that the synthetic contact matrices are able to capture the specific features of each location such as contact patterns at school and the relative intensity of the main diagonals.

Fig. 3. Comparison to out-of-sample survey matrices.

Fig. 3

a Density plots showing the correlation of survey-based contact matrices for three out-of-sample locations (France, Japan, and the Shanghai Province of China) and their respective synthetic contact matrices (all normalized to sum to one). The points represent the actual values of the survey and synthetic contact matrices. The linear correlation between the elements of each survey matrix and the corresponding elements of the synthetic matrix is reported in terms of the Pearson correlation coefficient, whose values are reported in each plot. b Heatmaps representing the normalized survey matrices and the normalized overall synthetic matrices for France, Japan, and the Shanghai Province of China.

Figure 4a shows the synthetic overall contact matrices for China, the United States, and India. The contact matrices for all locations share many similarities: bands of increased contact along the main and off diagonals reflect the familiar household contact patterns, increased contact between adults age 20 and ~65 years old account for the interactions between the population’s workforce, and the dominant contact patterns in the lower left of the contact matrices reflect the high number of interactions between school-aged individuals. Depending on the age structure of the population, the intensity of interactions occurring in the school setting can vary; however, this feature consistently dominates the contact matrix for all locations in our study.

Fig. 4. Overall contact matrices.

Fig. 4

Each heatmap represents the overall average number of contacts relevant for airborne infectious disease transmission by age at the national level for China, the United States, and India.

To quantify the similarity between the overall contact matrices in different locations, we use a hierarchical clustering algorithm based on the Canberra distance to identify clusters of locations (dis)similar to each other23. We find that locations tend to cluster together by country (Fig. 5a), indicating that overall the contact patterns within a single country are more similar to each other than to the patterns observed in other countries. Strikingly, though not surprisingly, locations within developed countries such as Australia, Canada, and the United States are similar to each other and are clustered together, while at the same time locations throughout India, South Africa, and the North Caucasus region of Russia also cluster together, indicating a similarity in patterns between locations in the developing and transition world. Interestingly, a few territories of Canada, Russia, and India are outliers, indicating that the contact patterns in these locations are different from what is observed in all other locations (including their respective countries). A more detailed discussion is reported in Supplementary Information. If we consider the US state of New York as a reference and compute the distance from all other locations to it, a geographical pattern clearly emerges (Fig. 5b). Indeed, the contact patterns in most states of the US, and the urbanized areas of Canada and Australia appear to be very closely related to the one inferred for New York. In contrast, most of India, South Africa, and of the territories in Canada, Russia, and Australia have contact patterns noticeably different from those obtained for the state of New York.

Fig. 5. Clustering of contact matrices.

Fig. 5

a Clustered matrix of the Canberra distance between subnational contact matrices and associated dendrogram using hierarchical clustering to organize subnational locations. Lighter colors indicate locations more similar to each other (distance closer to 0). b World map of the subnational level where colors represent the Canberra distance between each subnational location and the US state of New York (used as a reference point). The gray color means that no data is available. Note that the country of Israel is treated at the national level, rather than the subnational level, due to both its relatively small population and area, and the resolution of data available for reconstruction.

Epidemiological relevance

To investigate the effect of the computed contact matrices on infection transmission dynamics, we develop an age-structured SIR model to describe influenza transmission dynamics in the sites considered. The SIR model describes the spread of influenza in terms of the transition of individuals between different epidemiological compartments. Susceptible individuals (i.e., those at risk of acquiring the infection—S) can become infectious (i.e., capable to transmit the infection—I) after coming into contact with infectious individuals. Subsequently, infectious individuals recover from the infection and become removed (R) after a certain amount of time (the infectious period). In an age-structured implementation of the model, individuals are now identified also by their age, and the contact matrix is introduced to describe the number of contacts between susceptible individuals of age i and all of their possible infectious contacts of age j2,13 (see “Methods” for details). More specifically, we considered a transmission model with identical disease parameters across geographical locations considered in the study. The contact matrices are thus the only factor driving the difference in dynamics and attack rate (total number of infected individuals) of the simulated epidemic.

Compared to the case of homogeneous mixing, where all individuals are assumed to be in contact with each other in equal proportions, the inclusion of the contact matrices in the epidemic model consistently yields a lower overall attack rate for all locations (Fig. 6a). This difference is also reflected in the strong variability of the basic reproduction number R0, representing the number of cases generated by a typical index case in a fully susceptible population, which depends on the spectral radius of the matrix M as well as population structure (see Supplementary Information). To provide further validation of the adequacy of the matrices in characterizing the specific dynamics of influenza transmission in the Supplementary Information, we report the simulations of the age-structured SIR model calibrated on real data from the H1N1 influenza pandemic in multiple locations. The model adequately reproduces the age-specific seroprevalence profiles in Israel, Italy, Japan, UK, and USA4044.

Fig. 6. Epidemic impact.

Fig. 6

a Scatter plot of the attack rate and the reproduction number R0 from an age-structured SIR model using the contact matrix for each subnational location. European countries are included. The black line shows the results of the classic homogeneous mixing SIR model (no age groups). b Scatter plot of attack rates and the average age in each location. The black line represents the best-fitting linear model demonstrating a negative linear correlation between attack rates and the average age of the population. c Scatter plot of attack rates and percentage of the population attending educational institutions in each location. The black line represents the best-fitting linear model.

To understand the underlying factors of the observed heterogeneities across geographical locations, we use a linear regression model to compare the attack rates and various socio-demographic features of each location (see Supplementary Information). We identified two socio-demographic features that correlate strongly with the attack rate: the average age of the population (Fig. 6b) and the fraction of the population in the educational system including instructors (Fig. 6c). Indeed, if we examine the attack rates by age and setting (see Supplementary Information), we observe that the greatest proportion of infections occur as a result of contact due to the school setting, and that attack rates, in general, are highest for school-aged individuals. Going further, an inspection of the incidence profile by age (see Supplementary Information) also clearly shows that individuals with high contact frequencies with others in the school setting are infected earlier in higher proportions. These results mirror well-known influenza spreading trends/patterns observed in the real world14,23. The observed results are robust (although with quantitative differences) to changes in transmissibility patterns and susceptibility to infection by age (see Supplementary Information). Taken together, our results suggest that developing countries with younger populations, and thus more school-aged individuals, are likely to experience higher overall attack rates when compared to older, developed countries.

We can also investigate how the attack rate and R0 for each location would differ if we only had knowledge of the contact patterns at the national level. In this scenario, we use the country-level influenza transmission contact matrices in each location (note that each location is still characterized by its own specific age structure) and compare the results with those obtained by using the location-specific contact matrices everything else kept identical for the disease transmission model (Fig. 7a–c). By using the country-level matrix, we observe a much lower variability than by using location-specific mixing patterns. Moreover, location-specific attack rates and R0 show a nonlinear relation with the results obtained using country-level contact patterns. Interestingly, we can observe clear geographical trends in the percent difference in attack rate using location-specific contact patterns in comparison to the corresponding country-level ones. For instance in much of the western area of China where most of the nation’s ethnic minorities live, using the average matrix would lead to underestimating the final impact of an epidemic, while we would overestimate it in the more traditionally urbanized/industrialized areas in the north-east of the country, such as Beijing and Shanghai (Fig. 7a).

Fig. 7. Subnational heterogeneity.

Fig. 7

a The black dots represent the estimated attack rates in each province of China by using the country-level contact matrix and the location-specific age structure of the population. Colored dots represent the estimated attack rates in each location by using both the location-specific contact matrix and the age structure of the population. The colored lines connect the two estimated values of attack rate for each location. The transmission rate is set such that R0 = 1.5 when using the country-level matrix. Each map shows the percentage variation of the attack rate using the location-specific contact matrix with respect to using the national contact matrix as a proxy for the subnational contact patterns (i.e., (ARc − ARl)/ARc, where ARc is the attack rate estimated by using the country-level contact matrix, and ARl is that estimated by using the location-specific matrix). Colors toward Astra in the color scale indicate an overestimation of the attack rate in the location when using the country-level contact matrix as a proxy for the subnational contact patterns. Conversely, colors towards grape in the color scale indicate an underestimation of the attack rate in the location when using the country-level matrix as a proxy for the subnational contact patterns. b Same as a, but for the USA. c Same as a, but for India.

Discussion

We have presented a general framework for the synthetic generation of age-stratified mixing patterns in key social settings (the household, school, workplace) for the transmission of airborne infectious diseases. The contact patterns we derived are not directly measured via survey or other direct methods (e.g., wearable sensors). Rather, we infer these age-based relationships between individuals by measuring them in synthetic populations developed using a novel approach that combines macro- and microdata available from public sources. While this is a limitation as, in general, a direct measure is preferred with respect to a derived one, this approach allows us to: (i) be flexible in the definition of effective contacts and thus to adapt our methodology to the study of different infectious diseases which require alternative definitions of “effective contact for transmission”; and (ii) focus on broad arrays of countries for which a direct measure is not available, especially at the subnational scale.

The use of age-mixing patterns in age-structured epidemic models provides insight into the epidemiology and dynamics of infectious diseases both within and between different countries around the world, as we have shown for the case of influenza. Our approach allows the integration of contact patterns that vary according to the geographical scale, the disease under consideration, and the detailed socioeconomic and demographic characteristics of the population. The developed method can be adapted to different geographic scales, conditional on the presence of sufficient data on age-specific intra-household, school, and work interdependencies. However, it is important to remark that, even if data availability allows the development of micro-level (e.g., zip-code, census block) synthetic populations, focusing on the geographic units smaller than commuting distances would break down the representativeness of age-mixing patterns to model the spread of an epidemic.

The use of data-driven heterogeneous mixing patterns, especially at the subnational level, opens up the door to potential applications in the more realistic modeling of the worldwide circulation of pathogens with epidemic/pandemic potential. The developed contact matrices also allow the study of the impact on the epidemiology of infectious diseases of socioeconomic disparities and demographic peculiarities (e.g., one-child policy). Eventually, by making all of the derived mixing patterns (in the form of readily usable contact matrices by age) publicly available, the presented results may benefit the research community actively working on the development of infectious disease forecasting approaches and mathematical models in support of the public health decision-making processes.

Methods

Development of the synthetic populations

To construct synthetic populations in different countries, we made use of a wide array of data sources (see Supplementary Information). These data provide distributions of key socio-demographic characteristics, such as the age structure, household size, age of the head of the household, age gaps between household members, household composition, employment rates, the educational system, and enrollment rates, etc. Distributions such as these are typically available either as macro-level data from census databases and other governmental sources, or as micro-level data coming from surveys conducted on a sample of the population. Census databases routinely provide information at a broader scope such as the age structure of a population, or the fertility rates; however, they often lack more detailed information related to the household composition and age relationships between household members. For this, we rely on micro-level surveys which collect the data at the household and individual level and ask participants for information in regards to their health, household condition and composition, economic conditions, and more. The kind of data available also varies by country and even at the subnational level, thus necessitating the development of adaptive algorithms that can take in the available data and accommodate for variability in data organization to produce a faithful reconstruction of each population. With this in mind, the procedure implemented can be summarized as follows.

The first step in the reconstruction of a real-world population is the generation of households. In this process, we use two types of multinomial sampling. The first is based on the probability distribution M(y) of an independent socio-demographic characteristic y. For instance, such characteristic y can be the household size or composition, depending on the data available. The second type of multinomial sampling is based on the probability M(xy1=i1,y2=i2,...,yn=in) of characteristic x conditional on the value i of a previously determined variable(s) y. In this case, x and y are assumed (when supported by available data) to have bivariate or multivariate joint distributions. Typically, the larger the number of joint distributions incorporated, the more precise the reconstruction of the real-world population. The precision of such a reconstruction is, however, often limited by the scope of the data (such as the survey sample size for each characteristic y) and its availability. For example, of the multinomial joint distributions used here, one is the distribution of the age of the head of the household by the size of the household and the household composition (whether a couple, a single parent with children, siblings, multigenerational families, etc.). The bivariate joint distributions incorporated is considerably long and includes (but is not limited to) distributions of the age of household members by the age of the head of the household, the age gap between couples living together by the age of one in the pair, the mother’s age at childbirth by the age of the child, the number of household members by their relation to the age of the household head (such as a spouse, parent, child, grandchild, sibling, in-law, etc.) by the age of the household head and the household composition. These joint distributions were either found in the macro data or estimated from the micro survey data. Characteristics of the resulting synthetic households are compared to the distributions of the summary statistics available from the macro-level data using a goodness-of-fit test at the desired level of significance (generally 5%).

A similar procedure is used to assign those individuals to their respective schools and workplaces based on enrollment and employment records. These records detail the enrollment and employment rates by age, institutional sizes, and their age structures, as well as the student-to-teacher ratios in the case of schools. A more detailed explanation of the construction of the synthetic population can be found in  Supplementary Information together with the results of the comparison between the synthetic and actual population statistics.

Construction of age-based contact matrices

We use synthetic contact networks to infer average age-based contact patterns within each social setting. For each location, these age-based contact patterns are encoded in a contact matrix Fk, whose elements Fijk describes the average frequency of contact between a given individual of age i and individuals of age j in setting k. We focus on 4 social settings: the households (H), schools (S), workplaces (W), and the general community (C). Specifically, here we adopt the frequency-dependent (mass action) transmission model, with the implicit assumption that an increased population density has no effect on the per capita contact rate between individuals45. This choice of modeling mechanism was already proved to represent a good approximation for the description of the transmission patterns of several infectious diseases2. Moreover, it allows us to readily compare epidemiological parameters between social settings and locations with disparate population density, and thus makes for an appropriate framework when modeling the transmission dynamics of heterogeneous populations around the world. The calculation of the contact matrices can be described as follows.

First, we compute the relative abundance of contacts between individuals of age i and individuals of age j in each configuration s of the setting k, Γijk(s).

Γijk(s)=ϕik(s)(ϕjk(s)δij)νk(s)1, 2

where ϕik(s) is the number of individuals of age i in the configuration s (i.e., a specific household, school, or workplace) of setting k; δij is the Kronecker delta function, which we use to omit the individual i from their own set of contacts; νk(s) is the number of individuals (of all ages) in instance s of setting k. Note that to compute Γijk(s), we assume homogeneous mixing within each configuration of the setting, i.e., each individual can be in contact with other individuals, and as a result the matrix Γijk(s) has the expected symmetric property Γijk(s)=Γjik(s).

Second, we compute the per capita probability of contact of an individual of age i with an individual of age j in setting k as Fijk.

Fijk={s:νk(s)>1}Γijk(s)/Ni, 3

where Ni is the total number of individuals of age i. Note that matrix Fk (i.e., the matrix of elements Fijk) is not symmetric.

Third, we combine the setting-specific contact matrices by age Fk to derive a matrix of the overall contacts by age M. We propose a weighted linear combination of the derived matrices in the four focus settings, calibrated to match the empirically estimated contact matrices from two contact diary survey studies in seven locations throughout Western Europe and Russia14,18. We perform a multiple linear regression to calibrate the weights of the synthetic setting contact matrices such that their linear combination matches the overall contact matrix for all seven locations coming from the survey studies (see Supplementary Information for details and for a comparison between the empirical and synthetic contact matrices).

Following this approach, we are also able to evaluate the uncertainty of point estimates of the contact matrices. While the absolute level of uncertainty results to be negligible if compared to the differences between the age groups, if synthetic individuals are sampled from the synthetic population, the level of introduced uncertainty becomes comparable with the one for diary-based contact studies (see Supplementary Information).

The average number of contacts

The average number of contacts 〈c〉 can be computed as

c=1NiNijMij 4

where N = ∑iNi is the total number of individuals in the population.

Therefore,

c=1NiNijkωkFijk=1NiNijkωksΓijk(s)/Ni=1NkωkijsNiΓijk(s)/Ni=1NkωkijsΓijk(s)=kωksijΓijk(s)/N=kωkZk/N, 5

where Zk is the number of individuals having at least one contact in setting k. Note that in the calculation, we used the symmetric property of matrix Γijk(s). This expression provides a relation between the parameters ωk and the overall per capita contact of relevance in epidemiological studies.

Canberra distance

To make side-by-side comparisons of the inferred contact matrices by age, we use the Canberra distance23. Specifically, each matrix is treated as a vector on which the Canberra distance is defined as

d(x,y)Canberra=ixiyixi+yiforxi,yi01forxi,yi=0 6

This yields a distance value of 0 for two locations with identical contact matrices, and increasingly larger distance values for two locations with increasingly different contact matrices.

Age-structured disease transmission model

For each location l, the transmission dynamics of influenza are modeled through an age-structured SIR model, where the mixing patterns are defined by the contact matrix previously introduced, Mij.

The model is defined by the following set of equations:

Si˙=λiSiIi˙=λiSiγIiRi˙=γIi, 7

where Si is the number of susceptible individuals of age i, Ii is the number of infected individuals of age i, Ri is the number of recovered or removed individuals of age i; γ−1 is the infectious periods (which corresponds to the generation time in the simple SIR model46,47), which is set to 2.6 days48; and λi represents the force of infection to which an individual of age i is exposed to other infected individuals and expressed as

λi=βjMijIjNj, 8

where β is the transmissibility of the infection, Ni is the total number of individuals of age i, and Mij measures the average number of contacts for an individual of age i with all of their contacts of age j.

The basic reproduction number R0, representing the number of cases generated by a typical index case in a fully susceptible population, can be defined for this model as

R0=βγρ(M), 9

where ρ(M) is the dominant eigenvalue of the matrix M49.

To build the synthetics populations, we use publicly available databases listed in Table 1.

Table 1.

Data sources for each country and the country code.

Country Country code Sources
Australia AUS Australian Bureau of Statistics51
Canada CAN Statistics Canada52
BC Stats53
Finding Quality Childcare: A guide for parents in Canada54
China CHN China Health and Nutrition Survey55
China Census 201056
China Statistical Yearbook57
India IND The 15th Indian Census58
Demographic and Health Surveys (2005)59
Unified District Information System for Education60,61
All India Survey on Higher Education62
Israel ISR Israel Census 200863
Japan JPN Official Statistics of Japan64
Russia RUS Russia Longitudinal Monitoring Survey65
2010 All-Russian Population Census66
Federal State Statistics Service67
South Africa ZAF Statistics South Africa68
Statistics on Post-School Education and Training in South Africa69
World Health Survey (2003)70
South African Revenue Service71
United States of America USA Decennial Census of Population and Housing72
Current Population Survey73
American Community Survey74
IPUMS USA75

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

Reporting Summary (69.9KB, pdf)

Acknowledgements

A.P.y.P., M.C., K.M., X.X., M.E.H., I.M.L., and A.V. acknowledge funding from Models of Infectious Disease Agent Study, National Institute of General Medical Sciences Grant U54GM111274 and the CDC contract 75D30119C05765. Q.-H.L. has received funding from the National Natural Science Foundation of China (No. 62003230), the Fundamental Research Funds for the Central Universities (No. 1082204112289). The authors would like to thank Nicole Samay for her assistance in preparing the figures.

Author contributions

M.A. and A.V. designed the experiment. D.M., M.L., A.P.P., M.C., L.F., M.F.C.G., S.A.H., Q.-H.L., K.M., and X.X. collected and integrated the data. D.M., M.L., and M.A. carried out the analysis. D.M., M.L., A.P.P., M.C., L.F., M.F.C.G., S.A.H., Q.-H.L., K.M., X.X., M.E.H., I.M.L., S.M., M.A., and A.V. contributed to the writing and discussion of the paper.

Data availability

A database containing the inferred setting-specific matrices as well as the contact matrices for influenza transmission for all locations (and countries) is publicly available on the dedicated online repository: https://github.com/mobs-lab/mixing-patterns50. Python and R routines to work with the contact matrices and examples of how to use them in age-structured compartmental models are also available.

Code availability

The code can be publicly accessed at the dedicated online repository https://github.com/mobs-lab/mixing-patterns50.

Competing interests

A.P.y.P., M.C., and A.V. report grants from Metabiota Inc and M.A. reports research funding from Seqirus, outside the submitted work. The remaining authors declare no competing interests.

Footnotes

Peer review information Nature Communications thanks Piero Manfredi and Lander Willem for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Marco Ajelli, Email: marco.ajelli@gmail.com.

Alessandro Vespignani, Email: a.vespignani@northeastern.edu.

Supplementary information

Supplementary information is available for this paper at 10.1038/s41467-020-20544-y.

References

  • 1.Van Kerkhove MD, Ferguson NM. Epidemic and intervention modelling: a scientific rationale for policy decisions? Lessons from the 2009 influenza pandemic. Bull World Health Organ. 2012;90:306–310. doi: 10.2471/BLT.11.097949. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control. Oxford, UK: Oxford University Press; 1991. [Google Scholar]
  • 3.Metcalf CJE, Edmunds WJ, Lessler J. Six challenges in modelling for public health policy. Epidemics. 2015;10:93–96. doi: 10.1016/j.epidem.2014.08.008. [DOI] [PubMed] [Google Scholar]
  • 4.Keeling M, Woolhouse M, May R, Davies G, Grenfell BT. Modelling vaccination strategies against foot-and-mouth disease. Nature. 2003;421:136–142. doi: 10.1038/nature01343. [DOI] [PubMed] [Google Scholar]
  • 5.Colizza V, Barrat A, Barthelemy M, Valleron A-J, Vespignani A. Modeling the worldwide spread of pandemic influenza: baseline case and containment interventions. PLoS Med. 2007;4:e13. doi: 10.1371/journal.pmed.0040013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Longini IM, et al. Containing pandemic influenza at the source. Science. 2005;309:1083–1087. doi: 10.1126/science.1115717. [DOI] [PubMed] [Google Scholar]
  • 7.Ferguson NM, et al. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature. 2005;437:209–214. doi: 10.1038/nature04017. [DOI] [PubMed] [Google Scholar]
  • 8.Merler S, Ajelli M. The role of population heterogeneity and human mobility in the spread of pandemic influenza. Proc. R Soc. B. 2010;277:557–565. doi: 10.1098/rspb.2009.1605. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Fenton KA, et al. Sexual behaviour in Britain: reported sexually transmitted infections and prevalent genital Chlamydia trachomatis infection. The Lancet. 2001;358:1851–1854. doi: 10.1016/S0140-6736(01)06886-6. [DOI] [PubMed] [Google Scholar]
  • 10.Dodd PJ, et al. Age-and sex-specific social contact patterns and incidence of Mycobacterium tuberculosis infection. Am. J. Epidemiol. 2016;183:156–166. doi: 10.1093/aje/kwv160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Zhang J, et al. Changes in contact patterns shape the dynamics of the COVID-19 outbreak in China. Science. 2020;368:1481–1486. doi: 10.1126/science.abb8001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Jarvis CI, et al. Quantifying the impact of physical distance measures on the transmission of COVID-19 in the UK. BMC Med. 2020;18:124. doi: 10.1186/s12916-020-01597-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Wallinga J, Teunis P, Kretzschmar M. Using data on social contacts to estimate age-specific transmission parameters for respiratory-spread infectious agents. Am. J. Epidemiol. 2006;164:936–944. doi: 10.1093/aje/kwj317. [DOI] [PubMed] [Google Scholar]
  • 14.Mossong J, et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med. 2008;5:e74. doi: 10.1371/journal.pmed.0050074. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Hens N, et al. Mining social mixing patterns for infectious disease models based on a two-day population survey in Belgium. BMC Infect. Dis. 2009;9:5. doi: 10.1186/1471-2334-9-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Horby P, et al. Social contact patterns in Vietnam and implications for the control of infectious diseases. PLoS ONE. 2011;6:e16965. doi: 10.1371/journal.pone.0016965. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Read JM, et al. Social mixing patterns in rural and urban areas of southern China. Proc. R Soc. B. 2014;281:20140268. doi: 10.1098/rspb.2014.0268. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Ajelli M, Litvinova M. Estimating contact patterns relevant to the spread of infectious diseases in Russia. J. Theor. Biol. 2017;419:1–7. doi: 10.1016/j.jtbi.2017.01.041. [DOI] [PubMed] [Google Scholar]
  • 19.Melegaro A, et al. Social Contact Structures and Time Use Patterns in the Manicaland Province of Zimbabwe. PLoS ONE. 2017;12:e0170459. doi: 10.1371/journal.pone.0170459. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Cattuto C, et al. Dynamics of person-to-person interactions from distributed RFID sensor networks. PLoS ONE. 2010;5:e11596. doi: 10.1371/journal.pone.0011596. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Kiti MC, et al. Quantifying social contacts in a household setting of rural kenya using wearable proximity sensors. EPJ Data Sci. 2016;5:21. doi: 10.1140/epjds/s13688-016-0084-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Zagheni E, et al. Using time-use data to parameterize models for the spread of close-contact infectious diseases. Am. J. Epidemiol. 2008;168:1082–1090. doi: 10.1093/aje/kwn220. [DOI] [PubMed] [Google Scholar]
  • 23.Fumanelli L, Ajelli M, Manfredi P, Vespignani A, Merler S. Inferring the structure of social contacts from demographic data in the analysis of infectious diseases spread. PLoS Comput. Biol. 2012;8:e1002673. doi: 10.1371/journal.pcbi.1002673. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Grefenstette JJ, et al. FRED (A Framework for Reconstructing Epidemic Dynamics): an open-source software system for modeling infectious diseases and control strategies using census-based populations. BMC Public Health. 2013;13:940. doi: 10.1186/1471-2458-13-940. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Gallagher S, Richardson LF, Ventura SL, Eddy WF. SPEW: synthetic populations and ecosystems of the world. J. Comput. Graph Stat. 2018;27:773–784. doi: 10.1080/10618600.2018.1442342. [DOI] [Google Scholar]
  • 26.Iozzi F, et al. Little Italy: an agent-based approach to the estimation of contact patterns-fitting predicted matrices to serological data. PLoS Comput. Biol. 2010;6:e1001021. doi: 10.1371/journal.pcbi.1001021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.De Cao E, Zagheni E, Manfredi P, Melegaro A. The relative importance of frequency of contacts and duration of exposure for the spread of directly transmitted infections. Biostatistics. 2014;15:470–483. doi: 10.1093/biostatistics/kxu008. [DOI] [PubMed] [Google Scholar]
  • 28.Prem K, Cook AR, Jit M. Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLoS Comput. Biol. 2017;13:e1005697. doi: 10.1371/journal.pcbi.1005697. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Litvinova M, Liu Q-H, Kulikov ES, Ajelli M. Reactive school closure weakens the network of social interactions and reduces the spread of influenza. Proc. Natl Acad. Sci. USA. 2019;116:13174–13181. doi: 10.1073/pnas.1821298116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Béraud G, et al. The French connection: the first large population-based contact survey in France relevant for the spread of infectious diseases. PLoS ONE. 2015;10:e0133203. doi: 10.1371/journal.pone.0133203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Munasinghe L, Asai Y, Nishiura H. Quantifying heterogeneous contact patterns in Japan: a social contact survey. Theor. Biol. Med. Model. 2019;16:6. doi: 10.1186/s12976-019-0102-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Zhang J, et al. Patterns of human social contact and contact with animals in Shanghai, China. Sci. Rep. 2019;9:1–11. doi: 10.1038/s41598-018-37186-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Ajelli M, Poletti P, Melegaro A, Merler S. The role of different social contexts in shaping influenza transmission during the 2009 pandemic. Sci. Rep. 2014;4:7218. doi: 10.1038/srep07218. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Ajelli M, Merler S. The impact of the unstructured contacts component in influenza pandemic modeling. PLoS ONE. 2008;3:e1519. doi: 10.1371/journal.pone.0001519. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Kretzschmar M, Teunis PF, Pebody RG. Incidence and reproduction numbers of pertussis: estimates from serological and social contact data in five European countries. PLoS Med. 2010;7:e1000291. doi: 10.1371/journal.pmed.1000291. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Kucharski AJ, Gog JR. The role of social contacts and original antigenic sin in shaping the age pattern of immunity to seasonal influenza. PLoS Comput. Biol. 2012;8:e1002741. doi: 10.1371/journal.pcbi.1002741. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Poletti P, et al. Perspectives on the impact of varicella immunization on herpes zoster. A model-based evaluation from three European countries. PLoS ONE. 2013;8:e60732. doi: 10.1371/journal.pone.0060732. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Merler S, Ajelli M. Deciphering the relative weights of demographic transition and vaccination in the decrease of measles incidence in Italy. P. Roy. Soc. B. 2014;281:20132676. doi: 10.1098/rspb.2013.2676. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Kretzschmar ME, et al. Impact of delays on effectiveness of contact tracing strategies for COVID-19: a modelling study. The Lancet Public Health. 2020;5:e452–e459. doi: 10.1016/S2468-2667(20)30157-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Weil M, et al. The dynamics of infection and the persistence of immunity to A (H1N1) pdm09 virus in Israel. Influenza Other Respir Viruses. 2013;7:838–846. doi: 10.1111/irv.12071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Merler S, et al. Pandemic influenza A/H1N1pdm in Italy: age, risk and population susceptibility. PLoS ONE. 2013;8:e74785. doi: 10.1371/journal.pone.0074785. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Japanese Infectious Disease Surveillance Center. Influenza antibody holding status survey in FY 2010 - First Report (2010). https://www.niid.go.jp/niid/en/idsc-e.html (2018).
  • 43.Hardelid P, et al. Assessment of baseline age-specific antibody prevalence and incidence of infection to novel influenza A/H1N1 2009. Health Technol Assess. 2010;14:115–92. doi: 10.3310/hta14550-03. [DOI] [PubMed] [Google Scholar]
  • 44.Reed C, Katz JM, Hancock K, Balish A, Fry AM. Prevalence of seropositivity to pandemic influenza A/H1N1 virus in the United States following the 2009 pandemic. PLoS ONE. 2012;7:e48187. doi: 10.1371/journal.pone.0048187. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Keeling, M. J. & Rohani, P. Modeling Infectious Diseases in Humans and Animals (Princeton University Press, 2011).
  • 46.Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc. R Soc. B. 2006;274:599–604. doi: 10.1098/rspb.2006.3754. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Liu Q-H, et al. Measurability of the epidemic reproduction number in data-driven contact networks. Proc. Natl Acad. Sci. USA. 2018;115:12680–12685. doi: 10.1073/pnas.1811115115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Vink MA, Bootsma MCJ, Wallinga J. Serial intervals of respiratory infectious diseases: a systematic review and analysis. Am. J. Epidemiol. 2014;180:865–875. doi: 10.1093/aje/kwu209. [DOI] [PubMed] [Google Scholar]
  • 49.Diekmann O, Heesterbeek JAP, Metz JA. On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations. J. Math Biol. 1990;28:365–382. doi: 10.1007/BF00178324. [DOI] [PubMed] [Google Scholar]
  • 50.Mistry, D. et al. Inferring high-resolution human mixing patterns for disease modeling. 10.5281/zenodo.4287574 (2020). [DOI] [PMC free article] [PubMed]
  • 51.Australian Bureau of Statistics. Australian Bureau of Statistics. http://www.abs.gov.au/ (2011).
  • 52.Statistics Canada. Statistics Canada. https://www.statcan.gc.ca/eng/start (2011).
  • 53.Government of British Columbia. BC Stats. https://www2.gov.bc.ca/ (2011).
  • 54.Childcare Resource and Research Unit, Canadian Union of Postal Workers. Finding Quality Child Care: A guide for parents in Canada. https://findingqualitychildcare.ca/ (2014).
  • 55.National Institute for Nutrition and Health, China Center for Disease Control and Prevention, Carolina Population Center, University of North Carolina at Chapel Hill. China Health and Nutrition Survey (CHNS). https://www.cpc.unc.edu/projects/china/data/datasets (2009).
  • 56.China Statistics Press. Census 2010. http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/indexch.htm (2010).
  • 57.China Statistics Press. China Statistical Yearbook 2010. http://www.stats.gov.cn/tjsj/ndsj/2010/indexeh.htm (2010).
  • 58.Office of the Registrar General and Census Commissioner, Ministry of Home Affairs, Government of India. 2011 Census Data. http://www.censusindia.gov.in/pca/Searchdata.aspx (2011).
  • 59.Demographic and Health Surveys (2005–2006). India: Standard DHS 2005-06. https://dhsprogram.com/data/dataset/India_Standard-DHS_2006.cfm?flag=0 (2016).
  • 60.Unified District Information System for Education (UDISE), National Institute of Educational Planning and Adminstration (2011–2012) Elementary Education in India. http://udise.in/src.htm (2016).
  • 61.Unified District Information System for Education (UDISE), National Institute of Educational Planning and Adminstration (2012–2013) Secondary Education in India: State Report Cards 2012–13. http://udise.in/src.htm (2016).
  • 62.Department of Higher Education, Ministry of Human Resource Development, Government of India. All India Survey on Higher Education. http://mhrd.gov.in/sites/upload_files/mhrd/files/statistics/AISHE2011-12P_1.pdf (2013).
  • 63.Israel Central Bureau of Statistics. Israel Census 2008. http://www.cbs.gov.il/census/census/pnimi_page_e.html?id_topic=2 (2008).
  • 64.Japanese Government Statistics. e-Stat, Portal Site of Official Statistics of Japan. https://www.e-stat.go.jp/en/ (2010).
  • 65.Popkin, B. M. et al. The Russia Longitudinal Monitoring Survey (RLMS). https://www.cpc.unc.edu/projects/china/data/datasets (2010).
  • 66.Federal State Statistics Service. 2010 All-Russian Population Census. http://www.gks.ru/free_doc/new_site/perepis2010/croc/perepis_itogi1612.htm (2010).
  • 67.Federal State Statistics Service. Labor market, employment and wages. http://www.gks.ru/wps/wcm/connect/rosstat_main/rosstat/ru/statistics/wages/labour_force/# (2010).
  • 68.Statistics South Africa. Census 2011. http://www.statssa.gov.za/ (2011).
  • 69.Department: Higher Education and Training, Republic of South Africa. Statistics on Post-School Education and Training in South Africa: 2011. http://www.cbs.gov.il/census/census/pnimi_page_e.html?id_topic=2 (2008).
  • 70.World Health Organization (WHO) (2003-2005) World Health Survey. http://apps.who.int/healthinfo/systems/surveydata/index.php/catalog (2017).
  • 71.South African Revenue Service, National Treasury. 2013 Tax Statistics. http://www.sars.gov.za/About/SATaxSystem/Pages/Tax-Statistics.aspx (2013).
  • 72.United States Census Bureau. Decennial Census of Population and Housing. https://www.census.gov/programs-surveys/decennial-census/decade.2010.html (2010).
  • 73.United States Census Bureau. Current Population Survey. https://www.census.gov/programs-surveys/cps/data-detail.html (2010).
  • 74.United States Census Bureau. American Community Survey. https://www.census.gov/programs-surveys/acs/data.html (2010).
  • 75.Ruggles, Steven and Flood, Sarah and Goeken, Ronald and Grover, Josiah and Meyer, Erin and Pacas, Jose and Sobek, Matthew. IPUMS USA: Version 8.0 [dataset]. 10.18128/D010.V8.0 (2010).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Reporting Summary (69.9KB, pdf)

Data Availability Statement

A database containing the inferred setting-specific matrices as well as the contact matrices for influenza transmission for all locations (and countries) is publicly available on the dedicated online repository: https://github.com/mobs-lab/mixing-patterns50. Python and R routines to work with the contact matrices and examples of how to use them in age-structured compartmental models are also available.

The code can be publicly accessed at the dedicated online repository https://github.com/mobs-lab/mixing-patterns50.


Articles from Nature Communications are provided here courtesy of Nature Publishing Group

RESOURCES