Abstract
Hypertension is a highly prevalent cardiovascular disease. It represents a considerable cost factor for many national health systems. Despite its prevalence, regional disease distributions are often unknown and must be estimated from survey data. However, health surveys frequently lack regional observations due to limited resources. The resulting prevalence estimates suffer from unacceptably large sampling variances and are unreliable. Small area estimation solves this problem by linking auxiliary data from multiple regions in suitable regression models. Typically, either unit- or area-level observations are considered for this purpose. But with respect to hypertension, both levels should be used. Hypertension has characteristic comorbidities and is strongly related to lifestyle features, which are unit-level information. It is also correlated with socioeconomic indicators that are usually measured on the area-level. However, combining the levels is challenging, as it requires multi-level model parameter estimation from small samples. We use a multi-level small area model with level-specific penalization to overcome this issue. Model parameter estimation is performed via stochastic coordinate gradient descent. A jackknife estimator of the mean squared error is presented. The methodology is applied to combine health survey data and administrative records to estimate regional hypertension prevalence in Germany.
Keywords: Mixed-effect model, multi-source estimation, penalized maximum likelihood
1. Introduction
Hypertension is the most common cardiovascular disease worldwide. It is estimated that approximately 1.13 billion people aged 18 and older suffer from raised blood pressure globally [30]. Thus, it represents a major cost factor in the majority of national health systems that needs to be monitored closely. However, despite its high prevalence, the distribution of hypertension on regional levels is often unknown. Corresponding figures are rarely recorded in registries and must be estimated from suitable survey data instead. In order to obtain accurate estimates on regional levels, corresponding survey samples must be exhaustive and need to contain considerable geographic detail. In practice, however, national health surveys often contain only a few observations per region (area) due to the significant sampling effort and limited resources. Direct estimators that only consider the observations within a given area then fail to provide prevalence estimates with sufficient accuracy as a result of large sampling variances.
Small area estimation (SAE) solves this problem by combining data from multiple areas in regression models. Estimation efficiency over a direct estimator is improved by exploiting the functional relation between an area statistic of interest and suitable auxiliary data. Area statistic estimates are obtained as predictions from the model [21]. The estimates thereby depend on the aggregation level of the auxiliary data. The most prominent SAE techniques are the area-level estimator by [11] and the unit-level estimator by [1]. Both approaches consider auxiliary data on a single aggregation level. But with respect to our application, it makes sense to consider both levels simultaneously. On the one hand, hypertension has characteristic comorbidities, such as type 2 diabetes [25], and is closely related to lifestyle variables, like alcohol consumption [20]. These features are typically unit-level information that is recorded in health surveys. On the other hand, hypertension is associated with socioeconomic indicators [26]. This information is often only available on the area-level, for example in terms of administrative records from official statistics. The joint usage of unit- and area-level data for SAE is less well-established in the literature.
Some SAE methods use unit-level data while accounting for heterogeneity in fixed effects on the area-level, as for example proposed by [29]. This marks an important generalization of the nested error structure in the original unit-level approach by [1], which models differences between areas only via a random intercept. Yet, it does not allow for the direct combination of unit- and area-level data sets, as area-level heterogeneity is assumed to be due to random deviations from unit-level fixed effects. Twigg et al. [38] presented a multi-level approach that includes both individual and ecological components to predict small area health-related behavior as a binary response. Still, the approach relies on a sequential procedure that calibrates a model on one data set first and then uses it in conjunction with another, which requires the model specification to be very simple. Ghosh and Steorts [15] developed a two-stage benchmarking approach that combines unit- and area-level data in a weighted loss function while benchmarking weighted means at both levels.
Using unit- and area-level data jointly entails several methodological problems. Firstly, it requires model parameter estimation on both levels. In the presence of small samples, the increased number of parameters may lead to considerably high variances of the model parameter estimates due to the lack of degrees of freedom. In this case, model-based small area estimates also suffer from high variance. Secondly, unit- and area-level data have different distributional characteristics and correlation structures due to different degrees of aggregation [7]. As a result, the levels should not be treated equally in terms of variable selection or model parameter estimation. Thirdly, unit- and area-level data are usually subject to different kinds of measurement errors. As ignoring measurement errors leads to suboptimal area statistic estimates, the researcher should account for this [27]. And finally, unit- and area-level data usually differ in terms of availability. Unit-level data is often scarce due to privacy issues, whereas area-level data is less sensitive and easier to access, for example, from registries. Accordingly, an approach must deal with situations where there are many variables on one level while there are only few on the other.
In the light of regional hypertension prevalence estimation, we develop a multi-level small area model with level-specific penalization to combine unit- and area-level data. Level-specific penalization refers to penalized maximum likelihood estimation of the model parameters where the fixed effect sets on each level are penalized individually. For this purpose, the elastic net [39] with level-specific weights on the included $\ell_1$-norm and squared $\ell_2$-norm is considered. Using level-specific penalization solves the methodological problems mentioned before. Firstly, it allows for high-dimensional inference. Hence, even if the number of model parameters surpasses the number of observations, the underlying optimization problem for model parameter estimation is still well-posed. This is particularly attractive in the presence of small samples. Secondly, level-specific penalization marks an intuitive way to treat unit- and area-level data differently for model parameter estimation. The penalties can be defined dependent on the distributional characteristics of the corresponding auxiliary data. Due to the sparsity-inducing effect of the $\ell_1$-norm, an automatic level-specific variable selection is conducted. Thirdly, penalized model parameter estimation implies a robustification against measurement errors in the auxiliary data, as shown by [2] as well as [4]. Accordingly, level-specific penalization allows for different measurement errors on each level. And finally, the amount of shrinkage on each level can be altered depending on the number of variables available for prediction.
Penalized maximum likelihood estimation of the model parameters is performed via stochastic coordinate gradient descent (SCGD) using insights from [34,37]. Random effect prediction is done using maximum a posteriori estimation, as shown by [31,34]. Estimation of the mean squared prediction error is performed via a modified jackknife approach drawing from [4,22]. The methodology is applied to estimate federal state-level hypertension prevalence in Germany. We combine unit-level data from the German health survey Gesundheit in Deutschland Aktuell (GEDA) [33] with area-level data from microcensus records [18] to estimate regional hypertension prevalence. The remainder of the paper is organized as follows. In Section 2, the methodology is explained. Section 3 contains a small simulation study and the application to regional health measurement. Section 4 closes with an outlook on future research. Please note that the presented article contains insights from a related working paper by [5].
2. Methodology
2.1. Penalized multi-level model
The subsequent description is kept slightly more generic to allow for potential other applications. Let $U$ be a finite population of $N$ individuals indexed by $j$. Assume that the population is segmented into $m$ areas $U_1, \dots, U_m$ of size $N_i$ such that $U = \bigcup_{i=1}^{m} U_i$ with pairwise disjoint $U_i \cap U_k = \emptyset$ for $i \neq k$ and $N = \sum_{i=1}^{m} N_i$. Let $S \subseteq U$ be a random sample of $n$ individuals. Assume that the sampling procedure is such that $S = \bigcup_{i=1}^{m} S_i$, where $S_i = S \cap U_i$ and $n_i = |S_i|$. Let $y_i \in \mathbb{R}^{n_i}$ be a vector containing observations of the response variable $Y$ from which the area statistic of interest in area $i$, let us say the mean $\mu_i = N_i^{-1} \sum_{j \in U_i} y_{ij}$, is calculated. Let $X_i \in \mathbb{R}^{n_i \times p}$ be the fixed effect design matrix in area $i$ containing unit-level auxiliary data for the description of $y_i$. For notational convenience, denote $\dot{X}_i = 1_{n_i} \dot{x}_i'$ as the fixed effect design matrix resulting from an expansion of the vector $\dot{x}_i \in \mathbb{R}^{q}$ containing area-level auxiliary data. Note that $p + q > n$ is allowed. Let $Z_i \in \mathbb{R}^{n_i \times r}$ be the random effect design matrix in area $i$. In the majority of SAE models, the random effect structure is limited to an area-specific random intercept. However, as they are typically special cases of linear mixed models [31], we allow for a more general area-specific random effect structure. The multi-level model combining unit- and area-level data is given by
$y_i = X_i \beta + \dot{X}_i \dot{\beta} + Z_i v_i + e_i, \quad i = 1, \dots, m$  (1)
where $\beta \in \mathbb{R}^{p}$ and $\dot{\beta} \in \mathbb{R}^{q}$ are the fixed effect coefficient vectors for each level and $v_i \sim N(0, G)$ denotes the random effect coefficient vector under multivariate normality with some general positive-definite covariance matrix $G$. $e_i \sim N(0, \sigma^2 I_{n_i})$ is a vector of independent and identically distributed random errors with model variance parameter $\sigma^2$. Note that $v_i$ and $e_i$ are assumed to be stochastically independent. The response vector is multivariate normal under the model
$y_i \sim N(X_i \beta + \dot{X}_i \dot{\beta},\; V_i)$  (2)
with $V_i = Z_i G(\delta) Z_i' + \sigma^2 I_{n_i}$, where the random effect covariance matrix $G(\delta)$ is parameterized by a vector $\delta$, for example, resulting from a Cholesky decomposition. We assume that the areas are stochastically independent. Restating the model over all areas obtains
$y = X \beta + \dot{X} \dot{\beta} + Z v + e$  (3)
with $X$, $\dot{X}$, $Z = \mathrm{blockdiag}(Z_1, \dots, Z_m)$ as stacked matrices and $y$, $v$, $e$ as stacked vectors. Define $\theta = (\beta', \dot{\beta}', \delta', \sigma^2)'$ as the full parameter vector. The negative log-likelihood function of (3) is given by
$\ell(\theta \mid y) = \frac{n}{2} \log(2\pi) + \frac{1}{2} \log \det(V) + \frac{1}{2} (y - X\beta - \dot{X}\dot{\beta})' V^{-1} (y - X\beta - \dot{X}\dot{\beta})$  (4)
with $V = \mathrm{blockdiag}(V_1, \dots, V_m)$ and $\det(V)$ denoting the determinant of $V$. In a standard maximum likelihood framework, model parameter estimation is performed according to $\hat{\theta} = \arg\min_{\theta} \ell(\theta \mid y)$. However, in the light of the methodological issues mentioned in Section 1, we expand the negative log-likelihood by level-specific penalties. For this, the elastic net penalty introduced by [39] is used. It can be viewed as a compromise between the LASSO [36] and ridge regression [19]. The LASSO component contains the $\ell_1$-norm of the fixed effect coefficients. It conducts an automatic variable selection by favoring sparse solutions to a given estimation problem. The ridge component contains the squared $\ell_2$-norm. It stabilizes model parameter estimates in the presence of strong correlation within the covariates by favoring smooth solutions. The elastic net is a linear convex combination of the two components incorporating both of these properties. The optimization problem for model parameter estimation is stated as
$\hat{\theta} = \arg\min_{\theta}\; \ell(\theta \mid y) + \lambda\, P_{\alpha}(\beta) + \dot{\lambda}\, P_{\dot{\alpha}}(\dot{\beta})$  (5)
where $\lambda, \dot{\lambda} \geq 0$ are predefined penalty parameters that regulate the effect of the penalty on the solutions for each level. They are typically determined by some form of cross-validation [34]. In addition to that,
$P_{\alpha}(\beta) = \alpha \lVert \beta \rVert_1 + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2$  (6)
is the elastic net penalty with $\alpha \in [0, 1]$ as predefined hyper parameter controlling the contribution of the $\ell_1$-norm and $\ell_2$-norm to the overall penalization on a level, with $P_{\dot{\alpha}}(\dot{\beta})$ defined analogously. In the light of our application, using the elastic net with level-specific penalty and hyper parameters is attractive. It allows to adjust the penalization to the data situation on each level individually. Recall that the response variable has been observed on the unit-level. With respect to the unit-level covariates, the health survey data may contain insights on comorbidities of hypertension, but clearly also a variety of records regarding unrelated health issues. Thus, it is useful to choose $\alpha$ close to one and put a higher weight on the LASSO component to induce more sparsity. Regarding the area-level covariates, the socioeconomic indicators are likely to have stronger covariance structures as they are typically calculated from the same set of variables. Hence, it makes sense to choose $\dot{\alpha}$ close to zero and put a higher weight on the ridge component to obtain a smooth solution for the corresponding parameter estimates.
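To make the level-specific weighting concrete, the following sketch evaluates an elastic-net-type penalty of the above form for each level separately and combines the two terms. The function names and the convention of weighting the squared norm by $(1-\alpha)/2$ are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def elastic_net_penalty(beta, alpha):
    # alpha * l1-norm + (1 - alpha)/2 * squared l2-norm, alpha in [0, 1]
    beta = np.asarray(beta, dtype=float)
    return alpha * np.abs(beta).sum() + 0.5 * (1.0 - alpha) * np.dot(beta, beta)

def level_specific_penalty(beta_unit, beta_area, lam_unit, lam_area,
                           alpha_unit, alpha_area):
    # separate penalty parameters and mixing weights for each level
    return (lam_unit * elastic_net_penalty(beta_unit, alpha_unit)
            + lam_area * elastic_net_penalty(beta_area, alpha_area))
```

With `alpha = 1` the penalty reduces to the pure LASSO term, with `alpha = 0` to the pure ridge term, so a level with many irrelevant covariates can be pushed toward sparsity while a level with strongly correlated covariates is merely smoothed.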
2.2. Model parameter estimation
A SCGD algorithm is used for model parameter estimation. We draw from [34,37] and modify their (block) coordinate gradient descent (CGD) method by a randomized cycling order. This improves the convergence probability in the light of the non-convexity of (5), as an unfortunate series of coordinates is less likely to occur [3]. CGD implies that the value of the objective function is minimized gradually by updating a single element of the target parameter vector at a time while keeping the others fixed. Thereafter, the remaining elements are updated accordingly such that a cyclic movement through all coordinates of the parameter vector is achieved. This approach is particularly useful for the proposed multi-level model, as it allows for easy implementation of level-specific penalization in the estimation process.
Due to the unknown variance parameters $\delta$ and $\sigma^2$ in the negative log-likelihood (4), the minimization problem (5) is non-convex. This complicates model parameter estimation significantly, as the algorithm is not guaranteed to achieve the global minimum. The non-convexity of the objective function favors the existence of local minima, which implies that the resulting model parameter estimates may be sensitive to starting values. However, the minimization with respect to $\beta$ and $\dot{\beta}$ is convex under fixed variance parameters. Following [34], this can be exploited in the estimation process. For the $t$th iteration of the algorithm, let $s$ be the index cycling through the coordinates of $\theta$. Note that the order of the coordinates changes randomly after each iteration. Let $\theta_s^{(t)}$ denote the $s$th element of $\theta^{(t)}$, where $\theta^{(t)}$ is the full parameter vector in the $t$th iteration. In general, an update is given by
$\theta_s^{(t+1)} = \theta_s^{(t)} + \gamma^{(t)} d^{(t)}$  (7)
where $d^{(t)}$ is the descent direction and $\gamma^{(t)}$ is the step size. A common choice would be the Newton direction $d^{(t)} = -(\partial^2 \ell / \partial \theta_s^2)^{-1}\, \partial \ell / \partial \theta_s$ with $\gamma^{(t)} = 1$. However, if $\theta_s$ is subject to elastic net penalization, the first and second partial derivatives do not exist everywhere. In that case, $d^{(t)}$ and $\gamma^{(t)}$ have to be determined differently. Let $h_s$ be an approximation of $\partial^2 \ell / \partial \theta_s^2$. Following [37], we set $h_s = \mathcal{I}(\theta)_{ss}$, truncated from below if necessary, where $\mathcal{I}(\theta)_{ss}$ is the main diagonal element of the Fisher information matrix corresponding to $\theta_s$. If $h_s$ is not truncated, we can build upon an analytic update proposed by [34] for the LASSO that exploits the negative log-likelihood being quadratic with respect to the fixed effect coefficients. For the elastic net penalty, we induce additional shrinkage to the LASSO update through dividing by $h_s + \lambda(1 - \alpha)$, as suggested by [12]. If $h_s$ is truncated, $d^{(t)}$ is obtained according to [37] and $\gamma^{(t)}$ is determined via the Armijo rule. We further use an active-set strategy [12,34]. This implies that $\theta_s$ is only updated when $\theta_s^{(t)} \neq 0$. Let $\hat{y}_{-s}$ be the prediction of $y$ excluding the contribution of $\theta_s$. The complete SCGD procedure is stated hereafter.
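The coordinate-wise update with randomized cycling can be illustrated on a deliberately simplified problem: a plain linear model with fixed variance and a single elastic net penalty, i.e. without the random effects and the level distinction of the full SCGD. All names and tuning values are illustrative; the soft-thresholding update is the standard analytic coordinate solution for this reduced objective, not the authors' exact algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of the l1-norm
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_elastic_net(X, y, lam, alpha, n_iter=200, seed=0):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1
    + (1-alpha)/2*||b||_2^2), cycling through coordinates in random order."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b                      # current residual
    for _ in range(n_iter):
        for j in rng.permutation(p):   # randomized cycling order
            r += X[:, j] * b[j]        # remove coordinate j's contribution
            z = X[:, j] @ r / n        # univariate least-squares direction
            # LASSO update with extra elastic net shrinkage in the denominator
            b[j] = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            r -= X[:, j] * b[j]
    return b
```

In the sketch, the denominator term `lam * (1 - alpha)` plays the role of the additional ridge shrinkage mentioned above; dropping it (`alpha = 1`) recovers the pure LASSO coordinate update.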
2.3. Random effect prediction and area statistic estimates
From Section 2.1, we can conclude that the conditional distribution of $y$ given $v$ is
$y \mid v \sim N(X\beta + \dot{X}\dot{\beta} + Zv,\; \sigma^2 I_n)$  (8)
In order to predict the random effect realizations, the conditional distribution of the random effects given the response variable realizations must be quantified. The mode of this distribution is the best predictor (BP) for $v$. This is often referred to as maximum a posteriori estimation. Using the Bayes theorem, we obtain [34]
$\tilde{v} = \arg\min_{v}\; -\log \varphi(y \mid v) - \log \varphi(v)$  (9)
with $\varphi$ denoting the normal probability density. The minimization problem in (9) has a closed-form solution under the model assumptions, which is given by
$\tilde{v}_i = G Z_i' V_i^{-1} (y_i - X_i \beta - \dot{X}_i \dot{\beta})$  (10)
Since in practice the model parameters are unknown, we use the empirical BP
$\hat{v}_i = \hat{G} Z_i' \hat{V}_i^{-1} (y_i - X_i \hat{\beta} - \dot{X}_i \hat{\dot{\beta}})$  (11)
with the estimates obtained from the minimization of (5). For the estimation of the area statistic $\mu_i$, predictions from the multi-level model have to be generated. The BP of $\mu_i$ under the model is given by
$\tilde{\mu}_i = N_i^{-1} \Big( \sum_{j \in U_i} x_{ij}' \beta + N_i\, \dot{x}_i' \dot{\beta} + \sum_{j \in U_i} z_{ij}' \tilde{v}_i \Big)$  (12)
However, (12) demands the unit-level vectors $x_{ij}$ to be observed for all $j \in U_i$, which can be unrealistic in practice. An alternative arises from exploiting the SAE setting by means of the small sample approximation proposed by [1]. We substitute the population-based area statistic $\mu_i$ by a model-based quantity
$\xi_i = \bar{x}_i' \beta + \dot{x}_i' \dot{\beta} + \bar{z}_i' v_i$  (13)
where $\bar{x}_i = N_i^{-1} \sum_{j \in U_i} x_{ij}$ is the area-specific mean of the unit-level auxiliary variables, with $\bar{z}_i$ defined analogously. Using the model parameter estimates obtained before, the empirical BP is then given by
$\hat{\xi}_i = \bar{x}_i' \hat{\beta} + \dot{x}_i' \hat{\dot{\beta}} + \bar{z}_i' \hat{v}_i$  (14)
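For the common random intercept special case, the empirical BP of the area statistic can be sketched compactly: the random intercept is predicted by shrinking the area's mean sample residual, and the prediction combines population-level covariate means with that predicted intercept. Argument names and the plug-in shrinkage factor are illustrative; they assume a scalar random intercept with variance `sigma2_v` and error variance `sigma2_e`.

```python
import numpy as np

def ebp_area_mean(xbar_pop, xdot, ybar_s, xbar_s, beta, beta_dot,
                  sigma2_v, sigma2_e, n_i):
    """EBP of the area statistic under a random intercept:
    xi_hat = xbar_pop'beta + xdot'beta_dot + v_hat, where
    v_hat shrinks the area's mean sample residual toward zero."""
    gamma_i = sigma2_v / (sigma2_v + sigma2_e / n_i)   # shrinkage factor
    v_hat = gamma_i * (ybar_s - xbar_s @ beta - xdot @ beta_dot)
    return xbar_pop @ beta + xdot @ beta_dot + v_hat
```

Areas with more sampled units (`n_i` large) or a large intercept variance rely more on their own sample residual (`gamma_i` close to one), while sparsely sampled areas are shrunk toward the synthetic regression prediction.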
2.4. Mean squared error estimation
Based on the small sample approximation (13), we now address mean squared error (MSE) estimation. Generally, the MSE of the empirical BP is
$\mathrm{MSE}(\hat{\xi}_i) = E\big[ (\hat{\xi}_i - \xi_i)^2 \big]$  (15)
The estimation of (15) is a difficult task due to the nonlinearity of the predictor. In SAE, depending on the model, second-order approximations to analytical solutions for the MSE may be available. Corresponding contributions were provided, for example, by [9,23,32], as well as [8]. However, these approximations are not applicable for the penalized multi-level model for several reasons. On the one hand, they do not consider penalization and the resulting shrinkage of the regression coefficients. On the other hand, they are not suitable for high-dimensional settings and they do not consider multi-level data. As far as we know, the derivation of an analytical MSE estimator in this context is still subject to ongoing research. A common alternative to analytical MSE approximations are resampling methods. Popular techniques are the bootstrap and the jackknife [10,22]. Corresponding contributions to SAE were provided by, for example, [17,24], as well as [6]. These methods seek to approximate the distribution of the predictor via resampling and have also been applied in SAE models with penalization. On that note, Burgard et al. [4] presented a modified jackknife approach for MSE estimation in a penalized area-level model. The basic idea is to first derive the conditional MSE of the best predictor $\tilde{\xi}_i$ under known model parameters given the design matrix. Afterwards, a delete-1-jackknife procedure is applied to account for the additional uncertainty resulting from penalized model parameter estimation. In the following, we extend this approach in order to apply it to the multi-level model in Section 2.1. The conditional MSE can be characterized according to [28]
$\mathrm{MSE}(\hat{\xi}_i) = E\big[ (\tilde{\xi}_i - \xi_i)^2 \mid X_i \big] + E\big[ (\hat{\xi}_i - \tilde{\xi}_i)^2 \mid X_i \big]$  (16)
Naturally, the representation of the conditional MSE is determined by the random effect structure. For simplicity, we assume that the latter is limited to a random intercept. Hence, $v_i$ is a random scalar with $v_i \sim N(0, \sigma_v^2)$ and $z_i = 1_{n_i}$ is a vector of 1s. See [32] for representations with other random effect structures. In our setting, we obtain
$E\big[ (\tilde{\xi}_i - \xi_i)^2 \mid X_i \big] = g_{1i}(\theta) = (1 - \gamma_i)\, \sigma_v^2, \qquad \gamma_i = \frac{\sigma_v^2}{\sigma_v^2 + \sigma^2 / n_i}$  (17)
Define $\hat{\gamma}_i = \hat{\sigma}_v^2 / (\hat{\sigma}_v^2 + \hat{\sigma}^2 / n_i)$ and $\hat{\theta}$ as the penalized estimate of $\theta$ from the full sample. Let

$g_{1i}(\hat{\theta}) = (1 - \hat{\gamma}_i)\, \hat{\sigma}_v^2$  (18)
Further, recall that $\hat{\xi}_i = \hat{\xi}_i(\hat{\theta})$ and $g_{1i}(\hat{\theta})$ are functions of $\hat{\theta}$, which is estimated from the sample observations of all $m$ areas. Let $\hat{\xi}_i(\hat{\theta}_{-l})$ and $g_{1i}(\hat{\theta}_{-l})$ denote the corresponding quantities calculated from all areas except area $l$. The jackknife is sketched in Algorithm 2. After completing the algorithm, the jackknife estimator for the unconditional MSE of the empirical BP is given by
$\mathrm{mse}(\hat{\xi}_i) = g_{1i}(\hat{\theta}) - \frac{m-1}{m} \sum_{l=1}^{m} \big[ g_{1i}(\hat{\theta}_{-l}) - g_{1i}(\hat{\theta}) \big] + \frac{m-1}{m} \sum_{l=1}^{m} \big[ \hat{\xi}_i(\hat{\theta}_{-l}) - \hat{\xi}_i(\hat{\theta}) \big]^2$  (19)
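A minimal sketch of this jackknife combination rule, assuming the conditional MSE term and the delete-one predictions have already been computed (the function and argument names are illustrative):

```python
import numpy as np

def jackknife_mse(g1_full, g1_loo, pred_full, pred_loo):
    """Delete-1 jackknife MSE estimate: full-sample conditional MSE term,
    minus a jackknife bias correction of that term, plus the jackknife
    variability of the delete-one predictions around the full prediction."""
    g1_loo = np.asarray(g1_loo, dtype=float)     # g1 evaluated without area l
    pred_loo = np.asarray(pred_loo, dtype=float) # prediction without area l
    m = g1_loo.size
    bias_corr = (m - 1) / m * (g1_loo - g1_full).sum()
    var_part = (m - 1) / m * ((pred_loo - pred_full) ** 2).sum()
    return g1_full - bias_corr + var_part
```

If the delete-one fits coincide with the full-sample fit, the estimate collapses to the plug-in conditional MSE term; instability of the penalized fit across the delete-one samples inflates the estimate, which matches the sensitivity to outliers discussed in Section 3.1.2.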
3. Simulation and application
The subsequent section is divided into two parts. In the first part, we provide a small simulation study to demonstrate the effectiveness of the penalized multi-level model for SAE. We further test the presented jackknife approach for MSE estimation. In the second part, the multi-level small area model from Section 2.1 is used to provide point estimates of the regional hypertension prevalence in Germany.
3.1. Simulation study
3.1.1. Set up
A Monte Carlo simulation with R = 500 iterations is conducted. We create a synthetic population of $N$ individuals in m = 100 areas of size $N_i$. The population is generated once and held fixed for the subsequent simulation. In each iteration, a stratified random sample of size n = 300 with stratum sample size $n_i = 3$ is drawn from the synthetic population. Here, each stratum corresponds to one area of the synthetic population, which implies a sampling fraction of 1% per stratum. For unit-level auxiliary data, a total of 40 variables with a weak internal correlation structure are drawn from a multivariate normal distribution with area-specific means. For area-level auxiliary data, 100 variables with a strong internal correlation structure are drawn from a multivariate normal distribution. The response variable is created on the unit-level according to
$y_{ij} = x_{ij}' \beta + \dot{x}_i' \dot{\beta} + v_i + e_{ij}$  (20)
where $v_i$ is a random area intercept and $e_{ij}$ is a unit-level error term. Note that from the unit-level and area-level auxiliary data sets described above, only 2 variables per set are relevant for the functional description of Y. This is done in order to include variable selection aspects in the simulation study. The area statistic of interest is the area-specific mean of the response variable, $\mu_i$. In order to estimate $\mu_i$, the following predictors are considered:
LMM.Oracle: EBP under the true multi-level model (20) with known covariates for all units of the population.
FH.Oracle: Fay–Herriot EBP considering the true auxiliary variables, but using the true area-specific means of the unit-level covariates as substitute for unit-level data.
LMM.Select: EBP under a multi-level model where variable selection is performed via the corrected conditional AIC proposed by [16].
Multi.EN: Prediction from the penalized multi-level model.
The EBP under the original unit-level model proposed by [1] is not included since it exclusively considers unit-level variables. It cannot provide any reasonable results given the way the response variable is generated. However, the EBP under the Fay–Herriot model is included, as the unit-level variables can be aggregated in order to use them on the area-level. Point estimation performance is evaluated via the relative root mean squared error (RRMSE) over all areas and Monte Carlo iterations. We further look at the relative bias and the coefficient of variation. The performance of the MSE estimation is measured via RRMSE and relative bias as well.
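The data-generating set-up of Section 3.1.1 can be sketched as follows. All distributional constants (area means, the 0.8 correlation, unit variances, the active coefficients) are illustrative assumptions; the paper only specifies the structure, not these exact values.

```python
import numpy as np

def make_population(m=100, N_i=300, p=40, q=100, seed=42):
    """Synthetic population: unit-level X with area-specific means,
    area-level Xdot with strong internal correlation; only the first
    two variables of each set drive the response (illustrative values)."""
    rng = np.random.default_rng(seed)
    area = np.repeat(np.arange(m), N_i)
    mu_area = rng.normal(size=(m, p))                  # area-specific means
    X = mu_area[area] + rng.normal(size=(m * N_i, p))  # weakly correlated
    R = 0.8 * np.ones((q, q)) + 0.2 * np.eye(q)        # strongly correlated
    Xdot = rng.multivariate_normal(np.zeros(q), R, size=m)
    beta = np.zeros(p); beta[:2] = 1.0                 # 2 relevant unit-level vars
    beta_dot = np.zeros(q); beta_dot[:2] = 1.0         # 2 relevant area-level vars
    v = rng.normal(size=m)                             # random area intercepts
    y = X @ beta + Xdot[area] @ beta_dot + v[area] + rng.normal(size=m * N_i)
    return y, X, Xdot, area

def stratified_sample(y, X, area, n_i=3, seed=0):
    """Draw n_i units per area (stratum) without replacement."""
    rng = np.random.default_rng(seed)
    idx = np.concatenate([rng.choice(np.where(area == a)[0], size=n_i,
                                     replace=False)
                          for a in np.unique(area)])
    return y[idx], X[idx], area[idx]
```

Holding the population fixed and redrawing only the stratified sample in each Monte Carlo iteration reproduces the design-based repetition described above.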
3.1.2. Results
We start with the point estimation. Table 1 shows the performance of the considered predictors. It can be seen that the penalized multi-level model obtains efficient results compared to the standard SAE approaches. Its RRMSE is the lowest among all predictors except for LMM.Oracle. The latter is the most efficient predictor, which was expected since it serves as a reference within the simulation. It has perfect information by knowing the true model and the covariate values for all individuals in each area. LMM.Select is slightly less efficient than FH.Oracle despite using information from both the unit- and the area-level. This is due to additional uncertainty resulting from multi-level covariate selection with a single-level information criterion, as LMM.Select does not know the true model. An interesting aspect is that Multi.EN has the highest relative bias of all predictors. This is due to the fact that penalized maximum likelihood introduces bias to model parameter estimation. On the other hand, it stabilizes model predictions, as can be seen from the coefficient of variation. The penalized multi-level model has the lowest variation among all predictors, including LMM.Oracle. This makes it ultimately more efficient than the standard predictors.
Table 1. Point estimation results.
| Predictor | Relative Bias | Coeff.Variation | Relative RMSE |
|---|---|---|---|
| LMM.Oracle | 0.00063 | 0.01977 | 0.01667 |
| FH.Oracle | 0.00094 | 0.02606 | 0.02313 |
| LMM.Select | 0.00026 | 0.02938 | 0.02368 |
| Multi.EN | 0.00128 | 0.01951 | 0.01960 |
We continue with the MSE estimation results obtained from the jackknife procedure in Section 2.4. They are visualized in Figure 1. The density of the relative MSE estimate deviations over all areas and Monte Carlo iterations is depicted. They are given by $(\mathrm{mse}(\hat{\xi}_i) - \mathrm{MSE}(\hat{\xi}_i)) / \mathrm{MSE}(\hat{\xi}_i)$. We see that the jackknife obtains reasonable estimates on average. The mean of the distribution is close to zero. However, it has a slight tendency toward overestimation. The overall relative bias is 0.040, whereas the RRMSE is 0.217. Further, some right-skewness is clearly evident. This is mainly due to estimation outliers. On the one hand, the methodology is sensitive to certain data constellations as a result of the non-convexity of the underlying optimization problem for model parameter estimation. On the other hand, the $\ell_1$-norm induces different degrees of sparsity in the resampling process, which results in different sets of active predictors. Since in the jackknife procedure the squared deviations from the original predictions are added up, this can lead to large MSE estimates.
Figure 1.
MSE estimation results.
3.2. Application
The objective is to estimate the hypertension prevalence for the population of age 18+ on the federal state level. The definition of the disease profile is adapted from [33]. We combine two different data sources for this purpose. The first data source is the German health survey Gesundheit in Deutschland Aktuell (GEDA) from 2010. It is a national health survey of roughly 20,000 participants of age 18+ that are interviewed via computer-assisted telephone interviewing (CATI) within a nationally representative telephone sample. The survey contains medical and lifestyle-related information that is used as unit-level data source within our study. For further information on the survey as well as its respective sampling design and response rates, we refer to [33], pp. 173. The second data source consists of administrative records on the federal state level that are obtained from the German microcensus 2010. The microcensus is a large-scale survey that covers a 1% sample of the German population with a single-stage stratified cluster sampling design. The data is collected via computer-assisted personal interviewing (CAPI). The microcensus contains (among others) sociodemographic and economic information that we use to maximize the explanatory power for hypertension prevalence estimation. For deeper insights into the survey as well as its sampling design and data collection procedures, see [13].
Regarding the level-specific penalization, the hyper parameters $\alpha$ and $\dot{\alpha}$ are chosen in accordance with Section 2.1, where we addressed the effects of the elastic net's components. From GEDA, we have an extended number of variables that cover different aspects of health. In order to identify the variables that are relevant for hypertension, we choose $\alpha$ close to one to put emphasis on variable selection. From the microcensus, we have socioeconomic indicators with strong internal correlation. Therefore, we choose $\dot{\alpha}$ close to zero in order to obtain smooth coefficient estimates for them. The tuning parameters $\lambda$ and $\dot{\lambda}$ are determined by k-fold cross-validation with a bivariate grid search. Hereafter, we provide a brief overview of the selected covariates for hypertension prevalence estimation. For further insights on the statistical properties of variable selection via penalized maximum likelihood, see [14]. We also increase the determined level-specific tuning parameter values gradually in order to assess the relevance of each selected fixed effect for the regional hypertension prevalence within the underlying regression model. With this, we obtain a rough measure of significance for the selected covariates. We distinguish three levels of significance: strong (***), medium (**), weak (*). From GEDA, exemplary unit-level variables are
Demography: sex***, age group***.
Comorbidity: having other cardiovascular diseases***.
Lifestyle: smoking or drinking***, sport activities***.
Medical care: visits to the doctor**, health insurance membership*.
Living conditions: degree of urbanisation**.
From the microcensus, variables were selected on the area-level. Examples are
Socioeconomy: income distribution***, education structure***.
Labour market: share of industrial sectors**, unemployment**.
Population structure: foreign nationalities*.
Using the mentioned variables for prevalence estimation yields the following results.
Figure 2 is a heat map of Germany in which the estimated hypertension prevalence per federal state is displayed. The nationwide hypertension prevalence is 26.8%. This is consistent with the results of [33], which reported a survey-based 95%-confidence interval of [25.9%; 27.6%]. By looking at the federal state estimates, one can see that the lowest prevalence is located in the south of the country, in the federal states Baden–Württemberg and Bavaria. The highest prevalence can be found in the east of the country, the former territory of the German Democratic Republic. The estimated distribution is plausible, as past studies have found similar distributions of related diseases like type 2 diabetes mellitus [35].
Figure 2.
Estimated hypertension prevalence.
4. Conclusion and discussion
A penalized multi-level model for regional hypertension prevalence estimation was presented. It allows for the efficient combination of unit- and area-level data via level-specific elastic net penalties. With this feature, it contributes to multi-level modelling in the context of SAE. This is particularly attractive for official statistics and especially public health reporting, where more and more data sources are considered due to phenomena like digitization. However, there is still need for further methodological research. While the model-based properties of penalized maximum likelihood have recently been established, for instance by [14], its design-based properties are still unclear. In particular, the consideration of survey weights and corresponding design-consistency in this context have not been sufficiently addressed yet. As long as these issues remain open, researchers have to carefully check whether obtained estimates are sensitive with respect to survey weights.
Further, MSE estimation for penalized multi-level models is still subject to ongoing research. The presented jackknife procedure allows for decent MSE estimates on average. However, some results are unstable, as the procedure is sensitive to outliers. As an analytical approach to MSE estimation in this setting does not seem to be within reach, further research must be done on adequate resampling methods that take into account the effects of penalization on the estimation procedure. A more practical question for future research is how to properly determine the sets of covariates in a given context. Depending on the field of application, there may be data regarding a specific covariate on both the unit- and the area-level. Due to the automatic variable selection caused by the sparsity-inducing penalization, it cannot be predicted in advance whether the methodology will consider both or only one of the levels in that case. In the light of known phenomena like the ecological fallacy, this is an interesting topic for future studies.
Acknowledgements
This research was conducted within the research project Research Innovation for Official and Survey Statistics (RIFOSS), which is funded by the German Federal Statistical Office. We kindly thank them for the financial support. We further thank two anonymous referees for their constructive comments that helped to improve the quality of the paper.
Funding Statement
This work was supported by RIFOSS.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
1. Battese G.E., Harter R.M., and Fuller W.A., An error-components model for prediction of county crop areas using survey and satellite data, J. Am. Stat. Assoc. 83 (1988), pp. 28–36. doi: 10.1080/01621459.1988.10478561
2. Bertsimas D. and Copenhaver M.S., Characterization of the equivalence of robustification and regularization in linear and matrix regression, Eur. J. Oper. Res. 270 (2018), pp. 931–942. doi: 10.1016/j.ejor.2017.03.051
3. Bottou L., Curtis F.E., and Nocedal J., Optimization methods for large-scale machine learning, SIAM Rev. 60 (2018), pp. 223–311. Available at https://arxiv.org/abs/1606.04838v3. doi: 10.1137/16M1080173
4. Burgard J.P., Krause J., and Kreber D., Regularized area-level modelling for robust small area estimation in the presence of unknown covariate measurement errors, Research Papers in Economics 04/19 (2019). Trier University.
5. Burgard J., Krause J., and Münnich R., Combining unit- and area-level data for small area estimation via penalized multi-level models, Research Papers in Economics 05/19 (2019). Trier University.
6. Chen S. and Lahiri P., On mean squared prediction error estimation in small area estimation problems, Commun. Stat. Theory Methods 37 (2008), pp. 1792–1798. doi: 10.1080/03610920701826427
7. Clark W.A.V. and Avery K.L., The effects of data aggregation in statistical analysis, Geogr. Anal. 8 (1976), pp. 428–438. doi: 10.1111/j.1538-4632.1976.tb00549.x
8. Das K., Jiang J., and Rao J.N.K., Mean squared error of empirical predictor, Ann. Stat. 32 (2004), pp. 818–840. doi: 10.1214/009053604000000201
9. Datta G.S. and Lahiri P., A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems, Stat. Sin. 10 (2000), pp. 139–152.
10. Efron B., Nonparametric estimation of standard error: The jackknife, the bootstrap and other methods, Biometrika 68 (1981), pp. 589–599. doi: 10.1093/biomet/68.3.589
11. Fay R.E. and Herriot R.A., Estimates of income for small places: An application of James–Stein procedures to census data, J. Am. Stat. Assoc. 74 (1979), pp. 269–277. doi: 10.1080/01621459.1979.10482505
12. Friedman J., Hastie T., and Tibshirani R., Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw. 33 (2010), pp. 1–22. doi: 10.18637/jss.v033.i01
13. GESIS Leibniz Institute for the Social Sciences, Metadata for official statistics (2020). Available at https://www.gesis.org/en/missy/metadata/MZ/.
14. Ghosh A. and Thoresen M., Non-concave penalization in linear mixed-effects models and regularized selection of fixed effects, AStA Adv. Stat. Anal. 102 (2018), pp. 179–210. doi: 10.1007/s10182-017-0298-z
15. Ghosh M. and Steorts R.C., Two-stage benchmarking as applied to small area estimation, Test 22 (2013), pp. 670–687. doi: 10.1007/s11749-013-0338-2
16. Greven S. and Kneib T., On the behaviour of marginal and conditional AIC in linear mixed models, Biometrika 97 (2010), pp. 773–789. doi: 10.1093/biomet/asq042
17. Hall P. and Maiti T., Nonparametric estimation of mean-squared prediction error in nested-error regression models, Ann. Stat. 34 (2006), pp. 1733–1750. doi: 10.1214/009053606000000579
18. Herwig A. and Schimpl-Neimanns B., Mikrozensus Scientific Use File 2010: Dokumentation und Datenaufbereitung, Tech. Rep., GESIS – Leibniz-Institut für Sozialwissenschaften, 2013. GESIS-Technical Reports 2013/10.
19. Hoerl A.E. and Kennard R.W., Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12 (1970), pp. 55–67. doi: 10.1080/00401706.1970.10488634
20. Husain K., Ansari R., and Ferder L., Alcohol-induced hypertension: Mechanism and prevention, World J. Cardiol. 6 (2014), pp. 245–252. doi: 10.4330/wjc.v6.i5.245
21. Jiang J., Empirical best prediction for small-area inference based on generalized linear mixed models, J. Stat. Plan. Inference 111 (2003), pp. 117–127. doi: 10.1016/S0378-3758(02)00293-8
22. Jiang J., Lahiri P., and Wan S.M., A unified jackknife theory for empirical best prediction with M-estimation, Ann. Stat. 30 (2002), pp. 1782–1810. doi: 10.1214/aos/1043351257
23. Kackar R.N. and Harville D.A., Approximations for standard errors of estimators of fixed and random effects in mixed linear models, J. Am. Stat. Assoc. 79 (1984), pp. 853–862.
24. Lahiri P. and Rao J.N.K., Robust estimation of mean squared error of small area estimators, J. Am. Stat. Assoc. 90 (1995), pp. 758–766. doi: 10.1080/01621459.1995.10476570
25. Lastra G., Syed S., Kurukulasuriya L., Manrique C., and Sowers J., Type 2 diabetes mellitus and hypertension: An update, Endocrinol. Metab. Clin. North Am. 43 (2014), pp. 103–122. doi: 10.1016/j.ecl.2013.09.005
26. Leng B., Jin Y., Li G., Chen L., and Jin N., Socioeconomic status and hypertension: A meta-analysis, J. Hypertens. 33 (2015), pp. 221–229. doi: 10.1097/HJH.0000000000000428
27. Lohr S. and Ybarra L., Small area estimation when auxiliary information is measured with error, Biometrika 95 (2008), pp. 919–931. doi: 10.1093/biomet/asn048
28. McCulloch C. and Neuhaus J., Prediction of random effects in linear and generalized linear models under model misspecification, Biometrics 67 (2011), pp. 270–279. doi: 10.1111/j.1541-0420.2010.01435.x
29. Moura F.A.S. and Holt D., Small area estimation using multilevel models, Surv. Methodol. 25 (1999), pp. 73–80.
30. NCD Risk Factor Collaboration (NCD-RisC), Worldwide trends in blood pressure from 1975 to 2015: A pooled analysis of 1479 population-based measurement studies with 19.1 million participants, Lancet 389 (2017), pp. 37–55. doi: 10.1016/S0140-6736(16)31919-5
31. Pinheiro J.C. and Bates D.M., Mixed-Effects Models in S and S-PLUS, Springer, New York, 2000.
32. Prasad N.G.N. and Rao J.N.K., The estimation of the mean squared error of small-area estimators, J. Am. Stat. Assoc. 85 (1990), pp. 163–171. doi: 10.1080/01621459.1990.10475320
33. Robert Koch Institute, Daten und Fakten: Ergebnisse der Studie 'Gesundheit in Deutschland aktuell 2010', Beiträge zur Gesundheitsberichterstattung des Bundes (2012). Available at http://www.gbe-bund.de/pdf/GEDA_2010_Gesamtausgabe.pdf, RKI, Berlin.
34. Schelldorfer J., Bühlmann P., and van de Geer S., Estimation for high-dimensional linear mixed-effects models using l1-penalization, Scand. J. Stat. 38 (2011), pp. 197–214. doi: 10.1111/j.1467-9469.2011.00740.x
35. Schipf S., Ittermann T., Tamayo T., Holle R., Schunk M., Maier W., Meisinger C., Thorand B., Kluttig A., Greiser K.H., Berger K., Müller G., Moebus S., Slomiany U., Rathmann W., and Völzke H., Regional differences in the incidence of self-reported type 2 diabetes in Germany: Results from five population-based studies in Germany (DIAB-CORE consortium), J. Epidemiol. Community Health 68 (2014), pp. 1088–1095. doi: 10.1136/jech-2014-203998
36. Tibshirani R., Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B (Methodol.) 58 (1996), pp. 267–288.
37. Tseng P. and Yun S., A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. 117 (2009), pp. 387–402. doi: 10.1007/s10107-007-0170-0
38. Twigg L., Moon G., and Jones K., Predicting small-area health-related behaviour: A comparison of smoking and drinking indicators, Soc. Sci. Med. 50 (2000), pp. 1109–1120. doi: 10.1016/S0277-9536(99)00359-7
39. Zou H. and Hastie T., Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B (Methodol.) 67 (2005), pp. 301–320. doi: 10.1111/j.1467-9868.2005.00503.x


