PLOS One. 2019 Nov 22;14(11):e0225427. doi: 10.1371/journal.pone.0225427

Sample size issues in multilevel logistic regression models

Amjad Ali 1,*, Sabz Ali 1, Sajjad Ahmad Khan 1,*, Dost Muhammad Khan 2, Kamran Abbas 3, Alamgir Khalil 4, Sadaf Manzoor 1, Umair Khalil 2
Editor: Feng Chen 5
PMCID: PMC6874355  PMID: 31756205

Abstract

Educational researchers, psychologists, and social, epidemiological and medical scientists often deal with multilevel data. Sometimes the response variable in multilevel data is categorical and needs to be analyzed with multilevel logistic regression models. The main aim of this paper is to provide guidelines for analysts on selecting an appropriate sample size when fitting multilevel logistic regression models under different threshold parameters and different estimation methods. Simulation studies were performed to obtain the optimum sample size for the Penalized Quasi-likelihood (PQL) and Maximum Likelihood (ML) methods of estimation. Our results suggest that the ML method performs better than the PQL method and requires a relatively smaller sample under the chosen conditions. To achieve sufficient accuracy of fixed and random effects under the ML method, we established "50/50" and "120/50" rules, respectively. On the basis of our findings, "50/60" and "120/70" rules are recommended under the PQL method of estimation.

Introduction

Individuals drawn from the same hospital, school or classroom tend to be more homogeneous than individuals drawn at random from a large population. Because such individuals share characteristics such as family background, morals and values, religion, socio-economic status and demographics, complete independence of observations cannot be expected in these situations [1]. With nested or multilevel data, the assumption of independence is clearly violated, so analysis of variance (ANOVA) and linear regression, both of which assume independence, are inappropriate, and alternative statistical models (multilevel models) are needed to analyze such data [2]. Data frequently have a multilevel structure, as in hospitals and educational institutions, and for such data researchers often use models known as multilevel models, hierarchical models or mixed effects models [3], [4], which are rapidly gaining recognition. Over the last 10 years these models have become widely used, and their popularity among researchers in various fields continues to rise. One of the prime questions in any field of research is the choice of an appropriate sample size, and in multilevel modeling this decision is not straightforward. For a quantitative study, deciding on the optimum sample size can be difficult because of the estimation complexity of the models and because a sample size must be chosen at each level. Sample size issues in multilevel models have been discussed by various researchers for a continuous response variable. According to [5], a group size of less than 10 is enough for a model with fixed coefficients, whereas a group size of ≥ 10 is needed for random coefficients.
Hox [6] concluded that, to obtain high power and accuracy, one should use more level-2 units than level-1 units. Similarly, for level-2 effects and cross-level interactions, the power of the test depends mainly on the number of level-2 units. Maas and Hox [7] carried out a simulation study of sample size issues in multilevel models for a continuous response variable and determined that 10 groups are sufficient for fixed effects, 30 groups are essential for contextual effects, and 50 groups are required for valid estimation of standard errors. Similarly, Maas and Hox [8] carried out another simulation study for a continuous response variable using three numbers of groups (30, 50, 100), three group sizes (5, 30, 50) and three values of the intra-class correlation, ICC (0.1, 0.2, 0.3). They concluded that the estimates were unbiased across all simulated conditions but reported underestimation for the level-2 variance components when the number of groups was below 100. They also concluded that at least 100 level-2 units are needed for better estimation. However, less research has been conducted for a binary response variable. Moineddin et al. [9] performed a simulation study to determine the sample size for a multilevel binary logistic regression model with a single level-1 explanatory variable and a single level-2 explanatory variable, using three numbers of groups (30, 50, 100), three group sizes (5, 30, 50) and three ICC values (0.04, 0.17, 0.38). They concluded that when the number of groups is 100 with a group size of 50 or more, the fixed effect parameter estimates are unbiased. However, even with 100 groups and a group size of 50, the variance component estimates were reported to be biased. The bias was extremely high for both the random effects and the fixed effects when the group size was five. The standard errors of the variance components were underestimated, whereas those of the fixed effect parameters were unbiased.
Paccagnella [10] used a multilevel binary logistic random intercept model in a simulation study and found that, as in models for a continuous response variable, the bias of the fixed parameter estimates decreased as the number of groups increased. Acceptable coverage rates were achieved for the fixed effects estimates when the number of groups was 50. Unlike in models for a continuous outcome variable, a very large number of groups is needed to achieve acceptable coverage rates for the variance component estimates.

Zeng [11] proposed a Bayesian spatial generalized ordered logit model to analyze freeway crash severity. The proposed model was superior to the traditional generalized ordered logit model in terms of the statistical significance of the spatial term and model fit. Similarly, to analyze crash rate by injury severity, three temporal multivariate random parameters Tobit models were developed by Zeng [12]. In all of the temporal models, significant temporal effects were found, and the goodness of fit (Bayesian R2) of the multivariate random parameters Tobit regression improved considerably when temporal correlation was included. The inclusion of spatio-temporal correlation and interaction in a multivariate random-parameters Tobit model, and its influence on fitting areal crash rates with different severity outcomes, was investigated by [13] in the Bayesian context. The proposed model performed better in terms of model fit than a multivariate random-parameters Tobit model and a multivariate random-parameters spatial Tobit model.

A driving simulator experiment was conducted by [14] to investigate the safety of trucks under crosswind at a bridge-tunnel section. Steering angle and yaw rate were used as indices of the dynamic response under crosswinds. To prevent possible accidents, the authors recommended various safety options. In another study, [15] used mixed logit models to reveal random effects. This was the first investigation of the difference in driver-injury severity between single-vehicle (SV) and multi-vehicle (MV) accidents. The respective critical risk factors of SV and MV accidents were evaluated and compared, and comprehensive observations not covered in existing studies were made. Additionally, to examine factors affecting the injuries sustained by two drivers involved in the same rear-end crash between passenger cars, a random parameters bivariate ordered probit model was developed by Chen [16]. The proposed model outperformed two separate ordered probit models with fixed parameters.

In multilevel models, small group sizes such as 5, 10, 15 and 20 are usually considered in education, behavioral science and related fields. Here, however, large numbers of groups and moderate group sizes have been used. Compared with linear multilevel models, multilevel logistic regression models need larger numbers of groups, which is why small numbers of groups were not considered in this study. Moreover, Bayesian methods may also be very useful in such situations and can reduce model misspecification and estimation bias significantly.

So far, very little research has been conducted on sample size determination for multilevel logistic regression models. For example, there is nothing in the contemporary literature about the PQL method in multilevel logistic regression models. The present study therefore proposes rules that address the weaknesses of the available guidance for both fixed and random effects estimates under the ML and PQL methods of estimation. In addition, it provides guidelines on the optimum sample size needed for multilevel logistic regression models. A random intercept and random slope model with two level-1 explanatory variables and one level-2 explanatory variable, formulated using the threshold parameter concept, is used. Further, larger random effects, which were not considered in the previous literature, are incorporated in the present study. Moreover, relevant factors and their levels were not considered in [9] and [10]. A detailed comparison of the ML and PQL methods is made in terms of sample size.

Materials and methods

Multilevel Logistic Regression Model:

A popular approach in the social sciences develops the dichotomous multilevel logistic model through a latent continuous variable model [17]. Under this threshold concept, a latent continuous variable $Y_{ij}^{*}$ underlies the observed variable $Y_{ij}$. A simple two-level dichotomous model is

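$$
\begin{aligned}
\text{Level 1 model:}\quad & Y_{ij}^{*} = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2} X_{2ij} + e_{ij} \\
\text{Level 2 model:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} W_{j} + u_{0j} \\
& \beta_{1j} = \gamma_{10} + \gamma_{11} W_{j} + u_{1j} \\
& \beta_{2} = \gamma_{20}
\end{aligned}
\tag{1}
$$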

The combined model was obtained by substituting level 2 model in level 1 model:

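$$
Y_{ij}^{*} = \underbrace{\left(\gamma_{00} + \gamma_{10} X_{1ij} + \gamma_{20} X_{2ij} + \gamma_{01} W_{j} + \gamma_{11} X_{1ij} W_{j}\right)}_{\text{fixed part}} + \underbrace{\left(u_{0j} + u_{1j} X_{1ij} + e_{ij}\right)}_{\text{random part}}
\tag{2}
$$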

Here $X_{1ij}$ and $X_{2ij}$ are the level-1 explanatory variables, $W_{j}$ is the level-2 explanatory variable, the level-1 coefficients are denoted by $\beta$, and the $\gamma$s are the fixed effects. If $e_{ij} \sim \text{logistic}(0, \pi^{2}/3)$, the model is referred to as a multilevel logistic model [18]. The level-2 random effects are assumed to have a multivariate normal distribution

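$$
\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_{u}^{2} & \sigma_{u1} \\ \sigma_{u1} & \sigma_{1}^{2} \end{bmatrix} \right)
\tag{3}
$$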

It should be noted that the equation for Intra Class Correlation (ICC) is

$$
\text{ICC} = \frac{\sigma_{u}^{2}}{\sigma_{u}^{2} + \pi^{2}/3}
\tag{4}
$$

According to [19], the ICC is an estimate of the proportion of the total variance explained by the grouping structure.
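Eq (4) can be inverted to obtain the level-2 intercept variance implied by a target ICC. The sketch below is illustrative only: the function names are hypothetical and are not taken from the paper, and it assumes the level-1 variance is fixed at π²/3 as in Eq (4).

```python
import math

# Level-1 residual variance of the standard logistic distribution, as in Eq (4).
LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3


def icc_from_variance(sigma_u2: float) -> float:
    """ICC implied by a level-2 intercept variance (Eq 4)."""
    return sigma_u2 / (sigma_u2 + LOGISTIC_RESIDUAL_VAR)


def variance_from_icc(icc: float) -> float:
    """Level-2 intercept variance implied by a target ICC (Eq 4 solved for sigma_u^2)."""
    if not 0.0 <= icc < 1.0:
        raise ValueError("ICC must lie in [0, 1)")
    return icc * LOGISTIC_RESIDUAL_VAR / (1.0 - icc)


if __name__ == "__main__":
    # ICC values of the kind used in simulation designs for this model.
    for icc in (0.1, 0.2, 0.4):
        print(f"ICC = {icc}: sigma_u^2 = {variance_from_icc(icc):.3f}")
```

Because the level-1 variance is a constant, a larger ICC translates directly into a larger random intercept variance.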

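The latent variable $Y_{ij}^{*}$ can be linked to the observed variable $Y_{ij}$ through a threshold $\gamma$, and this threshold is also the intercept of the above model. Since there are only two categories,

$$
Y_{ij} = 0 \ \text{ if } \ Y_{ij}^{*} \le \gamma \quad \text{and} \quad Y_{ij} = 1 \ \text{ if } \ Y_{ij}^{*} > \gamma .
$$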

The other approach to multilevel logistic regression models is that of multilevel generalized linear models. The two approaches lead to equivalent models, although they differ at the conceptual level.
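The equivalence of the two approaches can be sketched in one line. The derivation below is an illustration rather than the authors' own argument; it assumes that the threshold has been absorbed into the intercept (so the cut point is zero), that $\eta_{ij}$ denotes the fixed part plus the level-2 random effects, and that $e_{ij}$ follows a standard logistic distribution with distribution function $F(x) = 1/(1 + e^{-x})$.

```latex
\begin{aligned}
P(Y_{ij} = 1 \mid \eta_{ij})
  &= P(\eta_{ij} + e_{ij} > 0)
   = P(e_{ij} > -\eta_{ij})
   = 1 - F(-\eta_{ij}) \\
  &= 1 - \frac{1}{1 + e^{\eta_{ij}}}
   = \frac{e^{\eta_{ij}}}{1 + e^{\eta_{ij}}},
\qquad\text{so}\qquad
\operatorname{logit} P(Y_{ij} = 1 \mid \eta_{ij}) = \eta_{ij}.
\end{aligned}
```

Under these assumptions the latent-variable formulation yields the same conditional probability as a generalized linear model with a logit link.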

Let $Y_{ij}$ be a binary response variable taking the values 0 and 1 to represent the nonoccurrence or occurrence of some characteristic for individual-level unit $i$ ($i = 1, 2, \ldots, n_{j}$) nested in group $j$ ($j = 1, 2, \ldots, N$). A multilevel dichotomous logistic model with two level-1 explanatory variables and a single level-2 explanatory variable can be written as

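$$
\begin{aligned}
\text{Level 1 model:}\quad & \operatorname{logit}(P_{ij}) = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2} X_{2ij} \\
\text{Level 2 model:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} W_{j} + u_{0j} \\
& \beta_{1j} = \gamma_{10} + \gamma_{11} W_{j} + u_{1j} \\
& \beta_{2} = \gamma_{20}
\end{aligned}
\tag{5}
$$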

The combined model was obtained by substituting level 2 model in level 1 model:

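$$
\operatorname{logit}(P_{ij}) = \underbrace{\left(\gamma_{00} + \gamma_{10} X_{1ij} + \gamma_{20} X_{2ij} + \gamma_{01} W_{j} + \gamma_{11} X_{1ij} W_{j}\right)}_{\text{fixed part}} + \underbrace{\left(u_{0j} + u_{1j} X_{1ij}\right)}_{\text{random part}}
\tag{6}
$$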

This particular model was utilized in the present study.

It should be noted that the lowest-level error $e_{ij}$, which appears in Eq (1), is absent from Eq (5) because the error distribution is part of the generalized linear model specification [20]. This generalized linear model framework is very popular in biostatistics.

It is easier to formulate the equation as $P_{ij} = \operatorname{expit}(\text{regression equation})$, where expit is the inverse of the logit function [6]. For simulation studies, one then has to specify the mean and variance of all predictor variables and the values of all regression coefficients. The predictor variables are randomly generated, and the expit function turns the continuous linear prediction into a probability, which can then be dichotomized according to the chosen threshold. Equivalently, one can use the following expression

$$
P_{ij} = \frac{\exp(\beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2} X_{2ij})}{1 + \exp(\beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2} X_{2ij})}
\tag{7}
$$
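As a small illustration of Eq (7), the inverse logit can be coded directly. This is a minimal sketch, not code from the paper; the split into two branches is a common numerical-stability device and is an assumption of this example.

```python
import math


def expit(x: float) -> float:
    """Inverse logit, exp(x) / (1 + exp(x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Probability implied by a linear predictor of -1.22 when every covariate
    # and random effect equals zero.
    print(round(expit(-1.22), 3))
```

With a linear predictor of -1.22 the implied probability is roughly 0.23, in line with an outcome prevalence of approximately twenty percent.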

Simulation design

The threshold parameter was set to $\gamma_{00} = -1.22$, corresponding to a prevalence of approximately twenty percent for the response variable. The other fixed effect parameters ($\gamma_{10}, \gamma_{20}, \gamma_{01}, \gamma_{11}$) were set by analogy with the studies conducted by [7] and [9], that is, $\gamma_{10} = 0.3$, $\gamma_{20} = 0.3$, $\gamma_{01} = 0.3$ and $\gamma_{11} = 0.3$. The explanatory variables ($X_{1ij}$, $X_{2ij}$ and $W_{j}$) were all generated from the standard normal distribution. The random effects were generated as $u_{0j} \sim N(0, \sigma_{u}^{2})$ and $u_{1j} \sim N(0, \sigma_{1}^{2})$, where $\sigma_{u}^{2}$ follows from the intra-class correlation specification. The value of $\sigma_{1}^{2}$ was set to 1 in all simulation scenarios and, for simplicity, the covariance $\sigma_{u1}$ was set to zero.
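The data-generating process described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `simulate_dataset` and the row layout are hypothetical, and the binary outcome is drawn as a Bernoulli variable with probability expit(linear predictor), which is one common way to dichotomize and is an assumption here.

```python
import math
import random


def expit(x: float) -> float:
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_dataset(n_groups: int, group_size: int, icc: float, seed=None):
    """Generate one two-level data set as rows of (group, x1, x2, w, y)."""
    rng = random.Random(seed)
    g00, g10, g20, g01, g11 = -1.22, 0.3, 0.3, 0.3, 0.3  # fixed effects from the design
    sigma_u = math.sqrt(icc * (math.pi ** 2 / 3) / (1.0 - icc))  # from the ICC formula
    sigma_1 = 1.0  # random slope variance set to 1; covariance sigma_u1 set to 0
    rows = []
    for j in range(n_groups):
        w = rng.gauss(0.0, 1.0)        # level-2 covariate, constant within group
        u0 = rng.gauss(0.0, sigma_u)   # random intercept
        u1 = rng.gauss(0.0, sigma_1)   # random slope for x1
        for _ in range(group_size):
            x1 = rng.gauss(0.0, 1.0)
            x2 = rng.gauss(0.0, 1.0)
            eta = g00 + g10 * x1 + g20 * x2 + g01 * w + g11 * x1 * w + u0 + u1 * x1
            y = 1 if rng.random() < expit(eta) else 0  # Bernoulli draw (assumption)
            rows.append((j, x1, x2, w, y))
    return rows


if __name__ == "__main__":
    data = simulate_dataset(n_groups=50, group_size=30, icc=0.2, seed=2019)
    print(len(data), sum(row[4] for row in data) / len(data))
```

Each simulated data set would then be passed to an ML or PQL fitting routine; the fitting step is outside the scope of this sketch.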

Four levels were used for the number of groups and three each for group size and ICC. The numbers of groups were 30, 50, 100 and 120, the group sizes were 5, 30 and 50, and the ICC values were 0.1, 0.2 and 0.4. This gives 4×3×3 = 36 scenarios, and for each one the number of simulation replications, R, was set to 1000.
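The full factorial design can be enumerated in a few lines. The sketch below only builds the scenario grid; the dictionary keys are hypothetical names chosen for this example.

```python
from itertools import product

N_GROUPS = (30, 50, 100, 120)
GROUP_SIZES = (5, 30, 50)
ICCS = (0.1, 0.2, 0.4)
N_REPLICATIONS = 1000  # replications per scenario

# One dictionary per scenario of the 4 x 3 x 3 design.
scenarios = [
    {"n_groups": g, "group_size": s, "icc": icc, "n_obs": g * s}
    for g, s, icc in product(N_GROUPS, GROUP_SIZES, ICCS)
]

if __name__ == "__main__":
    print(len(scenarios))                   # number of scenarios
    print(len(scenarios) * N_REPLICATIONS)  # model fits per estimation method
```

The total sample size per data set ranges from 30 × 5 = 150 to 120 × 50 = 6000 observations.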

Analysis

The accuracy of the fixed effect and random effect parameter estimates was assessed through the relative bias, (estimate − parameter)/parameter. Empirical coverage rates of 95% confidence intervals were used to judge the accuracy of the standard errors of the estimated parameters. The coverage rate was computed in each scenario as the proportion of replications in which the true parameter was captured by the 95% confidence interval. Bradley recommended 92.5% to 97.5% as the range of acceptable coverage rates [21]. Empirical power was computed for X1ij, X2ij, Wj and X1ij×Wj. The power was calculated as the number of replications in which the null hypothesis H0 of no effect was correctly rejected at the 5 percent level of significance, divided by 1000, since 1000 replications were used for each scenario. Moreover, a separate logistic regression was used to judge the influence of the various simulation scenarios on the empirical coverage rates of the estimates.
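The three performance measures can be written as short functions operating on the per-replication estimates and standard errors. This is a sketch under stated assumptions: the paper does not say how the confidence intervals or tests were constructed, so Wald-type intervals and tests (estimate ± 1.96 × standard error) are assumed here, and all function names are hypothetical.

```python
from statistics import mean


def relative_bias(estimates, true_value):
    """(mean estimate - parameter) / parameter, averaged over replications."""
    return (mean(estimates) - true_value) / true_value


def coverage_rate(estimates, std_errors, true_value, z=1.96):
    """Share of replications whose Wald interval contains the true parameter."""
    hits = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return hits / len(estimates)


def empirical_power(estimates, std_errors, z=1.96):
    """Share of replications in which a Wald test rejects H0: parameter = 0."""
    rejections = sum(1 for est, se in zip(estimates, std_errors) if abs(est / se) > z)
    return rejections / len(estimates)


if __name__ == "__main__":
    # Toy values for a coefficient whose true value is 0.3 (not results from the study).
    est = [0.36, 0.24, 0.31, 0.05]
    se = [0.10, 0.10, 0.10, 0.10]
    print(relative_bias(est, 0.3), coverage_rate(est, se, 0.3), empirical_power(est, se))
```

In an actual study each list would hold 1000 values, one per replication of a scenario.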

Results

The average relative bias of the multilevel binary logistic model fixed effects and random effects estimates across all conditions under the ML method of estimation is presented in Figs 1-7. The estimates have negligible bias when the number of groups is large. Figs 1-7 indicate that the bias decreases significantly with the number of groups, and to a lesser extent with group size, for all the estimates. The estimates were substantially biased when the number of groups was 30, the group size was 5 and the random effects had their smallest values. The relative bias was generally less than 5% when the number of groups was 50. For the threshold estimate the relative bias was negative, and for the rest of the fixed effect estimates it was positive. The group size factor had minimal impact on the average relative bias of the estimates.

Fig 1. Average relative bias for the estimate of the threshold parameter across all conditions (ML method).

Fig 2. Average relative bias for the estimate of level 1 variable X1ij coefficient across all conditions (ML method).

Fig 3. Average relative bias for the estimate of level 1 variable X2ij coefficient across all conditions (ML method).

Fig 4. Average relative bias for the estimate of level 2 variable Wj coefficient across all conditions (ML method).

Fig 5. Average relative bias for the estimate of cross-level interaction X1ijWj coefficient across all conditions (ML method).

Fig 6. Average relative bias for σu across all conditions (ML method).

Fig 7. Average relative bias for σ1 across all conditions (ML method).

Table 1 shows the influence of the number of groups on the empirical coverage rates of the multilevel binary logistic model estimates under the ML method of estimation. It indicates a significant effect of the number of groups on the accuracy of the standard errors of the estimates. The largest non-coverage rate for the threshold parameter estimate was 5.8%, observed when the number of groups was 30. Similarly, for γ10, γ20, γ01 and γ11 the largest non-coverage rates were 5.5%, 5.6%, 6.1% and 6.3%, respectively. For σu and σ1 the largest non-coverage rates were 11.2% and 10.6%, respectively. The non-coverage rates decreased significantly as the number of groups increased. The influence of the number of groups on the empirical coverage rate was significant for both the fixed effects and the random effects estimates.

Table 1. 95% CI Coverage rates for the estimates of multilevel binary logistic model by groups (Method = ML).

Parameter   Number of groups                    P-value
            30      50      100     120
γ00         0.942   0.943   0.949   0.967   0.0000
γ10         0.945   0.947   0.947   0.964   0.0000
γ20         0.944   0.947   0.951   0.963   0.0000
γ01         0.939   0.939   0.947   0.966   0.0000
γ11         0.937   0.940   0.948   0.961   0.0000
σu          0.888   0.901   0.908   0.926   0.0000
σ1          0.894   0.910   0.931   0.939   0.0000
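As a worked check, the Table 1 coverage rates can be compared against Bradley's 92.5% to 97.5% range [21]. The sketch below is illustrative: the values are transcribed from Table 1, the function names are hypothetical, and because the table reports marginal averages over group size and ICC, the output does not reproduce the per-scenario judgments behind the authors' recommendations.

```python
# Coverage rates transcribed from Table 1 (ML method), by number of groups.
GROUPS = (30, 50, 100, 120)
TABLE1_COVERAGE = {
    "gamma00": (0.942, 0.943, 0.949, 0.967),
    "gamma10": (0.945, 0.947, 0.947, 0.964),
    "gamma20": (0.944, 0.947, 0.951, 0.963),
    "gamma01": (0.939, 0.939, 0.947, 0.966),
    "gamma11": (0.937, 0.940, 0.948, 0.961),
    "sigma_u": (0.888, 0.901, 0.908, 0.926),
    "sigma_1": (0.894, 0.910, 0.931, 0.939),
}


def within_bradley(rate, lower=0.925, upper=0.975):
    """True if a coverage rate falls inside Bradley's acceptable range."""
    return lower <= rate <= upper


def smallest_acceptable_groups(rates):
    """Smallest number of groups from which coverage stays acceptable, else None."""
    answer = None
    # Walk from the largest design downwards and stop at the first failure.
    for n_groups, rate in zip(reversed(GROUPS), reversed(rates)):
        if not within_bradley(rate):
            break
        answer = n_groups
    return answer


if __name__ == "__main__":
    for name, rates in TABLE1_COVERAGE.items():
        print(name, smallest_acceptable_groups(rates))
```

On these marginal averages, the variance components reach the acceptable range only at the larger numbers of groups, whereas the fixed effects are within it throughout.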

Similarly, Table 2 shows the influence of the group size factor on the empirical coverage rates of the multilevel binary logistic model estimates under the ML method of estimation. The group size factor did not play a dominant role in raising the accuracy of the standard errors of the estimates. The coverage rates of the fixed effects estimates were acceptable at all group sizes. A separate logistic regression was used to judge the effect of the group size levels on the empirical coverage rates of the estimates. The P-values indicate the impact of the group size factor on the empirical coverage rates of both fixed and random effect estimates.

Table 2. 95% CI Coverage rates for the estimates of multilevel binary logistic model by group size (Method = ML).

Parameter   Group size                P-value
            5       30      50
γ00         0.953   0.951   0.946   0.0116
γ10         0.955   0.949   0.949   0.0091
γ20         0.953   0.949   0.952   0.7870
γ01         0.951   0.946   0.948   0.1910
γ11         0.951   0.943   0.946   0.0700
σu          0.896   0.904   0.916   0.0000
σ1          0.912   0.922   0.920   0.0335

Moreover, Table 3 shows the influence of ICC on the empirical coverage rates of the multilevel binary logistic model estimates under the ML method of estimation. When a separate logistic regression was used to judge the effect of the different ICC conditions, the influence of ICC on the empirical coverage rates was not significant for either the fixed effects or the random effects estimates.

Table 3. 95% CI Coverage rates for the estimates of multilevel binary logistic model by ICC (Method = ML).

Parameter   ICC                       P-value
            0.1     0.2     0.4
γ00         0.952   0.951   0.948   0.2591
γ10         0.950   0.951   0.950   0.9760
γ20         0.949   0.953   0.951   0.4710
γ01         0.946   0.949   0.948   0.5610
γ11         0.946   0.948   0.946   0.9540
σu          0.901   0.907   0.908   0.0519
σ1          0.920   0.919   0.918   0.4639

Figs 8-14 show the average relative bias of the multilevel binary logistic model fixed effects and random effects estimates across all conditions under the PQL method of estimation. The bias decreases significantly with the group size and the number of groups for all the estimates.

Fig 8. Average relative bias for the estimate of the threshold parameter across all conditions (PQL method).

Fig 9. Average relative bias for the estimate of level 1 variable X1ij coefficient across all conditions (PQL method).

Fig 10. Average relative bias for the estimate of level 1 variable X2ij coefficient across all conditions (PQL method).

Fig 11. Average relative bias for the estimate of level 2 variable Wj coefficient across all conditions (PQL method).

Fig 12. Average relative bias for the estimate of cross-level interaction X1ijWj coefficient across all conditions (PQL method).

Fig 13. Average relative bias for σu across all conditions (PQL method).

Fig 14. Relative bias for σ1 across all conditions (PQL method).

Table 4 shows the influence of the number of groups on the empirical coverage rates of the multilevel binary logistic model fixed effects and random effects estimates under the PQL method of estimation. The largest non-coverage rate for the threshold parameter estimate was 8.6% when the number of groups was 30, and it was still 7.2% when the number of groups was 120. Similarly, for γ10, γ20, γ01 and γ11 the largest non-coverage rates were 8.7%, 8.7%, 8.8% and 9.6%, respectively. For σu and σ1 the largest non-coverage rates were 13.5% and 12.6%, respectively. When a separate logistic regression was used to judge the effect of the number of groups on the empirical coverage rates of the estimates, its influence was not significant in most of the conditions.

Table 4. 95% CI Coverage rates for estimates of the multilevel binary logistic model by groups (Method = PQL).

Parameter   Number of groups                    P-value
            30      50      100     120
γ00         0.914   0.925   0.927   0.928   0.0007
γ10         0.913   0.916   0.917   0.919   0.1056
γ20         0.913   0.917   0.918   0.921   0.0563
γ01         0.912   0.916   0.917   0.923   0.0132
γ11         0.904   0.909   0.912   0.916   0.0049
σu          0.865   0.866   0.869   0.872   0.1458
σ1          0.874   0.874   0.875   0.880   0.2361

In the same way, Table 5 shows the influence of the group size factor on the empirical coverage rates of the multilevel binary logistic model fixed effects and random effects estimates under the PQL method of estimation. The group size factor played a dominant role in reducing the non-coverage rates of the estimates. A separate logistic regression was used to judge the effect of the group size conditions on the empirical coverage rates, and the P-values indicate that the coverage rates were significantly affected by group size. Additionally, Table 6 shows the influence of ICC on the empirical coverage rates of the fixed effects and random effects estimates of the multilevel binary logistic model under the PQL method of estimation. When a separate logistic regression was used to judge the effect of the different levels of ICC, its influence on the empirical coverage rates of both fixed effects and random effects estimates was significant in most of the conditions.

Table 5. 95% CI Coverage rates for estimates of the multilevel binary logistic model by group size (Method = PQL).

Parameter   Group size                P-value
            5       30      50
γ00         0.917   0.924   0.930   0.0000
γ10         0.905   0.918   0.926   0.0000
γ20         0.906   0.918   0.928   0.0000
γ01         0.910   0.918   0.925   0.0000
γ11         0.900   0.910   0.921   0.0000
σu          0.861   0.869   0.874   0.0035
σ1          0.867   0.876   0.881   0.0000

Table 6. 95% CI Coverage rates for estimates of the multilevel binary response variable model by ICC (Method = PQL).

Parameter   ICC                       P-value
            0.1     0.2     0.4
γ00         0.929   0.923   0.918   0.0011
γ10         0.920   0.917   0.912   0.0223
γ20         0.920   0.917   0.914   0.1336
γ01         0.919   0.917   0.915   0.1531
γ11         0.914   0.910   0.906   0.0493
σu          0.872   0.868   0.863   0.0245
σ1          0.879   0.876   0.871   0.0421

Table 7 lists the power rates for the multilevel binary logistic model fixed effects estimates under the ML method of estimation. The lowest power rates for all the fixed effects estimates were recorded when the number of groups was 30. The power increased substantially with the number of groups. With 100 groups, power rates were at or above 0.90. With 120 groups, power rates were 100% in the majority of the conditions under the ML method. In contrast, the power rates of the PQL fixed effects estimates were, on average, lower than those of the ML fixed effects estimates. Power rates increased with the number of groups under both methods of estimation. Table 8 lists the power rates for the multilevel binary logistic model fixed effects estimates under the PQL method of estimation.

Table 7. Power rates for Fixed effects estimates of the multilevel binary response variable model by groups (Method = ML).

Parameter   Number of groups
            30      50      100     120
γ00         0.475   0.756   0.902   0.999
γ10         0.509   0.792   0.911   1.000
γ20         0.491   0.782   0.900   0.999
γ01         0.516   0.802   0.919   1.000
γ11         0.459   0.743   0.914   1.000

Table 8. Power rates for Fixed effects estimates of the multilevel binary response variable model by groups (Method = PQL).

Parameter   Number of groups
            30      50      100     120
γ00         0.451   0.729   0.876   0.993
γ10         0.465   0.754   0.888   0.996
γ20         0.471   0.749   0.869   0.995
γ01         0.489   0.775   0.891   0.996
γ11         0.442   0.729   0.884   0.995
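The gap in power between the two estimation methods can be summarized directly from Tables 7 and 8. The sketch below is illustrative: the values are transcribed from the two tables, the names are hypothetical, and averaging over the five fixed effects is a choice made for this example rather than an analysis reported in the paper.

```python
from statistics import mean

GROUPS = (30, 50, 100, 120)

# Power rates by number of groups, in the order gamma00, gamma10, gamma20, gamma01, gamma11.
POWER_ML = {   # Table 7
    30: (0.475, 0.509, 0.491, 0.516, 0.459),
    50: (0.756, 0.792, 0.782, 0.802, 0.743),
    100: (0.902, 0.911, 0.900, 0.919, 0.914),
    120: (0.999, 1.000, 0.999, 1.000, 1.000),
}
POWER_PQL = {  # Table 8
    30: (0.451, 0.465, 0.471, 0.489, 0.442),
    50: (0.729, 0.754, 0.749, 0.775, 0.729),
    100: (0.876, 0.888, 0.869, 0.891, 0.884),
    120: (0.993, 0.996, 0.995, 0.996, 0.995),
}


def mean_power_gap():
    """Average ML-minus-PQL power over the five fixed effects, per number of groups."""
    return {g: mean(POWER_ML[g]) - mean(POWER_PQL[g]) for g in GROUPS}


if __name__ == "__main__":
    for g, gap in mean_power_gap().items():
        print(g, round(gap, 4))
```

In every cell of the two tables the ML power rate is at least as large as the PQL one, and the average gap narrows at 120 groups, where both methods are close to full power.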

Conclusions

Under the ML method of estimation, the fixed effects estimates were unbiased even with 30 groups; however, accurate standard errors for the fixed effects estimates were achieved only when the number of groups was 50. In addition, the random effects were underestimated, particularly with 30 groups. Unlike the standard errors of the fixed effects estimates, the standard errors of the random effects estimates became accurate only when the number of groups was 120. Overall, the number of groups had a significant influence on the accuracy of the multilevel binary logistic model estimates and their standard errors, whereas the effect of group size on the accuracy of the estimates and their standard errors was not significant in most of the conditions. The present study not only confirms the 50/50 rule of Moineddin et al. [9] (i.e. a minimum of 50 groups and 50 units per group under the ML method of estimation) but also suggests that 120 groups and a group size of 50 are needed to obtain sufficient accuracy of the random effects when the prevalence of the outcome is around 20 percent. Additionally, the number of groups had a substantial influence on the empirical power rates of the fixed effects estimates under the ML method of estimation: the power rates for all the fixed effects estimates increased with the number of groups. The results obtained in this study parallel those of previous studies in which the response variable is continuous [22, 23].

Under the PQL method of estimation, on the other hand, the fixed effects estimates had their largest biases when the group size was at its lowest, i.e. 5, and their biases were negligible when the group size was 50. Unlike under the ML method, group size was the most significant factor influencing the accuracy of the multilevel binary logistic model estimates and their standard errors. Furthermore, the accuracy of the standard errors of the fixed effects estimates was not satisfactory even with a group size of 50. Similarly, the random effects and their standard errors were underestimated under the PQL method. The accuracy of the standard errors of the random effects estimates was far below that of the ML method across all conditions. In most of the conditions the number of groups had no significant impact on the accuracy of the standard errors. Therefore, a 50/60 rule (i.e. a minimum of 50 groups and 60 units per group under the PQL method of estimation) is recommended to achieve sufficient accuracy. It is also recommended that 120 groups and a group size of 70 be used to achieve sufficient accuracy of the variance component estimates and their standard errors when the prevalence of the outcome is around 20 percent. Larger ICC values also decreased the accuracy of the estimates and their standard errors. As with the ML method, the power rates of the fixed parameter estimates of the multilevel binary logistic regression model increased with the number of groups, although the power rates under the PQL method were lower than those under the ML method.

Across all conditions, the PQL estimates and their standard errors for the multilevel binary logistic model were less accurate than those of the ML method. On the basis of the present results, it is therefore recommended that the PQL method for a binary outcome variable be avoided in situations such as a low prevalence of the outcome, large random effects, or group sizes of 50 or less. However, earlier work overlooked the usefulness of the PQL method, which proved to be extremely effective when random effects are small.

Data Availability

All relevant data are within the manuscript.

Funding Statement

The authors received no specific funding for this work.

References

  • 1. Raudenbush S. W. and Bryk A. S., “Hierarchical linear models: Applications and data analysis methods” (vol. 1). Sage, 2002.
  • 2. Goldstein H., “Performance Indicators in Education”. Statistics in Society, London, Arnold, 1999, pp. 281–286.
  • 3. Goldstein H., “Multilevel Statistical Models”. New York, Halstead Press, 1995.
  • 4. Goldstein H., “Multilevel statistical models (3rd ed.)”. London, Hodder Arnold, 2003.
  • 5. Snijders T. A. B. and Bosker R. J., “Multilevel analysis: An introduction to basic and advanced multilevel modeling”. London, Sage, 1999.
  • 6. Hox J. J., “Multilevel analysis: Techniques and applications”. Mahwah, NJ: Lawrence Erlbaum Associates, Inc., 2002.
  • 7. Maas C. J. and Hox J. J., “Robustness issues in multilevel regression analysis”. Statistica Neerlandica, 2004, 58(2), 127–37.
  • 8. Maas C. J. and Hox J. J., “Sufficient sample sizes for multilevel modeling”. Methodology, 2005, 1(3), 86–92.
  • 9. Moineddin R., Matheson F. I. and Glazier R. H., “A simulation study of sample size for multilevel logistic regression models”. BMC Medical Research Methodology, 2007, 7(1):34. doi: 10.1186/1471-2288-7-34
  • 10. Paccagnella O., “Sample size and accuracy of estimates in multilevel models”. European Journal of Research Methods for the Behavioral and Social Sciences, 2011, 7(3), 111.
  • 11. Zeng Q., Gu W., Zhang X., Wen H., Lee J. and Hao W., “Analyzing freeway crash severity using a Bayesian spatial generalized ordered logit model with conditional autoregressive priors”. Accident Analysis & Prevention, 2019, 127, 87–95.
  • 12. Zeng Q., Wen H., Huang H., Pei X. and Wong S. C., “Incorporating temporal correlation into a multivariate random parameters Tobit model for modeling crash rate by injury severity”. Transportmetrica A: Transport Science, 2018, 14(3), 177–191.
  • 13. Zeng Q., Guo Q., Wong S. C., Wen H., Huang H. and Pei X., “Jointly modeling area-level crash rates by severity: a Bayesian multivariate random-parameters spatio-temporal Tobit regression”. Transportmetrica A: Transport Science, 2019, 15(2), 1867–1884.
  • 14. Chen F., Peng H., Ma X., Liang J., Hao W. and Pan X., “Examining the safety of trucks under crosswind at bridge-tunnel section: A driving simulator study”. Tunnelling and Underground Space Technology, 2019, 92, 103034.
  • 15. Chen F. and Chen S., “Injury severities of truck drivers in single- and multi-vehicle accidents on rural highways”. Accident Analysis & Prevention, 2011, 43(5), 1677–1688.
  • 16. Chen F., Song M. and Ma X., “Investigation on the injury severity of drivers in rear-end collisions between cars using a random parameters bivariate ordered probit model”. International Journal of Environmental Research and Public Health, 2019, 16(14), 2632.
  • 17. Scott Long J., “Regression models for categorical and limited dependent variables”. Advanced quantitative techniques in the social sciences, 1997, 7.
  • 18. Agresti A., “Categorical Data Analysis”. Wiley, New York, 1990.
  • 19. Hox J. J., “Applied Multilevel Analysis”. Amsterdam: TT-Publikaties, 1995.
  • 20. McCullagh P. and Nelder J. A., “Generalised linear models”. Chapman and Hall, London, UK, 1989.
  • 21. Bradley J. V., “Robustness”. British Journal of Mathematical and Statistical Psychology, 1978, 31(2), 144–152.
  • 22. Snijders T. A. and Bosker R. J., “Standard errors and sample sizes for two-level research”. Journal of Educational and Behavioral Statistics, 1993, 18(3), 237–259.
  • 23. Raudenbush S. W. and Liu X., “Statistical power and optimal design for multisite randomized trials”. Psychological Methods, 2000, 5(2), 199. doi: 10.1037/1082-989x.5.2.199

Decision Letter 0

Feng Chen

11 Sep 2019

PONE-D-19-14953

SAMPLE SIZE ISSUES IN MULTILEVEL LOGISTIC REGRESSION MODELS

PLOS ONE

Dear Dr. Khan,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Oct 26 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Feng Chen

Academic Editor

PLOS ONE

Journal Requirements:

1.  When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study addresses the sample size issue in multilevel logistic regression models, which is important for the practical application of these models. While the topic is worthy of investigation, the authors should highlight their academic contributions more prominently. At the least, the gap between the current research and previous work should be stated clearly in the Introduction. More detailed comments follow:

1. The authors should number lines in the manuscript for the convenience of paper review.

2. The authors investigated the sample size issue for the maximum likelihood and penalized quasi-likelihood estimation methods separately. While these estimation methods are popular, Bayesian hierarchical modeling has also become widely used in recent years. Under the Bayesian hierarchical modeling framework, not only the multilevel structure but also spatial and temporal correlations are accounted for, which can reduce model misspecification and estimation bias significantly. The authors should review some representative theoretical works and their practical applications. For example, the following papers on Bayesian hierarchical/spatial-temporal modeling can be acknowledged in the Introduction:

Zeng Q., Gu W., Zhang X., Wen H., Lee J., Hao W. (2019). Analyzing freeway crash severity using a Bayesian spatial generalized ordered logit model with conditional autoregressive priors. Accident Analysis & Prevention, 127, 87-95.

Zeng Q., Wen H., Huang H., Pei X., Wong S.C. (2018). Incorporating temporal correlation into a multivariate random parameters Tobit model for modeling crash rate by injury severity. Transportmetrica A: Transport Science, 14 (3): 177-191.

Zeng Q., Guo Q., Wong S.C., Wen H., Huang H., Pei X. (2019). Jointly modeling area-level crash rates by severity: a Bayesian multivariate random-parameters spatio-temporal Tobit regression. Transportmetrica A: Transport Science, 15 (2): 1867-1884.

3. In the Results section, the authors are advised to discuss their findings with reference to those of previous studies, to further justify the reasonableness of the results. In particular, the differences between the results for the two estimation methods should be explained explicitly, as this may be a significant contribution of this research.

4. The figures shown in the manuscript are not discussed in the text.

5. Language editing is required, because there are many grammatical errors and improper expressions in the manuscript.

Reviewer #2: The manuscript attempts to study the requirements on sample sizes in multilevel logistic regression models. The manuscript is overall well written and structured. There are, however, some revisions required before it can be considered for publication.

1. The authors state that the objective of the study is to determine the optimal sample size for multilevel logistic regression. Nevertheless, the abstract seems to be geared towards the comparison between ML and PQL estimation. The abstract should say more about the findings on sample sizes, in addition to the difference between the ML and PQL methods.

2. Since PLOS ONE has a readership with varied backgrounds beyond statistics and econometrics, the authors should explain acronyms at their first mention. For example, on page 2, line 9, ICC should be explained at its first mention.

3. In explaining the wide use of multilevel models, the authors are advised to cite references that adopt multilevel models in other areas. The following literature should be discussed and acknowledged:

[1] Feng Chen, Haorong Peng, Xiaoxiang Ma, Jieyu Liang, Wei Hao, Xiaodong Pan (2019), “Examining the safety of trucks under crosswind at bridge-tunnel section: A driving simulator study”, Tunnelling and Underground Space Technology, 92, 103034. https://doi.org/10.1016/j.tust.2019.103034

[2] F. Chen and S. R. Chen (2011), “Injury severities of truck drivers in single- and multi-vehicle accidents on rural highways”, Accident Analysis and Prevention, 43(5), 1677-1688.

[3] Feng Chen, Mingtao Song and Xiaoxiang Ma (2019), “Investigation on the Injury Severity of Drivers in Rear-End Collisions Between Cars Using a Random Parameters Bivariate Ordered Probit Model”, International Journal of Environmental Research and Public Health, 16(14), 2632.

4. I have some concerns about the simulation design. The authors only considered samples with a large number of groups but small group sizes. Why did the authors not consider a small number of groups with large group sizes? The latter is commonly encountered in some areas.

5. The authors did not consider Bayesian methods. As pointed out by Gelman and Hill, even samples with a small number of groups and small group sizes can benefit from multilevel models under a Bayesian framework.

Reference: Gelman, Andrew, and Jennifer Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2006.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2019 Nov 22;14(11):e0225427. doi: 10.1371/journal.pone.0225427.r002

Author response to Decision Letter 0


28 Oct 2019

The following three files have been uploaded.

1. Revised manuscript with track changes

2. Manuscript

3. Response to reviewers

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Feng Chen

6 Nov 2019

SAMPLE SIZE ISSUES IN MULTILEVEL LOGISTIC REGRESSION MODELS

PONE-D-19-14953R1

Dear Dr. Khan,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Feng Chen

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors should be thanked for their efforts to improve the manuscript. All my comments have been addressed properly.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Feng Chen

13 Nov 2019

PONE-D-19-14953R1

SAMPLE SIZE ISSUES IN MULTILEVEL LOGISTIC REGRESSION MODELS

Dear Dr. Khan:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Feng Chen

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All relevant data are within the manuscript.


    Articles from PLoS ONE are provided here courtesy of PLOS