PLOS One. 2024 Jan 19;19(1):e0297037. doi: 10.1371/journal.pone.0297037

An empirical comparison of some missing data treatments in PLS-SEM

Lateef Babatunde Amusa 1,2,*, Twinomurinzi Hossana 1
Editor: Jibril Adewale Bamgbade
PMCID: PMC10798466  PMID: 38241223

Abstract

PLS-SEM is frequently used in applied studies as an excellent tool for examining causal-predictive associations of models for theory development and testing. Missing data are a common problem in empirical analysis, and PLS-SEM is no exception. A comprehensive review of the PLS-SEM literature reveals a high preference for the listwise deletion and mean imputation methods in dealing with missing values. PLS-SEM researchers often disregard other strategies for addressing missing data, such as regression imputation and imputation based on the Expectation-Maximization (EM) algorithm. In this study, we investigate the utility of these underutilized techniques for dealing with missing values in PLS-SEM and compare them with mean imputation and listwise deletion. Monte Carlo simulations were conducted based on two prominent social science models: the European Customer Satisfaction Index (ECSI) and the Unified Theory of Acceptance and Use of Technology (UTAUT). Our simulation experiments reveal that regression imputation outperforms the alternatives in recovering model parameters and in the precision of parameter estimates. Hence, regression imputation merits more widespread adoption for treating missing values in PLS-SEM studies.

Introduction

Because of its predictive power combined with its explanatory strengths, partial least squares structural equation modeling (PLS-SEM) is thought to be well-suited to build and assess explanatory-predictive theories [1]. It has grown in popularity in recent years in a variety of fields, particularly marketing [2,3], information systems [4], and business and management studies [5,6].

A significant bias source in PLS-SEM is missing data. Missing data is generally a common problem in applied studies that might jeopardize the validity and reliability of research findings if not adequately addressed. The issue arises when participants have insufficient or unavailable data for one or more variables in the analysis model. This missingness can occur for various reasons, including participant non-response, attrition, or measurement error. The consequence of incomplete data can be severe, resulting in inaccurate parameter estimations, lower statistical power, and potentially incorrect conclusions.

PLS-SEM researchers have routinely applied listwise deletion and mean replacement to deal with missing data [7]. While mean replacement produces more reliable results than case-wise deletion, it artificially reduces variance [8]. In general, these approaches may introduce errors that misrepresent association coefficients. Missing data are often concentrated among respondents who share certain features. For example, some wealthy or high-income individuals may be reluctant to reveal their purchasing patterns, and depressed respondents may deliberately skip questions about anxiety. Excluding these particular groups of respondents from datasets may severely disrupt the relationships between variables. Deletion methods also limit the data points available for analysis, and smaller samples reduce the statistical technique's power.

Except for a few initiatives, such as case-wise and pairwise deletion, approaches to missing data imputation in PLS-SEM are not well-established [9]. By contrast, quite a number of studies [8,10–12] used extensive numerical simulations to assess the utility and performance of imputation techniques in covariance-based SEM. In short, strategies for dealing with incomplete data in PLS-SEM are rarely explored or investigated.

Filling this research gap is critical because PLS-SEM is a popular and widely utilized analysis approach. According to Hair et al. [13], the extent to which these approaches can be used to impute missing data in PLS-SEM analysis is unknown. Missing imputation approaches that rely on regression or the expectation-maximization (EM) algorithm are not generally acknowledged in the PLS-SEM literature.

This study investigates the relative performance of four missing data techniques in the context of PLS-SEM. The four methods studied were listwise deletion (or complete case analysis), mean imputation, regression imputation, and EM imputation. Specific research questions addressed how the methods comparatively (1) impact measurement model quality in terms of convergent validity, (2) introduce bias in estimating structural model parameters, and (3) affect the precision of estimates as measured by the model standard errors.

The assessments assumed three missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).

Methodology

In this study, we aim to compare different missing data techniques for partial least squares structural equation modeling (PLS-SEM) using Monte Carlo simulation. PLS-SEM is a widely used statistical approach for analyzing complex relationships among latent variables. Missing data are a common issue in empirical research and can significantly affect the accuracy and validity of the results obtained from PLS-SEM analyses. It is therefore important to evaluate the performance of various missing data techniques in the context of PLS-SEM to identify the most suitable approach. Though PLS-SEM studies frequently employ listwise deletion and mean replacement, the classical handbook of PLS-SEM [7] acknowledged that there is limited knowledge of the applicability of regression or the expectation-maximization algorithm for missing data imputation.

The specific methods explored in this work are described below. They cover only a handful of the many available methods, but we feel they represent a spectrum of commonly used and promising approaches.

Complete Case Analysis (CCA): In CCA, also known as listwise deletion, all observations with missing values are removed from the analysis. Case-wise deletion is simple to implement but always reduces the sample size. When nonignorable missing data are present, CCA causes further complications, since the pattern of missingness is nonrandom and cannot be predicted from other variables in the database. This technique produces unbiased estimates only if the data are missing completely at random (MCAR) or if all variables driving the missingness have been included in the analysis, making the missing data ignorable.
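As an illustration (the paper's own analyses were run in R; this is a minimal Python/NumPy sketch), listwise deletion simply drops every row containing at least one missing value:

```python
import numpy as np

def listwise_deletion(X):
    """Keep only complete cases: rows with no missing (NaN) entries."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]

# Three respondents, one with a missing answer on the first item.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, 5.0]])
print(listwise_deletion(X))  # only rows 0 and 2 survive
```

Even a single missing entry discards the whole respondent, which is how the sample-size loss described above arises.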

Mean imputation (MI): This involves replacing missing entries with the mean of the non-missing values of the same variable. The technique is popular and simple, but it reduces variable variability because all missing entries of a variable are replaced by a single constant value [10].
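A minimal Python/NumPy sketch of column-mean imputation (for illustration only), which also makes the variance-shrinkage problem easy to see:

```python
import numpy as np

def mean_impute(X):
    """Replace each missing entry with the mean of its variable's observed values."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)   # per-variable mean over observed entries
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]     # same constant fills every gap in a column
    return X

x = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
filled = mean_impute(x.reshape(-1, 1)).ravel()
print(filled)  # [1. 3. 3. 3. 5.]
# The imputed constant (3.0) shrinks the variance relative to the observed values.
print(np.var(filled) < np.var([1.0, 3.0, 5.0]))  # True
```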

Regression imputation: Based on the available data, the variable with incomplete data is regressed on other, completely observed variables in a multiple regression model. The predicted values from this fitted model are then used to replace the missing values. Because the imputed values vary with the predictors (and an error term can be added to restore residual variance), regression imputation preserves the variability of variables better than a constant imputation such as the mean.
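A deterministic regression-imputation sketch in Python/NumPy (illustrative only; a stochastic variant would additionally add a random draw from the residual distribution to each prediction):

```python
import numpy as np

def regression_impute(X, target):
    """Impute missing values in column `target` from an OLS regression of that
    column on the remaining (fully observed) columns."""
    X = np.asarray(X, dtype=float).copy()
    miss = np.isnan(X[:, target])
    preds = np.delete(X, target, axis=1)
    A = np.column_stack([np.ones(len(X)), preds])  # design matrix with intercept
    # Fit OLS on the complete cases only.
    beta, *_ = np.linalg.lstsq(A[~miss], X[~miss, target], rcond=None)
    X[miss, target] = A[miss] @ beta               # fitted values replace the gaps
    return X

# y equals 2*x + 1 on the observed cases, so the missing y is recoverable.
X = np.array([[1.0, 3.0],
              [2.0, 5.0],
              [3.0, np.nan],
              [4.0, 9.0]])
print(regression_impute(X, target=1))  # missing y imputed as 7.0
```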

Expectation-Maximization Algorithm (EM): A more popular approach is the use of the EM algorithm [14,15]. The EM algorithm estimates missing values by iteratively imputing values based on the available data. It starts by initializing the missing values in the dataset with reasonable initial estimates (e.g., mean imputation or random imputation). In the Expectation step, the algorithm estimates the expected values of the missing data given the observed data and the current parameter estimates; that is, it computes conditional means for the missing values based on the available data and the estimated parameters. The Maximization step then updates the model parameters from the imputed data, such as the latent variable scores, path coefficients, and measurement model parameters in PLS-SEM. This procedure is repeated until the parameter estimates converge.
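The E- and M-steps above can be sketched for a multivariate-normal working model (Python/NumPy; a simplified illustration, not the exact routine used in the paper): missing entries are repeatedly replaced by their conditional means given the observed entries and the current mean/covariance estimates, which are then re-estimated from the completed data.

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-8):
    """EM-style imputation assuming multivariate normality (rows with at least
    one observed value)."""
    X = np.asarray(X, dtype=float).copy()
    miss = np.isnan(X)
    # Initialize missing cells with column means (observed values only).
    mu = np.nanmean(X, axis=0)
    X[miss] = mu[np.where(miss)[1]]
    for _ in range(n_iter):
        prev = X[miss].copy()
        mu = X.mean(axis=0)              # M-step: update mean ...
        S = np.cov(X, rowvar=False)      # ... and covariance from completed data
        S += 1e-9 * np.eye(S.shape[0])   # small ridge for numerical stability
        for i in np.where(miss.any(axis=1))[0]:
            m = miss[i]
            # E-step: conditional mean of missing given observed entries.
            S_oo = S[np.ix_(~m, ~m)]
            S_mo = S[np.ix_(m, ~m)]
            X[i, m] = mu[m] + S_mo @ np.linalg.solve(S_oo, X[i, ~m] - mu[~m])
        if np.max(np.abs(X[miss] - prev)) < tol:
            break
    return X
```

Unlike mean imputation, the conditional-mean fill exploits the correlations among indicators, which is one reason EM competes closely with regression imputation in the simulations reported below.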

Missing data mechanisms

Missing data mechanisms are the many ways data can be missing or incomplete. Identifying and understanding missing data mechanisms is crucial because they could influence the validity and reliability of statistical investigations and empirical conclusions. Here are some examples of common missing data mechanisms:

Missing Completely at Random (MCAR): In MCAR, data missingness is unrelated to the observed and unobserved data. It assumes that any characteristic of the data or the data collection process does not influence the probability of data being missing. In other words, the missingness occurs randomly, and there is no systematic difference between the missing and non-missing data. Though MCAR is generally regarded as a strong and often unrealistic assumption, it is considered the ideal missing data mechanism because it allows for unbiased statistical analyses. Complete case analysis and various imputation methods, including single or multiple imputations, can handle MCAR [16].

Missing at Random (MAR): MAR arises when observed variables may explain the missingness but not the missing data itself. It then implies that after controlling for other observed variables, the probability of missingness depends solely on the observed data. Imputation methods or modeling approaches, such as the full information maximum likelihood (FIML) or expectation-maximization (EM) algorithm, can be used to address MAR [17].

Missing Not at Random (MNAR): MNAR refers to situations where the missingness is systematically related to the unobserved data. This pattern of missingness is not random and can contribute significant bias to the analysis. MNAR is more challenging; researchers frequently use sensitivity analyses or multiple imputation procedures that integrate model assumptions [16].
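The three mechanisms can be mimicked in simulation code. A hedged Python/NumPy sketch (the column roles and the rank-based probability scheme are illustrative choices, not the paper's exact design):

```python
import numpy as np

def make_missing(X, mechanism, prop=0.2, driver=0, target=1, seed=0):
    """Mask entries of column `target` under MCAR, MAR, or MNAR.
    MCAR: constant probability; MAR: probability grows with the observed
    `driver` column; MNAR: probability grows with `target`'s own values."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()
    n = len(X)
    if mechanism == "MCAR":
        p = np.full(n, prop)
    else:
        scores = X[:, driver] if mechanism == "MAR" else X[:, target]
        ranks = scores.argsort().argsort() / (n - 1)  # ranks scaled to [0, 1]
        p = np.clip(2 * prop * ranks, 0.0, 1.0)       # averages to `prop`
    X[rng.random(n) < p, target] = np.nan
    return X
```

Under MAR the missingness in `target` can be explained by the observed `driver`; under MNAR it depends on the very values that end up missing, which is what makes MNAR the hardest mechanism to correct.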

Monte Carlo simulation

We used Monte Carlo simulation to generate multiple datasets with known population parameters and simulated missing data under different conditions to evaluate the performance of the missing data techniques. The data comprise latent variables and manifest indicators representing the theoretical model under investigation. The simulations are built on the two theoretical models in Figs 1 and 2, which respectively mirror the structure of a customer satisfaction index model [18] and the UTAUT model [19]. We chose this simulation route because it reflects the typical complexity of structural equation models within the social science discipline.

Fig 1. Adapted ECSI model with model parameters from Tenenhaus et al. (2005) [20].

Fig 2. Adapted UTAUT model with model parameters from Al-Gahtani et al. (2007) [19].

Different missing data mechanisms, such as Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR), were assumed to simulate missingness in the generated dataset. The PLS Mode A method with the path weighting scheme [20] was employed in the analysis.

Design factors

The simulation study design involved two levels of sample size (300 and 1000), four proportions of missing data (20%, 30%, 40%, 50%), and three missing data mechanisms (MCAR, MAR, MNAR). These three design factors led to a 2 x 4 x 3 factorial design, with each of the 24 conditions being generated and analyzed for 500 samples, resulting in a total of 12000 analyzed samples for the methods under consideration.

Performance measures

For each of the 12000 replications, we calculated the mean absolute error (MAE) for the structural model parameters corresponding to the theoretical models in Figs 1 and 2. The MAE is defined as

MAE = \frac{1}{t}\sum_{j=1}^{t}\left|\hat{\beta}_j - \beta_j\right|,

where t is the number of parameters, \hat{\beta}_j is the parameter estimate in a given replication, and \beta_j is the prespecified model parameter.
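The MAE amounts to a one-liner in code; for concreteness (Python, with hypothetical path-coefficient values for illustration):

```python
import numpy as np

def mae(estimates, true_params):
    """Mean absolute error between estimated and prespecified path coefficients."""
    estimates = np.asarray(estimates, dtype=float)
    true_params = np.asarray(true_params, dtype=float)
    return np.abs(estimates - true_params).mean()

# Hypothetical estimates from one replication vs. prespecified population values.
print(mae([0.52, 0.28, 0.71], [0.50, 0.30, 0.70]))  # ≈ 0.0167
```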

All computations, including data generation and analyses (see S1 File), were conducted within the R 4.3.1 environment [21] using the SEMinR [22] and cSEM.DGP [23] packages.

Simulated data based on the ECSI model

This simulation experiment is based on the European Customer Satisfaction Index (ECSI) model [18] and is empirically validated by a classical study [20]. The ECSI is an economic indicator developed from the Swedish customer satisfaction barometer [24]. The model is evaluated using a 10-point Likert scale for each of the six constructs, which are grounded in well-established theories and methodologies in consumer behaviour. These constructs, all measured reflectively, include customer expectations, perceived quality, perceived value, customer satisfaction, customer complaints, and customer loyalty (see Fig 1). The covariance matrix of the ECSI data analyzed in a classical study of 250 samples [20] was used to generate the multivariate manifest variables.

In terms of the measurement items, we considered equal standardized indicator loadings (λ = 0.7) for all the constructs. This is consistent with some previous Monte Carlo studies of PLS-SEM [9,25,26] to ensure conventional acceptance thresholds of validity and reliability for the measurement models. We, however, retained the estimated path model coefficients as our prespecified path model coefficients.

Simulated data based on the UTAUT model

In this example, we utilized the prominent unified theory of acceptance and use of technology (UTAUT) model [27] and an empirically validated study [19] as a benchmark for our Monte Carlo simulations. The adapted classical study comprises 21 indicators and six reflective constructs, each measured using a 7-point Likert scale [19]. The six constructs reflect the influences of social influence and cognitive instrumental constructs on the perceived usefulness of IT and usage intentions. The constructs include performance expectancy, effort expectancy, social influence, facilitating conditions, behavioral intentions, and usage behaviour (see Fig 2). This model differs from the one in the previous simulation example in terms of the number of manifest variables and measurement modes, and it provides another typical model for evaluating the missing data techniques.

Regarding prespecified model parameters, we retained the estimated parameter values in the adapted study for the measurement and structural model parameters. The estimated standardized loadings from the adapted study ranged from 0.71 to 0.95.

Results

Results were obtained for analyses with listwise deletion, mean imputation, regression imputation, and the EM algorithm. For both simulation examples, we first evaluated the influence of the missing value techniques on validity and reliability as measured by average variance extracted (AVE) and composite reliability (CR) for the measurement models (see Figs 3–5). In other words, were the conventional acceptance thresholds for validity and reliability still met after the missing data treatments?

Fig 3. AVE (Top panel) and CR (Bottom panel) of the missing data techniques under simulation model 1 for N = 300 and 20% missingness for each of the different missing mechanisms.


Fig 5. AVE of the missing data techniques under simulation model 2 for some selected design factor combinations.


Panel A: MCAR, N = 300, 20% missingness, B: MCAR, N = 1000, 50% missingness, C: MAR, N = 300, 20% missingness, D: MAR, N = 1000, 50% missingness, E: MNAR, N = 300, 20% missingness, F: MNAR, N = 1000, 50% missingness.

Simulation 1

In this simulation experiment, loading-related estimates are shown for only one of the factor combinations because all loadings are equal in the true population model; the same pattern of results occurs for all loadings, so this avoids congestion and repetition. The conventional thresholds AVE = 0.5 and CR = 0.7 were superimposed (see Fig 3).

As shown in Fig 3, the AVE and CR values were well above 0.5 and 0.7, respectively. This implies that the missing data techniques produced sufficiently high internal consistency and convergent validity for the constructs.

Regarding bias in estimating structural model parameters, results are summarized in Fig 4. Similar patterns emerged in comparing the different methods across the simulation conditions for both the small (n = 300) and large (n = 1000) sample size conditions. Regression imputation (Reg) consistently produced higher MAE values with increasing missing proportions. A similar pattern was observed for complete case analysis (CC), except for the MCAR case, where lower MAE values emerged at 30% missingness and above. An opposite pattern was observed for mean imputation (Mean), while EM produced relatively constant MAE values for all missing proportions.

Fig 4. Simulation example 1: Mean absolute error (MAE) of the different missing data treatments under varying design factor combinations.


When the missing data were MCAR, regression imputation outperformed the other methods, with EM and CC marginally competing for second place. Similar patterns emerged for the MAR and MNAR scenarios: CC performed worst, producing substantially higher MAE values than the other methods, while regression imputation again performed best. In the MAR scenario its advantage over EM (the second best) was marginal, whereas more striking differences were observed for MNAR.

Simulation 2

For simulation example 2, the conventional thresholds AVE = 0.5 and CR = 0.7 were superimposed in Figs 5 and 6, respectively. Although the factor loadings are unequal in this simulation experiment, results for only six representative factor combinations are reported due to space constraints; the pattern of results, and hence the conclusions, were similar across all 24 factor combinations.

Fig 6. CR of the missing data techniques under simulation model 2 for some selected design factor combinations.


Panel A: MCAR, N = 300, 20% missingness, B: MCAR, N = 1000, 50% missingness, C: MAR, N = 300, 20% missingness, D: MAR, N = 1000, 50% missingness, E: MNAR, N = 300, 20% missingness, F: MNAR, N = 1000, 50% missingness.

Results concerning bias in estimating structural model parameters are summarized in Fig 7. Most of our findings from simulation example 1 still hold for this structural model. Overall, regression imputation still outperforms the other candidate imputation methods. Although CC performed best under the MCAR scenario, its results were inconsistent across the different sample sizes. The differences between CC and Reg appear marginal at the large sample size condition (n = 1000), and Reg even produced a substantially lower MAE value at the 40% missing proportion. These inconsistencies may be due to sampling fluctuations.

Fig 7. Simulation example 2: Mean absolute error (MAE) of the different missing data treatments under varying design factor combinations.


Precision of parameter estimates

Using the ECSI model of example 1, an additional evaluation based on a small set of design conditions was conducted to obtain the standard errors of the estimated structural model parameters. Due to the computational cost of bootstrapping in the context of Monte Carlo simulations, we limited our assessment to one sample size condition (n = 300); the other factor conditions were varied as usual. We examined precision by averaging bootstrap standard errors (SE), each based on 200 resamples, for the path coefficients over the 500 simulated datasets. The standard errors were evaluated at an aggregate level of the structural parameters.
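The bootstrap standard error used here follows the standard recipe: resample rows with replacement, re-estimate, and take the standard deviation of the re-estimates. A generic Python sketch (the estimator is a stand-in; the paper bootstraps PLS path coefficients):

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=200, seed=0):
    """Standard error of `estimator` via the nonparametric bootstrap:
    resample observations with replacement and take the SD of re-estimates."""
    rng = np.random.default_rng(seed)
    n = len(data)
    stats = [estimator(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return np.std(stats, ddof=1)

# Sanity check: the SE of a sample mean should be close to sigma / sqrt(n).
rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=300)
print(bootstrap_se(sample, np.mean))  # roughly 1 / sqrt(300) ≈ 0.058
```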

Fig 8 provides the standard errors for the studied structural model parameters. Complete case analysis produced substantially less precise estimates: across the conditions examined, stark differences were observed between its standard errors and those of the other candidate methods. Although the differences among the other three techniques were often unnoticeable, with sometimes similar standard errors, regression imputation numerically produced the most precise estimates.

Fig 8. Standard error (SE) of structural parameter estimates for the different missing data treatments.


Discussion

Despite the numerous available missing data techniques, complete case analysis (or listwise deletion) and mean imputation are routinely chosen as treatments for missing data in PLS-SEM studies [28]. This is also evident in the missing data treatments implemented in the popular SmartPLS 4.0 software [29]. We sought to evaluate other notable missing data techniques, namely regression and EM imputation, and compare them with the commonly used listwise deletion and mean imputation.

Accordingly, we conducted Monte Carlo simulations to investigate the performance of these single imputation methods under conditions that differ in missingness mechanism, missing proportion, and sample size. Two prominent social science models, each empirically validated by a corresponding study, were used as benchmarks for our simulations. Our findings are summarized and placed in the context of the existing literature where necessary.

In terms of measurement model assessment, none of the methods induced unacceptable values for the reliability and convergent validity measures considered. In addition, the composite reliability and average variance extracted values, measuring internal consistency and convergent validity respectively, were indistinguishable across the board. Though all the considered techniques performed reasonably well in terms of bias in estimating structural parameters, we found that regression imputation outperformed the other candidate methods on average. This finding aligns with a previous study [9]. Complete case analysis had the worst performance, except for a few inconsistent best performances under the MCAR mechanism. Its relative underperformance stood out in the assessment of precision, where it consistently produced less precise standard errors. These results echo previous critics' view of listwise deletion as being among the worst available missing data treatments for practical applications [30].

For the regression and EM imputation techniques, the effect of increasing proportions of missingness on the bias of estimated structural model parameters was hardly noticeable. This is in line with a previous study [12], which argued that the proportion of missing data, under the MAR assumption, should not guide decision-making in multiple imputation. In most cases, the imputation based on the EM algorithm competed favourably with regression imputation, especially for the MAR and MNAR mechanisms. This finding is consistent with a study that compared the efficacy of imputation methods for covariance-based structural equation modelling [10]: the EM algorithm ranked second or third best, depending on the sample size, in a comparison of full information maximum likelihood, multiple imputation, EM imputation, mean imputation, and regression imputation [10]. Further, a previous study mentioned EM imputation as one of the promising approaches for treating incomplete data in SEM and recommended that additional Monte Carlo studies be conducted to assess the utility of the EM algorithm in missing data imputation [11].

This study is not without limitations. First, none of the four approaches considered accounts for the uncertainty arising from missing data. This can be problematic, especially when the missingness rate is considerably high, and can result in elevated false positive error rates. Multiple imputation should be used to account for the uncertainty caused by incomplete data; however, no implementation of multiple imputation currently exists for PLS-SEM in any software or computing program. Second, the techniques used in this study do not consider the uniqueness of the analysis model and are performed pre-modeling. Owing to this, however, they have broad applicability in dealing with missing values. Consequently, imputation techniques linked to the PLS-SEM model should be developed to use model insights fully and thereby benefit model estimates. One such method found in the literature, albeit with no accessible software implementation, is an EM algorithm-based method the authors termed EM PLS-SEM [31].

We note two previous studies [9,32] that empirically evaluated the impact of incomplete data on PLS-SEM estimation, and we highlight how this study differs. Both studies restricted their assessment to the MCAR missing data mechanism. One of them [33] investigated solely the listwise deletion approach and focused on estimation quality rather than imputation: it compared PLS with maximum likelihood (ML) and full-information maximum likelihood (FIML) methods when estimating SEM. In other words, model estimation, rather than missing value imputation, which is at the heart of our research, was the primary focus.

There are limited studies on the assessment of missing data methods for PLS-SEM. In addition, to our knowledge, no previous research has extended the performance evaluation of these missing data techniques for PLS-SEM estimation under different missing data mechanisms. Like any other, our simulation findings may be limited to the scenarios considered by our simulation data. As a result, the findings cannot be applied to situations that have not been investigated. Nevertheless, our results provide some indication of the viability of using single imputation approaches for incomplete data in PLS-SEM.

This research has several theoretical implications. This study is the first to empirically compare missing data methods in PLS-SEM under the different missing data mechanisms. This is notable because covariance-based SEM has received considerably more empirical assessment of this kind than PLS-SEM. We cannot make any statements on PLS performance where formative indicators prevail, since our theoretical models only include constructs measured with reflective indicators. Nonetheless, our recommendations should be valuable for practical researchers, among whom there appears to be considerable disparity in the motivations for selecting one approach over another, with no systematic empirical assessment to validate that choice.

Conclusions

Our simulation experiments reveal that regression imputation outperforms the alternatives in recovering model parameters and in the precision of parameter estimates. Overall, we found the under-utilized imputation approaches, namely regression and EM imputation, useful and excellent in performance. They merit more widespread adoption for treating missing values in PLS-SEM studies.

Findings from this study have far-reaching practical implications for improving data quality, model fit, and decision-making. Properly handling missing data is essential for enhancing the robustness and utility of PLS-SEM in both research and practical applications. While our findings suggest recommending regression imputation over the others, we advise that clearly communicating the chosen missing data treatment and its implications is critical. Practitioners and researchers must convey the limitations and uncertainties associated with these methodologies.

Supporting information

S1 File. R scripts used for Monte Carlo simulations and data analyses.

(ZIP)

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Sarstedt M, Ringle CM, Hair JF. Partial least squares structural equation modeling. In: Handbook of Market Research. 2017; 1–40.
2. Hair JF, Hult GTM, Ringle CM, Sarstedt M, Thiele KO. Mirror, mirror on the wall: a comparative evaluation of composite-based structural equation modeling methods. J Acad Mark Sci. 2017;45: 616–632. doi: 10.1007/s11747-017-0517-x
3. Sarstedt M, Hair JF, Pick M, Liengaard BD, Radomir L, Ringle CM. Progress in partial least squares structural equation modeling use in marketing research in the last decade. Psychol Mark. 2022;39: 1035–1064. doi: 10.1002/mar.21640
4. Benitez J, Henseler J, Castillo A, Schuberth F. How to perform and report an impactful analysis using partial least squares: Guidelines for confirmatory and explanatory IS research. Information & Management. 2020;57: 103168. doi: 10.1016/j.im.2019.05.003
5. Hair JF, Astrachan CB, Moisescu OI, Radomir L, Sarstedt M, Vaithilingam S, et al. Executing and interpreting applications of PLS-SEM: Updates for family business researchers. Journal of Family Business Strategy. 2021;12: 100392. doi: 10.1016/j.jfbs.2020.100392
6. Ringle CM, Sarstedt M, Mitchell R, Gudergan SP. Partial least squares structural equation modeling in HRM research. Int J Hum Resour Manag. 2018;31: 1617–1643. doi: 10.1080/09585192.2017.1416655
7. Hair JF Jr, Hult GTM, Ringle CM, Sarstedt M. A primer on partial least squares structural equation modeling (PLS-SEM). Sage Publications; 2021.
8. Parwoll M, Wagner R. The impact of missing values on PLS model fitting. Studies in Classification, Data Analysis, and Knowledge Organization. 2012; 537–544. doi: 10.1007/978-3-642-24466-7_55
9. Kock N. Single missing data imputation in PLS-based structural equation modeling. Journal of Modern Applied Statistical Methods. 2018;17: 2. doi: 10.22237/jmasm/1525133160
10. Olinsky A, Chen S, Harlow L. The comparative efficacy of imputation methods for missing data in structural equation modeling. Eur J Oper Res. 2003;151: 53–79. doi: 10.1016/S0377-2217(02)00578-7
11. Brown RL. Efficacy of the indirect approach for estimating structural equation models with missing data: A comparison of five methods. Struct Equ Modeling. 1994;1: 287–316. doi: 10.1080/10705519409539983
12. Madley-Dowd P, Hughes R, Tilling K, Heron J. The proportion of missing data should not be used to guide decisions on multiple imputation. J Clin Epidemiol. 2019;110: 63–73. doi: 10.1016/j.jclinepi.2019.02.016
13. Hair JF Jr, Hult GTM, Ringle CM, Sarstedt M. A primer on partial least squares structural equation modeling (PLS-SEM). Sage Publications; 2021.
14. Little RJA, Rubin DB. Statistical analysis with missing data. Wiley; 2014. doi: 10.1002/9781119013563
15. Little RJA, Rubin DB. The analysis of social science data with missing values. Sociol Methods Res. 1989;18: 292–326. doi: 10.1177/0049124189018002004
16. Enders CK. Applied missing data analysis (Methodology in the Social Sciences). New York: Guilford Press; 2010.
17. Schafer JL, Graham JW. Missing data: Our view of the state of the art. Psychol Methods. 2002;7: 147–177. doi: 10.1037/1082-989X.7.2.147
18. Fornell C. A national customer satisfaction barometer: The Swedish experience. J Mark. 1992;56: 6–21. doi: 10.1177/002224299205600103
19. Al-Gahtani SS, Hubona GS, Wang J. Information technology (IT) in Saudi Arabia: Culture and the acceptance and use of IT. Information & Management. 2007;44: 681–691. doi: 10.1016/j.im.2007.09.002
20. Tenenhaus M, Vinzi VE, Chatelin YM, Lauro C. PLS path modeling. Comput Stat Data Anal. 2005;48: 159–205. doi: 10.1016/j.csda.2004.03.005
21. R Core Team. R: A language and environment for statistical computing. Vienna, Austria; 2023. Available: https://www.R-project.org/
22. Ray S, Danks NP, Calero Valdez A. seminr: Building and estimating structural equation models. 2022. Available: https://CRAN.R-project.org/package=seminr
23. Rademaker M, Schamberger T. cSEM.DGP: Generate data for structural equation models. 2020. Available: https://github.com/M-E-Rademaker/cSEM.DGP
24. Fornell C, Bookstein FL. Two structural equation models: LISREL and PLS applied to consumer exit-voice theory. J Mark Res. 1982;19: 440–452. doi: 10.1177/002224378201900406
25. Kock N. Should bootstrapping be used in PLS-SEM? Toward stable p-value calculation methods. Journal of Applied Structural Equation Modeling. 2018;2. doi: 10.47263/JASEM.2(1)02
26. Henseler J, Chin WW. A comparison of approaches for the analysis of interaction effects between latent variables using partial least squares path modeling. Struct Equ Modeling. 2010;17: 82–109. doi: 10.1080/10705510903439003
27. Venkatesh V, Morris MG, Davis GB, Davis FD. User acceptance of information technology: Toward a unified view. MIS Q. 2003;27: 425–478. doi: 10.2307/30036540
28. Guenther P, Guenther M, Ringle CM, Zaefarian G, Cartwright S. Improving PLS-SEM use for business marketing research. Industrial Marketing Management. 2023;111: 127–142. doi: 10.1016/j.indmarman.2023.03.010
29. Ringle CM, Wende S, Becker J-M. SmartPLS 4. Oststeinbek: SmartPLS; 2022. Available: https://www.smartpls.com
30. Wilkinson L. Statistical methods in psychology journals: Guidelines and explanations. American Psychologist. 1999;54: 594–604. doi: 10.1037/0003-066X.54.8.594
31. Wang H, Lu S, Liu Y. Missing data imputation in PLS-SEM. Qual Quant. 2022;56: 4777–4795. doi: 10.1007/s11135-022-01338-4
  • 32.Grimm MS, Wagner R. The Impact of Missing Values on PLS, ML and FIML Model Fit. Archives of Data Science, Series A. 2020. [Google Scholar]
  • 33.Grimm MS, Wagner R. The Impact of Missing Values on PLS, ML and FIML Model Fit. Archives of Data Science, Series A. 2020. [Google Scholar]

Decision Letter 0

Jibril Adewale Bamgbade

17 Oct 2023

PONE-D-23-24843: An empirical comparison of some missing data treatments in PLS-SEM (PLOS ONE)

Dear Dr. Amusa,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. When preparing your revised manuscript, you are asked to carefully consider the reviewer comments which are attached, and submit a list of responses to the comments.

Please submit your revised manuscript by the Dec 01 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jibril Adewale Bamgbade

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for submitting your manuscript to PLOS ONE. I have provided some feedback as detailed below:

Introduction:

The introduction provides a clear background and context, making sense of the pertinence of PLS-SEM and missing data issues. It effectively outlines the research gap and emphasizes the significance of addressing it. Notwithstanding, the introduction is too lengthy and lack compactness. Additionally, while the research gap is mentioned in the introduction, the research objectives and questions could be more clearly stated.

Methodology

The methodology section is well-structured, detailing the approach used for the study. It provides a clear explanation of the missing data mechanisms and design factors considered in the simulations. The use of Monte Carlo simulations is appropriate for this type of study. Nevertheless, the section could be more reader-friendly by breaking down complex technical details into simpler language for non-expert readers. The methodology also lacks a discussion of potential limitations or assumptions made during the simulations.

Discussion

The discussion section effectively summarizes and interprets the results, relating them to the research questions. It compares different missing data imputation techniques, highlighting their strengths and weaknesses. However, the discussion could provide more insight into the practical implications of the findings for researchers using PLS-SEM. It would also benefit from a more comprehensive review of related literature to place the results in a broader context.

Conclusion

The conclusion summarizes the key findings concisely. It reiterates the importance of using regression imputation and EM imputation for addressing missing data in PLS-SEM. The conclusion could however be strengthened by discussing the real-world implications and practical recommendations resulting from the study.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.


Author response to Decision Letter 0


23 Oct 2023

Introduction:

The introduction provides a clear background and context, making sense of the pertinence of PLS-SEM and missing data issues. It effectively outlines the research gap and emphasizes the significance of addressing it. Notwithstanding, the introduction is too lengthy and lacks compactness. Additionally, while the research gap is mentioned in the introduction, the research objectives and questions could be more clearly stated.

Response: We have reduced the introduction section by moving the subsection on missing data mechanisms to the methodology section. The introduction section is now more compact. The aim and objectives of the research have now been clearly stated.

Thank you.

Methodology

The methodology section is well-structured, detailing the approach used for the study. It provides a clear explanation of the missing data mechanisms and design factors considered in the simulations. The use of Monte Carlo simulations is appropriate for this type of study. Nevertheless, the section could be more reader-friendly by breaking down complex technical details into simpler language for non-expert readers. The methodology also lacks a discussion of potential limitations or assumptions made during the simulations.

Response: Thank you. Our simulations are assumed to mimic typical social-science settings, and by implication, our findings are valid only within the boundaries of the scenarios we investigated. We wrote specifically "We chose this simulation route because it reflects the typical complexity of structural equation models within the social science discipline." We also wrote in the Discussion "Our simulation findings, like any other, may be limited to the scenarios considered by our simulation data. As a result, the findings cannot be applied to situations that have not been investigated."

As advised, we have simplified a few complex technical details. However, most of the other technical terms are standard in the statistical literature, and their meanings would change if simplified further.

Discussion

The discussion section effectively summarizes and interprets the results, relating them to the research questions. It compares different missing data imputation techniques, highlighting their strengths and weaknesses. However, the discussion could provide more insight into the practical implications of the findings for researchers using PLS-SEM. It would also benefit from a more comprehensive review of related literature to place the results in a broader context.

Response: Thank you, we have provided more insights into the practical implications of the findings. A more comprehensive review of related literature has also been done.

Conclusion

The conclusion summarizes the key findings concisely. It reiterates the importance of using regression imputation and EM imputation for addressing missing data in PLS-SEM. The conclusion could however be strengthened by discussing the real-world implications and practical recommendations resulting from the study.

Response: Thank you, we have now included the real-world implications and practical recommendations resulting from the study.

Decision Letter 1

Jibril Adewale Bamgbade

28 Dec 2023

An empirical comparison of some missing data treatments in PLS-SEM

PONE-D-23-24843R1

Dear Dr. Amusa,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jibril Adewale Bamgbade

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thanks for submitting the corrected manuscript. All the requested comments have been addressed appropriately.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

Acceptance letter

Jibril Adewale Bamgbade

10 Jan 2024

PONE-D-23-24843R1

PLOS ONE

Dear Dr. Amusa,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jibril Adewale Bamgbade

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. R scripts used for Monte Carlo simulations and data analyses.

    (ZIP)

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.

