Author manuscript; available in PMC: 2021 Jul 28.
Published in final edited form as: Adv Methods Pract Psychol Sci. 2020 Feb 19;3(1):66–80. doi: 10.1177/2515245919885617

Improving Practices for Selecting a Subset of Important Predictors in Psychology: An Application to Predicting Pain

Sierra A Bainter 1,*, Thomas G McCaulley 2, Tor Wager 3, Elizabeth R Losin 1
PMCID: PMC8317830  NIHMSID: NIHMS1663647  PMID: 34327305

Abstract

Frequently, researchers in psychology are faced with the challenge of narrowing down a large set of predictors to a smaller subset. There are a variety of ways to do this, but commonly it is done by choosing predictors with the strongest bivariate correlations with the outcome. However, when predictors are correlated, bivariate relationships may not translate into multivariate relationships. Further, any attempts to control for multiple testing are likely to result in extremely low power. Here we introduce a Bayesian variable-selection procedure frequently used in other disciplines, stochastic search variable selection (SSVS). We apply this technique to choosing the best set of predictors of the perceived unpleasantness of an experimental pain stimulus from among a large group of sociocultural, psychological, and neurobiological (functional MRI) individual-difference measures. Using SSVS provides information about which variables predict the outcome, controlling for uncertainty in the other variables of the model. This approach yields new, useful information to guide the choice of relevant predictors. We have provided Web-based open-source software for performing SSVS and visualizing the results.

Keywords: neuroscience, Bayesian modeling, uncertainty, multiple regression, reproducibility, stochastic search variable selection, open data, open materials


Consider a challenge many researchers face in different contexts: given a set of candidate predictors, how do you decide which smaller set of predictors to include in a regression model? Ideally, the set of relevant predictors would be based on strong theory, the size of the sample would be generous, and the number of candidate predictors would be tiny in comparison with the sample size. In this ideal case, it may be simple to include all relevant predictors and report the results.

However, as the pool of potentially relevant predictors grows, the problem becomes thornier. Perhaps theory would predict that any of the predictors is related to the outcome, or perhaps there is insufficient prior research to guide the choice. If all predictors are included at once, very few may be meaningfully related to the outcome, while all of them compete for the same variance. In the extreme case, if the number of candidate predictors meets or exceeds the sample size, including all predictors is not possible. Alternatively, researchers often screen the bivariate correlation between each predictor and the outcome and then include the predictors with significant correlations in a multivariate model (e.g., Lavie et al., 1995; Safren et al., 2016; Schultz et al., 2004). This common practice has substantial weaknesses. When predictors are correlated, as they very often are in psychology, screening on bivariate correlations can easily yield a multivariate model in which no predictor has a meaningful, unique relationship with the outcome (i.e., the partial regression coefficients are nonsignificant). Other relationships may go undetected unless one or more confounding variables are controlled for. Further, if any effort is made to guard against false positives by controlling for multiple testing, power to detect meaningful effects is greatly reduced.

In contrast, an algorithmic approach to selecting predictors, such as stepwise regression or all-possible-subsets regression, may be appealing. In stepwise regression, a model is built by automatically adding and removing individual predictors, one step at a time, based on the significance of individual terms or on changes in R². This produces results that capitalize heavily on sampling error, have poor replicability, and do not correctly identify the best predictor set of a given size (e.g., Henderson & Denison, 1989; Thompson, 1995). Even though the use of stepwise regression has long been discouraged, it continues to be used (McNeish, 2015), perhaps in part because the method is an easily implemented solution to a vexing problem. All-possible-subsets regression selects the model with the highest R² or the best value of another criterion, such as AIC or BIC. Because the number of possible models increases exponentially with the number of predictors (e.g., 30 predictors yield 2³⁰ = 1,073,741,824 possible models), methods that select the “best” model from all possible subsets are also highly sensitive to chance variability and generalize poorly (Olejnik, Mills, & Keselman, 2000).

In this paper we present a Bayesian variable selection method to aid researchers facing this common scenario, along with an online application to perform the analysis and visualize the results. The approach we demonstrate, stochastic search variable selection (SSVS; George & McCulloch, 1993, 1997), is not new, but to our knowledge it has rarely been applied in psychological research. SSVS¹ is a popular method in more biologically based research, such as genome-wide association studies (Lee et al., 2003), and it has been used in applications ranging from health outcomes (e.g., predictors of metabolic syndrome; Cheragi, Nedjat, & Mirmiran, 2018) to economics (e.g., economic and social factors associated with inflation; Doppelhofer & Miller, 2004). By accounting for model uncertainty, Bayesian variable selection methods both increase power and decrease false-positive results relative to traditional approaches (Viallefont, Raftery, & Richardson, 2001; Swartz, Yu, & Shete, 2008). SSVS provides information about the relative importance of predictors², accounting for uncertainty in which other predictors are included in the model. In this tutorial, we present a common formulation of SSVS and compare it with standard approaches, using an example from our own work in which we faced the variable selection problem. Many other forms of Bayesian variable selection exist, and we hope this tutorial will help spark interest in understanding which formulations perform best under which conditions (for a recent systematic review, see Fragoso, Bertoli, & Louzada, 2018).

This paper is structured as follows. In the next section we introduce our motivating example, followed by a description of SSVS. We then demonstrate SSVS compared with standard variable selection based on bivariate relationships and introduce our online application for performing SSVS. We close with a discussion placing SSVS in context with other variable selection methods, including a simulation comparing SSVS with lasso regression (Tibshirani, 1996).

A Motivating Example

Although pain is a universal part of (neurotypical) human experience, the way people experience and respond to pain is highly variable. Individual differences in pain sensitivity likely originate from a variety of sources as we have outlined in our recently published neurocultural model of pain (Anderson & Losin, 2017). These sources range from genetic variation related to the endogenous pain modulatory system, to socially learned coping styles, to one’s previous experiences with pain and other stressful life events. Characterizing the contributions that these different psychological, sociocultural, and neurobiological factors make to individual differences in pain is of great interest to both basic scientists and clinicians. Yet, because of the myriad of potential factors contributing to individual differences in pain, the combination of factors that are most influential is still not well understood. SSVS is therefore an ideal approach to help tackle this problem.

Here we apply SSVS to a dataset in which participants underwent experimental pain induction using contact heat to the forearm while undergoing fMRI. During the pain stimulation we collected several brain measures, and after each heat stimulation participants rated the intensity and unpleasantness of the pain they experienced. For the present analysis we focus on the pain unpleasantness ratings as our dependent variable, which were averaged over nine trials (α = .91).

Prior to their lab visit, participants also completed a battery of self-report questionnaires online via Qualtrics. The measures were selected because they were hypothesized to contribute to individual differences in pain ratings, and they fell into the different stages of our neurocultural model of pain, including stressful life experiences, mood, beliefs about the causes and consequences of pain, responses to pain (e.g., coping strategies), and pain context (e.g., trust in the person performing the painful procedure). We also included measures of personality and demographics, as these have also been shown to influence pain responses (Anderson, Green, & Payne, 2009; Kim et al., 2017; Shiomi, 1978). The self-report measures in each category are listed in Table 1. Details of each questionnaire, including references and scoring, can be found in the supplemental materials, along with tables of descriptive statistics for each measure (Tables S1 to S6).

Table 1.

Self-report questionnaires administered prior to experimental pain induction

| Category | Measure |
|---|---|
| Pain precursors (stressful life events) | Barratt Simplified Measure of Social Status (BSMSS) |
| | Life Events Checklist (LEC) |
| | Williams Major and Everyday Discrimination Questions (WQ) |
| | Brief History of Pain Questionnaire (BHPQ [3]) |
| | Alcohol, Smoking, and Substance Involvement Screening Test (ASSIST) |
| | Experiences of Discrimination measure (EOD [5]) |
| | Stress and Adversity Inventory (STRAIN [6]) |
| Pain precursors (mood) | Positive and Negative Affect Schedule (PANAS [4]) |
| | State and Trait Anxiety Inventory form X (STAI) |
| | The Mood and Anxiety Symptom Questionnaire (MASQ) |
| | Ruminative Response Scale (RRS [3]) |
| | Five Facet Mindfulness Questionnaire (FFMQ [2]) |
| | Imaginal Process Inventory (IPI [3]) |
| Pain precursors (beliefs & expectations) | Pain Beliefs Questionnaire (PBQ [2]) |
| | Fear of Pain Questionnaire-III (FPQ [3]) |
| | Health and Illness Scale (HIS [7]) |
| Pain responses | Kohn Reactivity Scale (KRS) |
| | Pain Catastrophizing Scale (PCS [3]) |
| | Coping Strategies Questionnaire 24 (CSQ [8]) |
| | Emotion Regulation Questionnaire (ERQ [2]) |
| Pain context & communication | Wake Forest Physician Trust Scale (WFPTS) |
| | Trust Visual Analogue Scale (TVAS) |
| | The Similarity Visual Analogue Scale (SVAS [2]) |
| Personality and identity | The Multigroup Ethnic Identity Measure (MEIM) |
| | The Big Five Inventory (BFI [5]) |
| Demographic variables | Age |
| | Sex |
| | Race/Ethnicity |

Note. Subscales of some measures were omitted to reduce participant burden. Numbers in [] denote the number of subscales (if > 1) included in the model selection. See supplemental materials for additional details of measures used including references, subscales, scoring, and descriptive statistics.

Finally, we included three measures of brain responses to pain in the present analysis. The first was each person’s expression of the Neurologic Pain Signature (NPS), a machine-learning-derived multivariate pattern of brain activity that predicts pain (Wager et al., 2013). The other two were activity within two brain regions that lie outside the NPS but are associated with the affective-motivational aspects of pain, the nucleus accumbens (NAcc) and the ventromedial prefrontal cortex (vmPFC) (Baliki et al., 2012); these may therefore be particularly relevant to predicting our outcome measure of pain unpleasantness (Price, 2000). See the supplementary materials for a full description of each brain measure.

SSVS, like standard regression analysis, requires complete data for the dependent variable and all predictors. However, whereas a series of separate regression models can maximize the available sample size model by model (i.e., pairwise deletion), SSVS requires complete cases on all measures at once. In our initial pool of 104 predictors for 93 participants, missingness was low for each predictor (mean missingness was 2%), but only 48 participants had complete data on all predictors. Therefore, to maximize both the sample size and the candidate predictor set for this analysis, we excluded predictors with more than 6% missing responses and retained cases with complete data on the remaining measures. Our example data set thus includes 74 participants measured on the dependent variable (pain unpleasantness ratings) and 75 predictors of pain unpleasantness ratings.

Note that although in this example we have about as many variables as observations, SSVS is also applicable to less extreme scenarios (e.g., 10 candidate predictors in a large sample) and to more extreme ones. In fact, SSVS has been regularly applied in genome-wide association studies (GWAS) with thousands of genes (Fridley, 2009; Lee, Sha, Dougherty, Vannucci, & Mallick, 2003), though some computational adjustments may be necessary when the number of predictors far exceeds the sample size. Other useful applications include determining a subset of predictors for a behavior or diagnosis: for example, which set of predictors together best predicts a measure of inflammation, hurricane evacuation, or willingness to get HIV testing?

Standard Approach

First, we consider the result of selecting predictors based on their bivariate correlations with pain unpleasantness ratings. We may retain predictors whose correlations are significant at the .05 level, or use a more liberal cutoff, for example .10, to prioritize discovery. We may also adopt a more stringent cutoff or an adjustment for multiple comparisons. In this example, 10 predictors have bivariate relationships with unpleasantness ratings with p-values less than .10, 8 have p-values less than .05, and 3 have p-values less than .01. A Benjamini-Hochberg correction for multiple comparisons would leave 1 significant predictor. The choice of cutoff is arbitrary; however, in our experience researchers more often use a less stringent cutoff for exploratory variable selection, so we adopt that approach here.
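The screening step above, and the effect a multiple-comparison correction has on it, can be illustrated with a minimal Python sketch (our analyses were not run with this code). It applies an uncorrected cutoff and the Benjamini-Hochberg step-up procedure to a list of bivariate p-values. The function names `screen` and `benjamini_hochberg` and the six p-values are hypothetical, chosen only to mirror the pattern in the text, where fewer predictors survive as the criterion tightens.

```python
def screen(p_values, alpha):
    """Uncorrected screening: indices of predictors with p < alpha."""
    return [i for i, p in enumerate(p_values) if p < alpha]


def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    k_max = 0
    # Largest rank k (1-based) whose sorted p-value satisfies p_(k) <= k/m * q.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Every hypothesis ranked at or below k_max is rejected.
    return sorted(order[:k_max])


# Hypothetical p-values from six bivariate correlation tests.
p = [0.004, 0.020, 0.030, 0.080, 0.300, 0.700]
print(screen(p, 0.10))        # [0, 1, 2, 3]
print(screen(p, 0.05))        # [0, 1, 2]
print(benjamini_hochberg(p))  # [0]
```

Because the threshold k/m·q grows with rank, a predictor with p = .02 passes an uncorrected .05 cutoff but fails the corrected one here, which is the loss of power described above.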

In the left panel of Table 2, we show the results of a regression model including the 10 predictors chosen based on the significance (α = .10) of their bivariate correlations. In this model, three predictors remain significant: having filed a formal complaint of discrimination (EOD subscale 5), pain catastrophizing during the scan (CSQ subscale 5), and belief in supernatural causes of illness (HIS subscale 7). The model R² indicates that together the predictors explain 45.9% of the variance in pain ratings.

Fundamentally, each correlation coefficient tests a bivariate relationship and asks, “Is there a significant relationship between a given predictor x_j and the outcome y, ignoring all other factors?” The multivariate regression model then asks, for each predictor, “Is there a significant relationship between x_j and y, above and beyond all the other predictors in my model?” Unfortunately, bivariate screening may select a set of predictors that are highly correlated with one another and therefore redundant in a model together. Or, a predictor may be meaningfully related to the outcome only when one or more other predictors are controlled for. By contrast, SSVS can be used to ask a slightly different question: “Is a given x_j a consistent predictor of y, accounting for uncertainty in which other variables are included in the model?” We introduce SSVS next.
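Both failure modes of bivariate screening show up in the standard formulas for the standardized coefficients of a two-predictor regression, computed from the three pairwise correlations. The Python sketch below uses hypothetical correlation values (not values from our data) and a hypothetical helper name, `standardized_betas`.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for y regressed on x1 and x2, from correlations."""
    denom = 1.0 - r_12 ** 2
    if denom <= 0:
        raise ValueError("x1 and x2 must not be perfectly correlated")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Suppression: x2 is uncorrelated with y, so bivariate screening drops it,
# yet its partial coefficient is sizable once x1 is controlled for.
b1, b2 = standardized_betas(r_y1=0.5, r_y2=0.0, r_12=0.6)
print(round(b1, 3), round(b2, 3))  # 0.781 -0.469

# Redundancy: both pass screening, but when they are highly correlated
# each contributes little unique information.
c1, c2 = standardized_betas(r_y1=0.5, r_y2=0.5, r_12=0.9)
print(round(c1, 3), round(c2, 3))  # 0.263 0.263
```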

Stochastic Search Variable Selection

Briefly, SSVS samples thousands of regression models. The models are selected from a sampler that is designed to select among “good” models (the meaning of good here will be explained shortly). After sampling, rather than selecting the best model according to a specified criterion (e.g., best AIC/BIC or highest model R2), we can examine the proportion of times each predictor was selected, which provides information about which predictors reliably predict pain ratings, accounting for uncertainty in the other predictors in the model.

To set up SSVS, we start with a standard regression model for y with some number of predictors p:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i.$$

Rather than obtaining parameter estimates using the method of ordinary least squares, we will obtain estimates using a Bayesian framework for the regression model. In the Bayesian framework, in addition to our regression model for the data, we specify prior distributions for each parameter. Prior distributions are simply expressions of our beliefs (and uncertainty) about the values of the parameters. Bayesian estimation combines these priors with the model and data to arrive at the posterior, which summarizes our updated beliefs and uncertainty about the parameters.

In this case, the prior for the intercept may be totally flat (e.g., Uniform[−∞, ∞]) to indicate that we are providing no prior information about the intercept. For each regression coefficient, we may choose a normally distributed prior centered at zero with a large variance. For example, N(mean = 0, SD = 10) indicates that we believe zero is the most likely value and that most coefficients (about 95% of the prior mass) should be less than 20 in absolute value. Because the model is more easily estimated in terms of the inverse residual variance, or precision (1/σ²), a prior for this parameter may be Gamma(a = .01, b = .01), which allows positive values over a wide range.
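The prior statements above can be checked with a few lines of arithmetic. This Python sketch computes the prior probability that a N(0, 10) coefficient lies within ±20, and the mean and variance of the precision prior. It assumes the Gamma(a, b) prior uses the shape-rate parameterization (as in BUGS/JAGS `dgamma`); under a shape-scale parameterization the moments would differ. The function names are hypothetical.

```python
import math


def prob_abs_within(bound, sd):
    """P(|beta| < bound) for beta ~ Normal(0, sd), via the error function."""
    return math.erf(bound / (sd * math.sqrt(2.0)))


def gamma_shape_rate_moments(a, b):
    """Mean and variance of a Gamma(shape=a, rate=b) distribution."""
    return a / b, a / b ** 2


print(round(prob_abs_within(20, 10), 4))  # 0.9545
mean, var = gamma_shape_rate_moments(0.01, 0.01)
print(mean)  # 1.0
```

Under the shape-rate assumption, Gamma(.01, .01) has mean 1 and variance 100 for the precision, which is why it is treated as only weakly informative.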

Simply adding prior distributions to our regression model to make it Bayesian does not help solve the variable selection problem. The results from Bayesian estimation of this model would be essentially identical to those of OLS regression (given that our priors are relatively uninformative). Even if we could estimate this model and obtain an estimate for every β, we expect that most of the β’s are essentially zero; they cannot all be important for predicting pain unpleasantness ratings.

To reflect this, we add a set of binary indicator variables δj, one for each predictor, that “toggle” predictors in and out of the model.

$$y_i = \beta_0 + \delta_1 \beta_1 x_{i1} + \delta_2 \beta_2 x_{i2} + \cdots + \delta_p \beta_p x_{ip} + \varepsilon_i$$

If δj = 1, variable j is included in the model. If δj = 0, variable j is not included (and we do not estimate its βj). Stochastic search variable selection (SSVS; George & McCulloch, 1993) treats the full set of indicator variables δ as unknown parameters to be estimated. The estimation characterizes our uncertainty about the most probable values of the coefficients β and about the best set of predictors, δ.

We also specify a prior probability for each δj; a common value is .5, so that each predictor has a 50/50 prior probability of being included in the model. Choosing .5 as the prior inclusion probability seems reasonably “uninformative” (or objective); note, however, that in our example this implies a prior expectation of a model with .5 × 77 = 38.5 predictors, which is still too large for our sample size. We return to the issue of the prior inclusion probability later (see in particular Sidebar 1). The prior for the β coefficients in SSVS is also called a “spike and slab” mixture, pictured in Figure 1, because it is a mixture of a spike at zero (shown in red) and a relatively flat “slab” of nonzero values (blue).
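A spike-and-slab prior and its implied model size can be simulated directly. The Python sketch below assumes a point-mass spike (βj is exactly zero when δj = 0); some SSVS formulations, including George and McCulloch's (1993), instead use a narrow normal distribution for the spike. The helpers `draw_spike_slab_prior`, `expected_model_size`, and `prior_incl_for_target_size` are hypothetical illustrations, not part of SSVSforPsych.

```python
import random


def draw_spike_slab_prior(p, prior_incl=0.5, slab_sd=10.0, rng=None):
    """One prior draw of (delta, beta) for p predictors.

    Assumes a point-mass spike: beta_j is exactly 0 when delta_j = 0, and
    beta_j ~ Normal(0, slab_sd) when delta_j = 1.
    """
    rng = rng or random.Random()
    delta = [1 if rng.random() < prior_incl else 0 for _ in range(p)]
    beta = [rng.gauss(0.0, slab_sd) if d else 0.0 for d in delta]
    return delta, beta


def expected_model_size(p, prior_incl):
    """Prior expected number of included predictors: p * prior_incl."""
    return p * prior_incl


def prior_incl_for_target_size(p, target_size):
    """Prior inclusion probability giving an expected model size of target_size."""
    return target_size / p


print(expected_model_size(77, 0.5))                   # 38.5
print(round(prior_incl_for_target_size(77, 10), 3))   # 0.13
rng = random.Random(2020)
sizes = [sum(draw_spike_slab_prior(77, rng=rng)[0]) for _ in range(5000)]
print(sum(sizes) / len(sizes))  # simulated average model size, close to 38.5
```

The last helper only shows the arithmetic linking a prior inclusion probability to an expected model size; it is not a recommendation for choosing the prior.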

Sidebar 1. The effect of the prior inclusion probability.

We selected a prior probability of .5 for including each predictor, but how much does this value affect the marginal (posterior) inclusion probabilities (MIPs) that we obtain? The figure below shows the MIPs obtained for a range of prior inclusion probabilities, from .1 to .9 (the prior probability is equal for all predictors).


There is a clear effect of the prior probability on the marginal (posterior) inclusion probability. However, the relative pattern across predictors is stable. For example, selecting the 10 largest marginal inclusion probabilities corresponds to the same predictor set whether the prior probability was .3 or .5. Choosing more extreme prior values, especially .9, results in less clear separation in MIPs among moderate values. In general, selection of a reasonable prior probability (given the sample size and number of predictors) has some effect on the magnitude of the MIPs, but does not have a strong influence on which predictors have the highest MIPs.

Figure 1.


Spike and slab mixture prior distribution for regression coefficients

To estimate the model parameters, we use Markov chain Monte Carlo (MCMC) estimation, a method used in Bayesian statistics to draw samples from the posterior distribution. From these samples we can characterize our uncertainty about probable values of the parameters: the indicators that determine which predictors are included in the model and their associated regression coefficients. This method for SSVS can be implemented in R, and we have provided an online application (SSVSforPsych) to easily perform the variable selection and visualize the results, which we introduce in the next section.

For our example of predicting pain unpleasantness ratings, we run the MCMC sampler for 10,000 iterations. MCMC requires a “warm-up” period to help ensure that the chain converges to the target distribution, so we discard the first 1,000 samples. Note that it is important to standardize the explanatory variables before performing variable selection, so that the priors (which have a fixed scale) do not influence predictors measured on different scales differently. After SSVS, it is fine to use whatever scale is easiest for interpretation. At each iteration, draws of the inclusion indicators, regression coefficients, and inverse residual variance are obtained. Figure 2 shows the result of a single iteration of the sampler. Estimates of the regression coefficients (y-axis) are shown for each of the 77 predictors (x-axis). Predictors that were not included in the model in this particular iteration (δj = 0) have a coefficient value of zero.
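Standardizing predictors before SSVS, and converting a slope back to raw units afterward, can be sketched as follows. This minimal Python example z-scores each column using the sample standard deviation; that choice, and the helper names `standardize_columns` and `slope_per_raw_unit`, are assumptions for illustration. The conversion applies when the outcome is left on its original scale.

```python
import statistics


def standardize_columns(X):
    """Return a copy of X (a list of rows) with every column z-scored.

    Each column is centered at its mean and divided by its sample SD.
    """
    n_cols = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_cols)]
    means = [statistics.fmean(col) for col in cols]
    sds = [statistics.stdev(col) for col in cols]
    if any(sd == 0 for sd in sds):
        raise ValueError("a constant column cannot be standardized")
    return [[(row[j] - means[j]) / sds[j] for j in range(n_cols)] for row in X]


def slope_per_raw_unit(b_standardized, sd_x):
    """Convert a slope per 1 SD of x into a slope per raw unit of x."""
    return b_standardized / sd_x


# Two predictors on very different scales end up on a common scale.
Z = standardize_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 45.0]])
print([round(v, 3) for v in Z[0]])
```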

Figure 2.


Estimated beta coefficients from an example iteration of SSVS. Coefficients of zero indicate predictors not selected into the model in this sample iteration. Numerical suffix indicates subscales included in variable selection, see supplemental materials for full details on each measure.

In 10,000 iterations of the sampler, only a tiny fraction of the possible models will be visited, so we cannot hope to visit every model and choose the single best one. Fortunately, our goal is instead to sample many high-probability models and identify the predictors that appear most often. This information is summarized in Figure 3, where we plot in blue each predictor’s marginal inclusion probability (MIP), the proportion of sampled models in which the predictor was included. Predictors selected in 50% or more of the models are shown with a triangle; other predictors are shown with a dot. It is interesting to compare each predictor’s inclusion probability with the magnitude of its correlation with pain unpleasantness ratings, plotted in red in Figure 3. Although some of the high correlations correspond to high inclusion probabilities, the relationship between bivariate correlations and MIPs is imperfect; in this case, they are correlated r(75) = .52, p < .001.
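Given stored indicator draws from a sampler, computing MIPs and ranking predictors takes only a few lines. In this Python sketch the draws, predictor names, and helper names (`marginal_inclusion_probs`, `rank_predictors`) are hypothetical; the .5 threshold matches the triangle marker in Figure 3, and the top-k option corresponds to keeping a fixed number of predictors.

```python
def marginal_inclusion_probs(delta_draws):
    """MIP of each predictor: share of retained draws with delta_j = 1.

    delta_draws is a list of post-warm-up draws, each a list of 0/1 indicators.
    """
    n_draws = len(delta_draws)
    p = len(delta_draws[0])
    return [sum(draw[j] for draw in delta_draws) / n_draws for j in range(p)]


def rank_predictors(mips, names, threshold=0.5, top_k=None):
    """Predictors at or above threshold, and the top_k by MIP (ties keep input order)."""
    ranked = sorted(zip(names, mips), key=lambda pair: pair[1], reverse=True)
    above = [name for name, m in ranked if m >= threshold]
    top = [name for name, _ in ranked[:top_k]] if top_k is not None else []
    return above, top


# Four hypothetical draws of the indicators for three predictors.
draws = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 0, 1]]
mips = marginal_inclusion_probs(draws)
print(mips)                                               # [0.75, 0.25, 0.75]
print(rank_predictors(mips, ["a", "b", "c"], top_k=1))    # (['a', 'c'], ['a'])
```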

Figure 3.


Marginal inclusion probabilities shown in blue after SSVS (prior inclusion probability = 0.5) and correlations are shown in red. Triangles indicate variables selected by each method depending on threshold. Numerical suffix indicates subscales included in variable selection, see supplemental materials for full details on each measure.

To assess MCMC convergence for SSVS, we compare results across two or more chains to ensure that the results are stable (Swartz et al., 2013). To do this, we ran SSVS a second time and computed the correlation between the MIPs estimated for each predictor in the two chains, which was above .99. Unstable results across chains would suggest that more iterations are needed for MCMC convergence and that the number of burn-in iterations should also be increased.
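The two-chain check described above amounts to correlating the MIP vectors from independent runs. This Python sketch implements a plain Pearson correlation (Python 3.10+ also offers `statistics.correlation`) and, as an extra diagnostic of our own choosing rather than one used in the text, reports the largest absolute MIP difference between chains. The MIP values and helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def compare_chains(mips_a, mips_b):
    """Agreement of MIPs from two chains: (correlation, largest absolute gap)."""
    r = pearson_r(mips_a, mips_b)
    max_gap = max(abs(a - b) for a, b in zip(mips_a, mips_b))
    return r, max_gap


# Hypothetical MIPs for five predictors from two chains with different seeds.
chain1 = [0.99, 0.93, 0.12, 0.47, 0.05]
chain2 = [0.98, 0.94, 0.10, 0.50, 0.06]
r, gap = compare_chains(chain1, chain2)
print(round(r, 3), round(gap, 2))
```

A high correlation alone can hide a sizable shift for a single borderline predictor, which is why the largest gap is also reported here.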

In this example, we found 20 predictors with MIPs of .5 or greater. Although we could include all of these predictors in a model together, .5 is not a firm cutoff or recommended threshold, and 20 predictors is still too large a set for meaningful interpretation. Visualizing the inclusion probabilities as in Figure 3 allows us to examine the pattern of their relative magnitudes. A major reason not to rely on a specific MIP cutoff is that MIPs vary with the sample size, the total number of predictors, and the prior inclusion probability; however, the pattern of relative MIPs (i.e., which predictors have the highest MIPs) should remain largely stable. This is illustrated further in Sidebar 1.

Here, we select the predictors with the 10 highest MIPs, which also gives us a model with the same number of predictors as the model based on bivariate correlations. In the second panel of Table 2, we show the results of a standard regression model including these 10 predictors. This subset chosen using SSVS overlaps with the predictor set chosen via bivariate selection: 6 predictors were chosen by both model-building strategies. However, 6 of the 10 predictors in the SSVS model significantly predict unpleasantness ratings. The three additional significant predictors in the SSVS model, relative to the bivariate selection model, are belief that exposure causes illness (HIS subscale 3), mean activity in the vmPFC during pain, and global perceptions of racial discrimination (EOD subscale 4). This model has an R² of .59 (adjusted R² = .53) and explains about 13 percentage points more of the variability in pain unpleasantness ratings than the alternative model with the same number of predictors.

Table 2.

Regression results predicting ratings of unpleasantness following model selection based on bivariate relationships versus SSVS

| Predictor | Bivariate model: r | Bivariate model: b | Bivariate model: 95% CI | SSVS model: MIP | SSVS model: b | SSVS model: 95% CI |
|---|---|---|---|---|---|---|
| Stressful events | | | | | | |
| Less worry about discrimination (EOD[3]) | .24* | 0.41 | [−1.46, 2.28] | .74 | −1.08 | [−3.04, 0.88] |
| Higher global perceived discrimination (EOD[4]) | .03 | | | .86 | 2.91 | [−5.22, 0.61] |
| Have not filed discrimination complaint (EOD[5]) | .36* | 13.60 | [2.85, 24.36] | .84 | 15.25 | [−24.66, 5.83] |
| Mood | | | | | | |
| Daydreaming frequency (IPI[2]) | .21 | 0.11 | [−0.14, 0.36] | .35 | | |
| Pain beliefs | | | | | | |
| Fear of minor pain (FPQ[1]) | .05 | | | .72 | −0.42 | [−0.87, 0.03] |
| Fear of medical pain (FPQ[3]) | .23* | 0.10 | [−0.34, 0.54] | .65 | | |
| Exposure causes of illness (HIS[3]) | .11 | | | .93 | 2.55 | [0.88, 4.21] |
| Supernatural causes of illness (HIS[7]) | .21 | 2.55 | [0.24, 4.87] | .85 | 2.68 | [0.61, 4.75] |
| Pain responses | | | | | | |
| Pain catastrophizing during scan (CSQ[5]) | .44* | 0.72 | [0.21, 1.23] | .99 | 1.08 | [0.71, 1.44] |
| Cognitive coping during scan (CSQ[8]) | −.31* | −0.37 | [−0.82, 0.09] | .57 | | |
| Pain context | | | | | | |
| Trust in experimenter (TVAS) | −.24* | −0.03 | [−0.21, 0.15] | .47 | | |
| Brain responses | | | | | | |
| Mean vmPFC activity | −.15 | | | .67 | 5.13 | [−8.90, −1.36] |
| Demographics | | | | | | |
| Hispanic | −.28* | −5.48 | [−12.06, 1.11] | .74 | −5.48 | [−11.04, 0.07] |
| Black | .28* | −0.53 | [−9.15, 8.10] | .68 | 5.21 | [−1.98, 12.40] |
| Model fit | | R² = .459, adj. R² = .373 | | | R² = .593, adj. R² = .529 | |

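Note. Numbers in [] denote the subscale number (if more than one subscale was included); see the supplemental materials for more information on all measures. b represents unstandardized regression weights; r represents the zero-order correlation; MIP represents the marginal inclusion probability. Blank b and CI cells indicate that the predictor was not included in that model. * indicates a correlation with p < .05. Coefficients significant at p < .05 in the regression models are EOD[5], HIS[7], and CSQ[5] in both models, plus EOD[4], HIS[3], and mean vmPFC activity in the SSVS model.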

Online Application for SSVS

One barrier to the use of SSVS with psychological data is the lack of an easy-to-use tool for conducting such analyses. Some R packages offer functionality for SSVS, such as BoomSpikeSlab (Scott, 2016). However, these packages can be confusing and difficult to implement, or require extensive R coding experience, all of which may deter researchers from using them. Using the Shiny package in R (Chang, Cheng, Allaire, Xie, & McPherson, 2017), we have created a free, user-friendly web application called SSVSforPsych (https://ssvsforpsych.shinyapps.io/ssvsforpsych/), whose graphical user interface removes the need for complicated coding. The application code builds on R code for SSVS (Reich, 2014). By introducing this Shiny application, we hope SSVS can become an easily available tool, even for users who are unfamiliar with R.

After navigating to the SSVSforPsych webpage, users can upload their data in a variety of common file formats (e.g., comma-delimited, text, Excel, SPSS). If predictors are unstandardized, users can choose to have the program standardize them. Users can then adjust several inputs to the SSVS analysis, including the prior inclusion probability, the number of warm-up iterations, and the number of samples. The application defaults to a prior inclusion probability of 0.5, 10,000 samples, and 1,000 warm-up (discarded) iterations. After selecting parameter values, users select the set of predictors and the outcome they wish to analyze. Additionally, users can choose random starting values, which should be used when assessing MCMC convergence, as discussed in the previous section.

After the sampling concludes, SSVSforPsych generates a table of the results and a plot displaying the MIPs (similar to Figure 3) using the ggplot2 graphics package (Wickham, 2016). Tables and plots can be downloaded as .csv spreadsheets and .png images. SSVSforPsych automatically updates results, tables, and graphs in real time when the user submits changes to any of the inputs. For example, if the prior probability is changed from 0.5 to 0.3, SSVSforPsych reruns the analysis to reflect the new estimated values, allowing users to analyze their data under various statistical assumptions.

SSVSforPsych is not designed to handle missing data, and attempts to run analyses with missingness in the predictors or outcomes may result in error messages or misleading results. We advise users to upload data without missing values for any variables to be used in the SSVS. However, SSVS in the presence of missing data is an important direction for future research.

Comparison with Other Methods

Many other approaches to reducing a candidate set of predictors exist; these include ridge regression, the lasso (Tibshirani, 1996), the elastic net (Zou & Hastie, 2005), various machine learning approaches (see, e.g., Hastie, Tibshirani, & Friedman, 2016), and principal components analysis (PCA). In general, machine learning approaches focus on prediction rather than on interpretation of the predictor set. PCA likewise performs data reduction, but it is not appropriate to interpret the components as predictors. Dominance analysis (Budescu, 1993) is another method for evaluating the relative importance of predictors within a regression model; it is used to rank a smaller set of predictors within a defined model, not for variable selection. Lasso methods are the most similar to SSVS, in that they can be used to make binary decisions about the inclusion or exclusion of predictors. In fact, a connection between the lasso and some SSVS formulations has been demonstrated under some conditions (Yuan & Lin, 2005). However, whereas SSVS provides continuous information about each predictor’s importance, the lasso provides only binary decisions about inclusion. This continuous information is helpful because, as in our example, researchers may decide to balance parsimony with prediction.

Whereas OLS regression simply minimizes the sum of squared residuals (SS), the lasso adds a penalty term, the sum of the absolute values of all coefficients, which shrinks coefficients toward zero to avoid overfitting; as a result, some coefficients are set exactly to zero.

$$\text{lasso} = SS + \lambda \sum_{j=1}^{P} \lvert \beta_j \rvert$$
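A worked special case shows why the absolute-value penalty produces exact zeros. If the columns of X are orthonormal (X'X = I), the objective separates by coefficient into β_j² − 2z_jβ_j + λ|β_j|, where z_j is the OLS estimate, and each lasso estimate is the OLS estimate soft-thresholded at λ/2: β_j = sign(z_j)·max(|z_j| − λ/2, 0). The Python sketch below assumes this orthonormal design (real predictors are rarely orthonormal, so it is an illustration rather than a general solver) and checks the result against the lasso optimality (KKT) conditions for the objective above.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t, setting values with |z| <= t exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


rng = np.random.default_rng(0)
n, p, lam = 50, 4, 1.0
# Orthonormal design: the Q factor of a random matrix satisfies X'X = I.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
y = X @ np.array([3.0, 0.2, -1.0, 0.0]) + 0.05 * rng.standard_normal(n)

beta_ols = X.T @ y                               # OLS estimate when X'X = I
beta_lasso = soft_threshold(beta_ols, lam / 2)   # minimizes SS + lam * sum|beta_j|
print(np.round(beta_ols, 3))
print(np.round(beta_lasso, 3))

# KKT check: the gradient of SS is -2 X'(y - X beta).
grad = -2 * X.T @ (y - X @ beta_lasso)
nz = beta_lasso != 0
assert np.allclose(grad[nz] + lam * np.sign(beta_lasso[nz]), 0)   # active coefficients
assert np.all(np.abs(grad[~nz]) <= lam + 1e-12)                   # zeroed coefficients
```

The small second coefficient and the null fourth coefficient fall below the threshold and are set to exactly zero, while the large ones are shifted toward zero by λ/2.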

The parameter λ controls the strength of the penalty; typically, λ is chosen by an algorithm that finds the value minimizing cross-validation error. Researchers have shown that SSVS outperforms lasso techniques under certain conditions, especially in problems with many predictors and when the number of predictors exceeds the sample size, as in genome-wide association studies (GWAS; Srivastava & Chen, 2009; Guan & Stephens, 2011).
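The λ search can be sketched in Python as below: a cyclic coordinate-descent solver for the objective above (each update soft-thresholds a univariate regression on the partial residual) wrapped in K-fold cross-validation over a descending grid of λ values, with warm starts along the grid. This is an illustrative sketch under stated assumptions, not glmnet: `cv.glmnet()` scales the squared-error term by 1/(2n), standardizes internally, and uses a more refined path algorithm, so its λ values are on a different scale. The helper names `lasso_cd`, `standardize`, and `cv_lasso` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, beta0=None, max_sweeps=500, tol=1e-8):
    """Minimize ||y - X b||^2 + lam * sum|b_j| by cyclic coordinate descent.
    Assumes centered y and centered X columns (no intercept)."""
    p = X.shape[1]
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    xtx = np.sum(X ** 2, axis=0)
    resid = y - X @ beta
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            r_j = resid + X[:, j] * beta[j]                   # partial residual
            new = soft_threshold(X[:, j] @ r_j, lam / 2) / xtx[j]
            resid = r_j - X[:, j] * new
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def standardize(X_train, X_other):
    """Scale both matrices with the training mean and SD only."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mu) / sd, (X_other - mu) / sd


def cv_lasso(X, y, n_lams=25, k=10, seed=0):
    """Choose lam by k-fold CV (minimum squared prediction error), then refit."""
    Xs, _ = standardize(X, X)
    yc = y - y.mean()
    lam_max = 2 * np.max(np.abs(Xs.T @ yc))       # for lam >= lam_max, all b_j = 0
    lams = lam_max * np.logspace(0, -3, n_lams)   # descending grid
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    cv_err = np.zeros(n_lams)
    for f in range(k):
        tr, te = folds != f, folds == f
        Xtr, Xte = standardize(X[tr], X[te])
        ybar = y[tr].mean()
        beta = np.zeros(X.shape[1])
        for i, lam in enumerate(lams):            # warm start from the previous lam
            beta = lasso_cd(Xtr, y[tr] - ybar, lam, beta0=beta)
            cv_err[i] += np.sum((y[te] - ybar - Xte @ beta) ** 2)
    best = lams[np.argmin(cv_err)]
    return best, lasso_cd(Xs, yc, best)           # refit on all data


# Simulated example: two of eight predictors matter.
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(100)
lam, beta = cv_lasso(X, y, k=10)
selected = np.flatnonzero(beta)
print(round(lam, 2), selected)
```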

However, because the literature comparing Bayesian estimation with the lasso is sparse and disjointed, and to better demonstrate how SSVS compares with the lasso, we summarize here the results of a simulation study based on our example. We varied the sample size (n = 75 and n = 150) and the reliability of the outcome: perfect (α = 1), moderate (α = .8), and low (α = .5). For each of the 6 conditions, we simulated 500 data sets, using the covariance matrix from our example as the population generating model, with correlations between the predictors and the outcome adjusted to account for attenuated or disattenuated relationships depending on the reliability of the outcome. We based our design on Braun, Converse, and Oswald (2019); our data and code for data generation and analysis are provided at https://osf.io/m8djx/. For each replication, we recorded the predictors selected by lasso regression and by SSVS with an MIP cutoff of .5. Lasso analyses were conducted using the cv.glmnet() function from the glmnet package (Friedman, Hastie, & Tibshirani, 2010). We used 10 folds for cross-validation based on McNeish (2015), which provides guidelines for conducting lasso analyses with psychological and behavioral data.
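The data-generating step can be sketched in Python as follows. The sketch assumes, as one reading of the design, that the example correlation matrix holds true-score correlations and that unreliability affects only the outcome, so each predictor–outcome correlation is multiplied by √α while predictor intercorrelations are unchanged (equivalent to y_obs = √α·y_true + √(1 − α)·e). The authors' actual generation code is in the OSF repository; the 3 × 3 matrix and the helper names `attenuate_outcome` and `simulate` are made up for illustration. Scaling the outcome row by a factor of at most 1 keeps the matrix positive definite.

```python
import numpy as np


def attenuate_outcome(R, alpha):
    """Scale the outcome's correlations (last row/column of R) by sqrt(alpha),
    the classical attenuation for an outcome with reliability alpha."""
    R_obs = R.copy()
    k = np.sqrt(alpha)
    R_obs[-1, :-1] *= k
    R_obs[:-1, -1] *= k
    return R_obs


def simulate(R, alpha, n, rng):
    """Draw n standardized cases (predictors first, outcome last)."""
    R_obs = attenuate_outcome(R, alpha)
    return rng.multivariate_normal(np.zeros(len(R)), R_obs, size=n)


# Made-up population: two predictors and an outcome (last column).
R = np.array([[1.0, 0.3, 0.5],
              [0.3, 1.0, 0.2],
              [0.5, 0.2, 1.0]])
rng = np.random.default_rng(0)
data = simulate(R, alpha=0.8, n=150, rng=rng)  # one replication of one condition
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)  # (150, 2) (150,)
```

Repeating the draw 500 times for each combination of n and α, and running both selection methods on each data set, reproduces the structure (not the exact numbers) of the study.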

The average numbers of predictors selected by each method are summarized in Table 3. In all conditions, the lasso selected more predictors than SSVS, but the pattern of effects across conditions was similar: for both methods, more predictors were selected with the larger sample size and with higher reliability. Interestingly, the predictors included by SSVS were largely a subset of the lasso predictors: 91%–99% of the predictors selected by SSVS were also selected by lasso, whereas only 17%–43% of the predictors chosen by lasso were also chosen by SSVS.
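The two conditional proportions in Table 3 can be computed from the sets of predictors each method selected. The Python sketch below pools counts across replications, which is an assumption about how the proportions were aggregated (averaging per-replication proportions is an alternative); the helper `conditional_overlap` and the selections are made up, and each method is assumed to select at least one predictor overall.

```python
def conditional_overlap(ssvs_sets, lasso_sets):
    """Pooled P(selected by lasso | selected by SSVS) and P(SSVS | lasso)."""
    both = sum(len(s & lset) for s, lset in zip(ssvs_sets, lasso_sets))
    n_ssvs = sum(len(s) for s in ssvs_sets)
    n_lasso = sum(len(lset) for lset in lasso_sets)
    return both / n_ssvs, both / n_lasso


# Made-up selections from three replications (predictor indices).
ssvs = [{1, 4}, {1}, {1, 4, 7}]
lasso = [{1, 2, 4, 9}, {1, 3, 4}, {1, 2, 4, 5, 7}]
p_lasso_given_ssvs, p_ssvs_given_lasso = conditional_overlap(ssvs, lasso)
print(round(p_lasso_given_ssvs, 2), round(p_ssvs_given_lasso, 2))  # 1.0 0.5
```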

Table 3.

Average number of predictors selected by SSVS and lasso for each condition, and the proportion of predictors selected by one method that were also selected by the other

| Condition | SSVS avg. model size (# predictors) | Lasso avg. model size (# predictors) | If SSVS, then lasso | If lasso, then SSVS |
|---|---|---|---|---|
| N = 150, α = 1 | 28.6 | 66.1 | .99 | .43 |
| N = 150, α = .8 | 13.3 | 52.0 | .97 | .25 |
| N = 150, α = .5 | 5.9 | 34.2 | .98 | .18 |
| N = 75, α = 1 | 12.9 | 47.7 | .97 | .26 |
| N = 75, α = .8 | 5.9 | 35.4 | .97 | .17 |
| N = 75, α = .5 | 3.3 | 18.2 | .91 | .20 |

To understand the stability of which predictors were selected, and how it was affected by sample size and reliability of the outcome, in Figure 4 we plot the proportion of replications in which each of the 75 predictors was selected. Results for SSVS and lasso are in the top and bottom rows, respectively. Beginning with SSVS, the pileup near zero shows that most predictors were selected infrequently; however, some predictors were selected often. The number of frequently selected predictors increased as sample size and reliability increased. For example, in the most favorable condition, with perfect reliability and n = 150, 12 predictors were selected in >90% of replications, and 38/75 predictors were selected in <25%. By comparison, for n = 75, only 2 predictors were selected in >90% of replications, and most predictors (59/75) were selected in <25%. The bottom panels of Figure 4 show the corresponding plots for lasso. The condition with perfect reliability and n = 150 resulted in most predictors (60/75) being selected in >90% of replications. With perfect reliability but n = 75, 23/75 were selected in >90%.
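Per-predictor selection frequencies like those plotted in Figure 4 are the column means of a replications × predictors indicator matrix. The Python sketch below uses a small made-up matrix (5 replications, 4 predictors) and the same >90% and <25% thresholds used in the text.

```python
import numpy as np

# Made-up indicators: rows are replications, columns are predictors (True = selected).
selected = np.array([[1, 1, 0, 0],
                     [1, 0, 0, 1],
                     [1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [1, 0, 0, 0]], dtype=bool)
freq = selected.mean(axis=0)          # proportion of replications selecting each predictor
n_often = int(np.sum(freq > 0.90))    # selected in >90% of replications
n_rare = int(np.sum(freq < 0.25))     # selected in <25% of replications
print(freq, n_often, n_rare)          # freq = [1.0, 0.6, 0.0, 0.2]; 1 often; 2 rare
```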

Figure 4.


The effect of sample size and reliability on the frequency with which each predictor was selected. SSVS results are shown in the top panels; lasso results are shown in the bottom panels.

Overall, our simulation supports previous work showing that the lasso and SSVS are related (Yuan & Lin, 2005). In the conditions studied, the lasso selected more predictors than SSVS, and the predictors SSVS selected were generally a subset of those the lasso selected. Different population models are likely to lead to different results. In our example, the predictors identified from our theoretical model are moderately correlated. The elastic net (Zou & Hastie, 2005) is designed to perform well when predictors are highly correlated; however, in our experience (results not reported here), the elastic net selected very few predictors (0, 1, or 2). Again, we believe that SSVS is useful for this research problem because it provides researchers with continuous information about predictor importance for variable selection. We also found evidence that the selected predictors were stable across replications, and that fewer predictors were selected with lower n and lower reliability.

Discussion & Conclusion

Variable selection problems frequently arise in psychology, and we have endeavored to show that SSVS is a much more appropriate and useful method than selecting predictors based on the size or significance of bivariate correlations. As our example demonstrates, because SSVS selects predictors while accounting for uncertainty in which other variables are included in the model, variables selected by this approach are more likely to reliably predict the outcome.

In our example of selecting a set of predictors of pain unpleasantness, both the bivariate correlation and SSVS approaches resulted in significant predictors in the model related to stressful life events (discrimination), beliefs about pain and illness, and responses to pain (catastrophizing). However, in addition to explaining more unique variance in pain unpleasantness (13%) than the bivariate correlation model, the SSVS model included a significant neurobiological predictor of pain unpleasantness, mean activity in the vmPFC, whereas the bivariate correlation model did not include any neurobiological predictors. As the vmPFC is a brain region that has been particularly implicated in encoding pain unpleasantness (Price, 2000), this result is consistent with the pain literature and adds to our understanding of relationships between sociocultural and neurobiological mechanisms underlying the affective aspects of the pain experience.

The SSVS formulation we have presented here is one of a large family of related Bayesian variable selection approaches. This approach may become computationally intractable as the number of predictors or the sample size increases, or when the correlations among predictors are high (Yuan & Lin, 2005). It will be important to characterize these limits for research in psychology and to identify extensions where needed. Extensions have been developed for various large-scale problems with many thousands of predictors (Fridley, 2009; Ishwaran & Rao, 2005). We hope that future simulation experiments will reveal where these limits generally lie in terms of sample size and number of predictors, for conditions representative of psychological data.

Finally, although SSVS is a useful approach that provides new information about the relative importance of predictors for variable selection, it is a data-driven tool. If sufficient information is available to narrow down the set of predictors a priori, a more theory-driven approach should be preferred. In our example, all of the predictors were collected to predict pain because they made sense from a theoretical standpoint. This exploratory approach may also be especially useful for generating hypotheses or analyzing pilot data, with the results then tested in a new primary data collection. We believe the information it provides is useful not for automated decision-making but for thoughtfully condensing the predictor set.

Supplementary Material


Sidebar 2. Two goals: Regularization and variable selection.

There is an important distinction between lasso regression and SSVS as we present it here: the lasso usually performs regularization (i.e., shrinks coefficients) in the same step as variable selection. SSVS can likewise be performed in a single step that includes both variable selection and regularization, by examining the distribution of estimates over all MCMC samples (including any iterations where $\beta_j = 0$). Combining regularization with variable selection is aimed at reducing out-of-sample prediction error. Because we are demonstrating variable selection to narrow down a set of predictors indicated by theory, we chose to perform variable selection and estimation in two separate steps: (1) variable selection and (2) estimation of the selected model using standard OLS. Therefore, our estimates are not regularized. This two-stage approach is common in applications of SSVS (e.g., Swartz et al., 2006, 2013); it can likewise be used with the lasso, where it is referred to as post-lasso estimation (Belloni & Chernozhukov, 2013; Efron et al., 2004).
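The two-stage procedure can be sketched as follows: keep the predictors whose MIP exceeds a cutoff (the simulation in this article used .5), then fit an ordinary, unpenalized least-squares model with an intercept to those predictors only. The MIPs and data are made up, `two_stage_ols` is a hypothetical helper, and NumPy's `lstsq` stands in for any OLS routine; applying the same second step to lasso selections gives post-lasso estimates.

```python
import numpy as np


def two_stage_ols(X, y, mips, cutoff=0.5):
    """Stage 1: keep columns with MIP > cutoff. Stage 2: unpenalized OLS with an intercept."""
    keep = np.flatnonzero(np.asarray(mips) > cutoff)
    design = np.column_stack([np.ones(len(y)), X[:, keep]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return keep, coef  # coef[0] is the intercept; coef[1:] follow the order of `keep`


rng = np.random.default_rng(0)
X = rng.standard_normal((60, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 2]     # noise-free, so OLS recovers it exactly
keep, coef = two_stage_ols(X, y, mips=[0.95, 0.10, 0.80])
print(keep, np.round(coef, 6))              # keep = [0, 2]; coef close to [1, 2, -1]
```

Because the second stage applies no penalty, the retained coefficients are not shrunk, which matches the unregularized estimates described above.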

Footnotes

Disclosures

Data, materials, and online resources:

The scripts for the simulation, plots, and applied analyses in this article are available online at https://osf.io/m8djx/ along with documentation.

1

Bayesian variable selection and SSVS can refer to a large family of closely related variable selection approaches (e.g., Bayesian model averaging, Gibbs variable selection, spike-and-slab regression, reversible-jump MCMC), a full review of which is outside the scope of this tutorial. The formulation presented here is commonly used and compares favorably with alternative formulations (see Lahiri, 2001; O'Hara & Sillanpää, 2009).

2

Note that methods for assessing variable importance are used both for variable selection and for determining the importance of a set of predictors within a given model. In this paper we focus on variable selection, but see Grömping (2015) for a recent review of variable importance metrics within a given model.

References

  1. Anderson KO, Green CR, & Payne R (2009). Racial and ethnic disparities in pain: causes and consequences of unequal care. Journal of Pain, 10(12), 1187–1204. 10.1016/j.jpain.2009.10.002
  2. Anderson SR, & Losin EAR (2017). A sociocultural neuroscience approach to pain. Culture and Brain, 5(1), 14–35.
  3. Baliki MN, Petre B, Torbey S, Herrmann KM, Huang L, Schnitzer TJ, … Apkarian AV (2012). Corticostriatal functional connectivity predicts transition to chronic back pain. Nature Neuroscience, 15(8), 1117–1119.
  4. Belloni A, & Chernozhukov V (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli, 19(2), 521–547. 10.3150/11-BEJ410
  5. Braun MT, Converse PD, & Oswald FL (2019). The accuracy of dominance analysis as a metric to assess relative importance: The joint impact of sampling error variance and measurement unreliability. Journal of Applied Psychology, 104(4), 593–602. 10.1037/apl0000361
  6. Budescu DV (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114(3), 542–551. 10.1037/0033-2909.114.3.542
  7. Chang W, Cheng J, Allaire JJ, Xie Y, McPherson J, … (2018). shiny: Web Application Framework for R (Version 1.2.0). Retrieved from https://CRAN.R-project.org/package=shiny
  8. Efron B, Hastie T, Johnstone I, & Tibshirani R (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499. 10.1214/009053604000000067
  9. Fragoso TM, Bertoli W, & Louzada F (2018). Bayesian Model Averaging: A Systematic Review and Conceptual Classification. International Statistical Review, 86(1), 1–28. 10.1111/insr.12243
  10. Fridley BL (2009). Bayesian variable and model selection methods for genetic association studies. Genetic Epidemiology, 33(1), 27–37. 10.1002/gepi.20353
  11. Friedman J, Hastie T, & Tibshirani R (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
  12. George EI, & McCulloch RE (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423), 881–889.
  13. George EI, & McCulloch RE (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7, 339–373.
  14. Grömping U (2015). Variable importance in regression models. Wiley Interdisciplinary Reviews: Computational Statistics, 7(2), 137–152. 10.1002/wics.1346
  15. Guan Y, & Stephens M (2011). Bayesian variable selection regression for genome-wide association studies and other large-scale problems. The Annals of Applied Statistics, 5(3), 1780–1815.
  16. Hastie T, Tibshirani R, & Friedman JH (2009). The elements of statistical learning: data mining, inference, and prediction (2nd ed). New York, NY: Springer.
  17. Henderson DA, & Denison DR (1989). Stepwise Regression in Social and Psychological Research. Psychological Reports, 64(1), 251–257. 10.2466/pr0.1989.64.1.251
  18. Ishwaran H, & Rao JS (2005). Spike and Slab Gene Selection for Multigroup Microarray Data. Journal of the American Statistical Association, 100(471), 764–780. 10.1198/016214505000000051
  19. Kim HJ, Yang GS, Greenspan JD, Downton KD, Griffith KA, Renn CL, … Dorsey SG (2017). Racial and ethnic differences in experimental pain sensitivity: systematic review and meta-analysis. Pain, 158(2), 194–211. 10.1097/j.pain.0000000000000731
  20. Lahiri P (Ed.). (2001). Model selection. Beachwood, Ohio: Institute of Mathematical Statistics.
  21. Lavie P, Herer P, Peled R, Berger I, Yoffe N, Zomer J, & Rubin A-HE (1995). Mortality in Sleep Apnea Patients: A Multivariate Analysis of Risk Factors. Sleep, 18(3), 149–157. 10.1093/sleep/18.3.149
  22. Lee KE, Sha N, Dougherty ER, Vannucci M, & Mallick BK (2003). Gene selection: a Bayesian variable selection approach. Bioinformatics, 19(1), 90–97.
  23. Lu Z-H, Chow S-M, & Loken E (2016). Bayesian Factor Analysis as a Variable-Selection Problem: Alternative Priors and Consequences. Multivariate Behavioral Research, 51(4), 519–539. 10.1080/00273171.2016.1168279
  24. Mavridis D, & Ntzoufras I (2014). Stochastic search item selection for factor analytic models. British Journal of Mathematical and Statistical Psychology, 67(2), 284–303. 10.1111/bmsp.12019
  25. McNeish DM (2015). Using Lasso for Predictor Selection and to Assuage Overfitting: A Method Long Overlooked in Behavioral Sciences. Multivariate Behavioral Research, 50(5), 471–484. 10.1080/00273171.2015.1036965
  26. Olejnik S, Mills J, & Keselman H (2000). Using Wherry's adjusted R2 and Mallow's Cp for model selection from all possible regressions. Journal of Experimental Education, 68(4), 365–380. 10.1080/00220970009600643
  27. Price DD (2000). Psychological and neural mechanisms of the affective dimension of pain. Science, 288(5472), 1769–1772.
  28. R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org/
  29. Reich B (2014). R code for a stochastic search variable selection. Retrieved from https://www4.stat.ncsu.edu/~reich/ST740/code/SSVS_BMA.R
  30. Rouder JN, & Morey RD (2012). Default Bayes Factors for Model Selection in Regression. Multivariate Behavioral Research, 47(6), 877–903. 10.1080/00273171.2012.734737
  31. Safren SA, Hughes JP, Mimiaga MJ, Moore AT, Friedman RK, Srithanaviboonchai K, … (2016). Frequency and predictors of estimated HIV transmissions and bacterial STI acquisition among HIV-positive patients in HIV care across three continents. Journal of the International AIDS Society, 19(1), 21096. 10.7448/IAS.19.1.21096
  32. Schultz ZI, Crook J, Meloche RG, Berkowitz J, Milner R, Zuberbier AO, & Meloche W (2004). Psychosocial factors predictive of occupational low back disability: towards development of a return-to-work model. Pain, 107(1), 77–85. 10.1016/j.pain.2003.09.019
  33. Scott SL (2018). BoomSpikeSlab: MCMC for Spike and Slab Regression (Version 1.0.0). Retrieved from https://CRAN.R-project.org/package=BoomSpikeSlab
  34. Shiomi K (1978). Relations of pain threshold and pain tolerance in cold water with scores on Maudsley Personality Inventory and Manifest Anxiety Scale. Perceptual and Motor Skills, 47(3_suppl), 1155–1158.
  35. Srivastava S, & Chen L (2009). Comparison between the stochastic search variable selection and the least absolute shrinkage and selection operator for genome-wide association studies of rheumatoid arthritis. BMC Proceedings, 3(S7). 10.1186/1753-6561-3-S7-S21
  36. Swartz MD, Kimmel M, Mueller P, & Amos CI (2006). Stochastic Search Gene Suggestion: A Bayesian Hierarchical Model for Gene Mapping. Biometrics, 62(2), 495–503.
  37. Swartz MD, Peterson CB, Lupo PJ, Wu X, Forman MR, Spitz MR, … Shete S (2013). Investigating Multiple Candidate Genes and Nutrients in the Folate Metabolism Pathway to Detect Genetic and Nutritional Risk Factors for Lung Cancer. PLoS ONE, 8(1), e53475. 10.1371/journal.pone.0053475
  38. Swartz MD, Yu RK, & Shete S (2008). Finding factors influencing risk: Comparing Bayesian stochastic search and standard variable selection methods applied to logistic regression models of cases and controls. Statistics in Medicine, 27(29), 6158–6174. 10.1002/sim.3434
  39. Thompson B (1995). Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply here: A Guidelines Editorial. Educational and Psychological Measurement, 55(4), 525–534. 10.1177/0013164495055004001
  40. Tibshirani R (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
  41. Viallefont V, Raftery AE, & Richardson S (2001). Variable selection and Bayesian model averaging in case-control studies. Statistics in Medicine, 20(21), 3215–3230. 10.1002/sim.976
  42. Wager TD, Atlas LY, Lindquist MA, Roy M, Woo C-W, & Kross E (2013). An fMRI-based neurologic signature of physical pain. New England Journal of Medicine, 368(15), 1388–1397.
  43. Wickham H (2016). ggplot2: elegant graphics for data analysis (Second edition). Cham: Springer.
  44. Yuan M, & Lin Y (2005). Efficient Empirical Bayes Variable Selection and Estimation in Linear Models. Journal of the American Statistical Association, 100(472), 1215–1225. 10.1198/016214505000000367
  45. Zou H, & Hastie T (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. 10.1111/j.1467-9868.2005.00503.x
