Abstract
Scientists studying culture typically focus on a small number of theoretical constructs, such as individualism–collectivism, when seeking to explain cultural differences in psychological tendencies and behaviors. However, existing theories of culture could have missed out on important constructs that are useful for explaining cross-cultural differences. We used an abductive approach combining prediction and explanation to uncover important cultural values. In the prediction phase, based on 594 attitudes, values, and beliefs included in the World Values Survey, a neural network could classify respondents' nationalities with 90% accuracy in out-of-sample data. In the explanation phase, a feature importance analysis identified the values that contributed the most to predicting individuals' countries of origin. The top 60 variables resulting from this analysis were used to create the machine learning-based cultural values inventory (ML-CVI), a tool to help future researchers uncover explanations for cross-cultural differences. Four follow-up studies demonstrated ML-CVI's theoretical and practical relevance. Specifically, Americans were less likely than Mexicans to comply with COVID-19 lockdowns, and this difference was explained by Americans' stronger Christian nationalism. Moreover, Indians were more likely than Americans to engage in proenvironmental behavior, and this difference was driven by Indians' stronger perseverance. Thus, the ML-CVI broadens the range of explanatory factors available to researchers by helping them identify explanations for cultural differences that they would not have been able to identify based on traditional theories of cultural values. Overall, this research highlights that machine learning-based abductive reasoning can help expand the range of explanatory frameworks in social science research.
Keywords: machine learning, culture, discovery, prediction, explanation
Significance Statement
When explaining cultural differences in psychological tendencies or behaviors, culture scientists typically focus on a narrow range of values, such as individualism–collectivism. This study found that a broader set of values, many of which are not covered in existing frameworks of cultural values, is needed to accurately distinguish people from nearly 100 countries. This set of values can help researchers explain cultural differences in psychological tendencies and behaviors when existing culture scales fail to do so. Thus, social scientists and practitioners studying culture can fruitfully consider a broader range of constructs as potential explanatory variables. More broadly, this research demonstrates that machine learning can be simultaneously used for prediction and explanation in the social sciences.
What values best distinguish the world's cultures? The machine learning-based cultural values inventory
What attitudes, values, and beliefs best differentiate the world's national cultures (culturesa, in short)? This is one of the fundamental questions that scientists studying culture have grappled with for the past 50 years. Prominent theories of culture have posited a small number of constructs, such as Hofstede's (1) four dimensions (i.e. individualism–collectivism, power distance, uncertainty avoidance, and masculinity–femininity); Markus and Kitayama's (2) distinction between independent and interdependent self-construals; Triandis's (3) distinction between horizontal and vertical individualism–collectivism; Schwartz et al.'s (4) universal cultural values framework; Inglehart and Baker's (5) traditional versus secular-rational and survival versus self-expression values; and Triandis's (3) and Gelfand et al.'s (6) distinction between tight and loose cultures.
Although highly influential, a narrow range of theoretical constructs can be limiting because social behavior is multiply determined. Cultural variations often arise from a complex web of attitudes, values, and beliefs (7). Although the current social science practice of focusing on a narrow range of theoretical constructs provides elegant explanations (8), it may fail to capture a substantial part of the set of basic constructs that underlie cultural differences in psychological processes and behaviors. Addressing challenges such as omitted variable bias (9), our focus lies in advancing the epistemology of cross-cultural research. Specifically, we test whether a set of culture-related constructs identified in a theory-blind manner using machine learning methods can provide alternate explanations for cultural differences in important outcomes compared to those provided by traditional dimensions of culture, thereby revealing explanations for cultural differences that researchers might not have investigated otherwise.b
Our machine learning-based approach follows in the footsteps of recent research in psychology that has used computational social science approaches for discovery (10, 11). More broadly, scholars have argued that human thought is a bottleneck in scientific progress—if researchers do not conceive of a solution or explanatory variables, then there is no way those relationships would be tested (12, 13). On a similar note, if existing theories in a field were complete, there would be no need to use machine learning to generate hypotheses—all hypotheses of any significance could be generated from existing theories. Yet theories in the social sciences are incomplete and imprecise, so seeking to derive hypotheses solely from existing theories is likely to miss out on many important cause-and-effect relationships (14).
Machine learning models are not limited by human thought or existing theories. As long as researchers provide these models with a wide range of potential predictors, machine learning models can learn the underlying relationships that help predict the outcome variable based on the input variables; researchers can then identify variables that play an important role in the model's predictions, use this information to generate hypotheses, and test these hypotheses using traditional methods. For example, biomedical researchers trained a machine learning model to learn which of a large number of chemicals would neutralize a large number of bacteria; they then presented the model with a bacterium that is resistant to all known antibiotics and asked it to predict which of the available chemicals would best neutralize it; finally, they tested the suggested chemical in the lab and found that it successfully neutralized the antibiotic-resistant bacterium (15). The process of using machine learning for generating hypotheses or discovering solutions represents a form of abductive reasoning (i.e. “inference to the best explanation”; 16; see 17), which deviates from the traditional methods of deductive and inductive reasoning.
In psychology, personality psychologists have used lasso regressions, which sort out relevant from irrelevant predictors, to predict individuals' characteristics (18). For example, using publicly available data, researchers have identified individuals' demographic characteristics with very high accuracy (e.g. 19) and their personality characteristics with reasonably high accuracy (e.g. 20). Researchers have also used random forest models to predict romantic desire (21). However, this body of work has largely focused on prediction, that is, whether machine learning models can make accurate judgments, which is the raison d'être of machine learning (8). Researchers have also used a subset of World Values Survey (WVS) items (11, 22) and variation in the topics of Facebook posts across various regions (23) to compute measures of cultural distance; notably, between-country distances generated from Facebook align with between-country distances generated from the WVS (23). However, this work has focused on constructing measures of cultural distance rather than on identifying a set of values that maximally distinguishes various cultures.
Social psychologists are typically more interested in explanation than prediction. Researchers can train a machine learning model to categorize people or predict a continuous outcome variable, then use feature importance analysis to identify the variables that most affect the model's categorization or prediction. For example, researchers trained a neural network to classify people as ethical or unethical and found that optimism about humanity's future was one of the top predictors of unethicality per a feature importance analysis on the model (24). They then verified that increasing optimism made people act more ethically. Similarly, participants' attitudes, values, and beliefs were used to train a neural network to classify the survey period to identify cultural change markers. A feature importance analysis demonstrated that prosociality, political perspectives, and Protestant work ethic were diagnostic of seven periods spanning four decades (25).
In the current research, we trained a neural network to identify individuals' countries of origin based on their attitudes, values, and beliefs (values, in short). If people's values are diagnostic of their specific country of origin, the model should have high accuracy. Moving from prediction to explanation, we used a feature importance analysis to reveal the values that contributed the most to the model's predictions. It is possible that many of these values are consistent with existing models of culture, but it is also possible that many of these values are not covered in existing models of culture, which would suggest new directions for research and theorizing in the science of culture. We next used the explanations generated by the machine learning model to create a general-purpose cultural values inventory. As the inventory is generated agnostic of specific outcome variables, it could be used by social scientists to uncover explanations for cultural differences in outcomes that cannot be satisfactorily explained by existing theories of culture.
Overview of studies
Study 1 used a machine learning model to identify which of the nearly 600 values measured in the WVS best differentiate people from nearly 100 countries. The top 60 values formed the machine learning-based cultural values inventory (ML-CVI). Next, study 2 tested its predictive ability and verified whether this inventory distinguishes people from six different countries better than existing scales. The next four studies (studies 3a, 3b, 4a, and 4b) sought to provide a blueprint for how cultural psychologists can use this inventory to uncover explanations for cultural differences in important behavior, explanations that may be overlooked if their investigations were limited to existing models of culture. First, studies 3a and 3b assessed whether items from the ML-CVI can explain United States–Mexico differences in compliance with COVID-19 social distancing orders above and beyond commonly used culture scales. Next, studies 4a and 4b tested whether items from the ML-CVI can explain cultural differences in proenvironmental behaviors across the United States and India, again above and beyond commonly used culture scales. In sum, the goal of study 1 was to apply a machine learning approach to identify a subset of the 600 WVS items that are most useful for cross-cultural analyses, consistent with standard practices in the field of data science. In contrast, the aim of studies 3a–4b was to examine whether individual items from the ML-CVI could serve as mediators for cross-cultural differences, following standard methods in the field of cross-cultural psychology (e.g. (26)). Study 2 served as a bridge between the two. Our overall procedure is illustrated in Fig. 1.
Fig. 1.
Flowchart depicting the overall procedure employed in this article.
We report how we determined our sample size, all data exclusions, all manipulations, and all measures. The WVS data are available at www.worldvaluessurvey.org; the machine learning code, primary data, materials, and syntax are available at https://osf.io/fmduv. Any additional variables and analyses are reported in Supplementary Information. This research was approved by the Institutional Review Boards of the Hong Kong Polytechnic University (protocol # HSEARS20220506001) and Nanyang Technological University (protocol #IRB-2015-07-018). All participants provided informed consent.
Study 1: machine learning model
In study 1, we used the WVS dataset, the largest survey of individuals' attitudes, values, beliefs, perceptions, evaluations, behaviors, and demographics across a wide range of countries. This WVS dataset has been widely used by cross-cultural psychologists, sociologists, economists, and political scientists (e.g. 23, 27–32). Traditionally, researchers have focused on only a few variables collected in the WVS, in one or a few waves, and in one or a few countries. However, with machine learning, we could analyze the entire dataset.
We used all the data available at the time of the analysis, that is, the merged waves 1 to 6 dataset (33), which contained 348,532 individuals' responses to 860 attitudes, values, beliefs, and behaviors. The survey has been conducted in 98 countries across six waves from 1981 to 2014, so the findings from this study can be generalized to a substantial part of the world. As long as an item was measured in any country in any of the six waves, it was included in the dataset. Although representative adult population samples were surveyed in each country/territory in each wave, the same individuals were not sampled repeatedly across waves (34). The WVS has since added a seventh wave of data; however, studies 1, 2, and 3a were already completed before the wave 7 data were released.
We trained a neural network to classify individuals' nationalities based on their attitudes, values, and beliefs. We decided to construct the ML-CVI using a neural network because of the large number of potential predictors (nearly 600) and observations (over 300,000); neural networks often outperform other machine learning methods (e.g. random forests and support vector machines) when applied to large datasets (35). Although classification is not the primary goal of culture scientists, we reasoned that if a particular value helps distinguish people across cultures, then it is likely (but not guaranteed) to also explain cultural variation in outcome variables that vary across cultures; if a value does not help distinguish people across cultures, then it simply cannot explain cultural differences in outcomes of interest. Indeed, the implied goal of Hofstede's and Inglehart and Baker's dimensions was to identify values that vary across cultures. We make this goal explicit by using a classification model while acknowledging that classification is merely a stepping stone to identifying values that differentiate cultures the most and, thus, are good candidates for explaining cultural differences in outcomes of interest.
Method
Figure 1 illustrates our overall method. We followed a series of steps to clean the data for the analysis, such as excluding variables that did not contain participants' responses (e.g. administrative variables, factor scores), variables that contained textual data, and variables with more than 20 categorical response options. As our focus was on identifying culture-general attitudes, values, and beliefs, we also excluded variables that were culture-specific (e.g. attitudes about specific ethnicities), time-specific (e.g. attitudes about the current state of the economy), or behaviors (e.g. whether the respondent engaged in political action), as well as other variables that were not attitudes, values, or beliefs (e.g. the respondent's sources of information). We one-hot coded all categorical variables (i.e. created a new variable for each categorical response option). As people's values change over time (25), we included the WVS wave as a predictor in the machine learning model (please see SI, p. 21, for additional analyses with and without including WVS wave as a predictor). The final dataset for model building consisted of responses from 98 countries, with a median sample size of 2,843 responses and 594 variables (see Fig. S1 for the sample size by country). The file containing these 594 predictors (“Study 1_All WVS Predictors.xlsx”) is uploaded to the project's OSF repository.
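As a concrete illustration, the one-hot coding and 0-to-1 scaling steps can be sketched as follows. This is a minimal Python sketch on toy data; the column names and response options are hypothetical stand-ins, not the actual WVS variables, and the observed minimum/maximum here stand in for each item's response-scale endpoints.

```python
import pandas as pd

# Toy stand-in for a slice of the cleaned WVS data (column names hypothetical).
df = pd.DataFrame({
    "wave": [5, 6, 6],
    "a025": ["always love", "earned", "always love"],  # categorical item
    "e119": [3, 1, 4],                                 # ordinal item on a 1-4 scale
})

# One-hot code the categorical item: one new binary column per response option.
df = pd.get_dummies(df, columns=["a025"], prefix="a025")

# Scale each numeric item to the [0, 1] range, as described in the text
# (here the observed min/max stand in for the item's scale endpoints).
for col in ["wave", "e119"]:
    lo, hi = df[col].min(), df[col].max()
    df[col] = (df[col] - lo) / (hi - lo)
```

After these steps, every predictor (including the wave indicator) lies in [0, 1], which helps the downstream model converge faster.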
We used the holdout technique to test the model: data from 90% of the respondents were used to build the model (i.e. seen data), and data from the remaining 10% were reserved for testing the model (i.e. unseen or test or out-of-sample data). As different questions were asked in different country-wave combinations (see Tables S1, S2, and S4), 66.86% of the values were missing in the overall dataset. We could have eliminated country-wave-item combinations with structurally missing values, but such an exercise would be susceptible to researcher degrees of freedom. Thus, to minimize researcher degrees of freedom and use as much of the data as possible, we applied machine learning-based imputation methods that are well established in the wider scientific literature. Specifically, we imputed all missing values using a random forest-based imputation procedure implemented in the R package missRanger (36, 37), which “intrinsically constitutes a multiple imputation scheme” (38). This method learns the valid range of each variable in the unimputed data and respects each variable's range in the imputed data. Importantly, it has been shown to outperform other commonly used imputation methods when applied to data that are missing not at random (39, 40). This random forest method has been extensively used to impute similar volumes of missing information in the WVS in recent research (24, 41, 42; see Table S3 for our imputation parameters). As the WVS items were measured on different response scales (i.e. on 2-, 3-, 4-, 5-, 6-, 7-, and 11-point scales, see Table S4), we scaled each item to range from 0 to 1, as prior research has shown that this procedure helps the model converge faster (43).
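The imputation itself was run in R with missRanger; as a rough analogue only, a chained random-forest imputation can be sketched in Python with scikit-learn's IterativeImputer. This is not the authors' pipeline, the data below are synthetic, and the tiny forest and iteration count are chosen purely for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 5))
X[rng.random(X.shape) < 0.3] = np.nan  # ~30% missing, for illustration

# missRanger is an R package; an IterativeImputer with a random forest
# estimator is only a rough Python analogue of chained random-forest
# imputation, not the procedure used in the paper.
imp = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=10, random_state=0),
    max_iter=3,
    random_state=0,
)
X_imp = imp.fit_transform(X)  # all missing cells filled in
```

In the actual pipeline, the imputer would be fit on the seen data first, so that no information from the unseen data leaks into training.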
We first scaled and imputed values in the seen data. We then appended the unseen data to the seen data, reconducted the imputation and scaling procedures, and deleted the seen data. As the past informs the future, we can use the seen data to impute the unseen data; our procedure ensured that no information leaked from the unseen data into the seen data. We built a fully connected feedforward multilayer perceptron with an input layer, an output layer, and two hidden layers (see Fig. 2). The number of hidden layers was determined by trial and error.
Fig. 2.
Illustration of the final neural network. Note. The green-colored circles indicate the standardized 594 predictor variables derived from the WVS dataset. The blue- and yellow-colored circles indicate the intermediate layers of neurons. The final row depicts the total neurons in each layer of the neural network. The orange-colored circles indicate the 98 countries that each respondent could belong to.
As we obtained high accuracy with two hidden layers, we did not add a third layer, which would significantly increase the computation time. The model was trained to classify each respondent's country of origin based on 594 attitudes, values, and beliefs. Once the model's classification accuracy reached an asymptote across successive iterations, the model was frozen. Table S5 presents the parameters of the final neural network.
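Schematically, the classification setup resembles the following sketch: a small two-hidden-layer multilayer perceptron trained on synthetic data with a held-out test split. The layer sizes, sample sizes, and training parameters here are toy stand-ins; the actual architecture and hyperparameters are those reported in Fig. 2 and Table S5.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n, n_features, n_countries = 500, 20, 4  # tiny stand-ins for 348k x 594 x 98
X = rng.random((n, n_features))
y = rng.integers(0, n_countries, n)
X[:, 0] += y  # inject a country "signature" so the toy model can learn

# Two hidden layers, mirroring the architecture in Fig. 2 (toy sizes only).
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X[:450], y[:450])            # "seen" 90%
acc = clf.score(X[450:], y[450:])    # held-out accuracy on the "unseen" 10%
```

Because accuracy is computed on data the model never saw during training, it is not inflated by overfitting, which is the same logic behind the 90/10 holdout in the study.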
Results
Once the model was trained, we presented it with the unseen data while masking the respondents' country of origin. We asked the model to predict which of the 98 countries each respondent in the unseen data belonged to, based on their values. Chance accuracy would be 1/98 = 1.02%. However, the model identified which of the 98 countries participants belonged to with 89.70% accuracy. Importantly, this accuracy metric is immune from overfitting because it was assessed on the unseen data to which the model was not exposed during the training phase. An overfitted model would have high accuracy on the training data but low accuracy on unseen data. The model's multiclass area under the curve (AUC) was 91.88% (please see SI, p.4, for a detailed explanation).
Next, to identify values that contributed the most to our model's predictions, we conducted a permutation-based feature importance analysis. This analysis, performed on the seen data, sought to identify which of the 594 WVS variables contributed the most to distinguishing people from the 98 countries. The analysis shuffled the values of each of the 594 predictors one at a time across all observations and assessed the extent to which the model's loss value changed due to this permutation. Permuting more important variables to the model's classification would lead to a bigger change in the model's loss value. The top 400 variables identified through the feature importance analysis are uploaded to the OSF data repository (see “Study1_top_predictors.xlsx” on OSF).
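The shuffle-and-remeasure logic of permutation-based feature importance can be sketched as follows. This is a minimal illustration on synthetic data, with a logistic regression standing in for the neural network; a real analysis would average the loss change over many permutations rather than a single shuffle.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(2)
X = rng.random((400, 5))
y = (X[:, 0] + 0.2 * X[:, 1] > 0.6).astype(int)  # feature 0 matters most

clf = LogisticRegression().fit(X, y)
base = log_loss(y, clf.predict_proba(X))  # baseline loss, unpermuted

# Permutation importance: shuffle one column at a time and measure how much
# the model's loss worsens; larger increases indicate more important features.
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importance.append(log_loss(y, clf.predict_proba(Xp)) - base)
```

Ranking features by this loss increase yields the importance ordering from which the top 60 ML-CVI items were selected.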
To determine the number of items to include in the ML-CVI, we computed the multiclass AUC of the model's classification in the unseen data. We gradually increased the number of top-ranked items to be included for computing the AUC from 1 to 10 in steps of 1, and then from 10 to 100 in steps of 5 (see Fig. S3). As there is no sharp increase in the AUC when the number of items increases above 60, and as the longest scale that we compare the ML-CVI against has 57 items (4), we decided to include the top 60 items in the ML-CVI. The mean values of the 60 items included in the ML-CVI for each country are available on the project's OSF repository (see “Study 1_MLCVI Values_by_country.xlsx”). We also plotted a dendrogram (44) to interpret the cultural clusters mapped by the 60 items of the ML-CVI (see Fig. S7).
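The item-count selection procedure, tracking out-of-sample multiclass AUC as the number of top-ranked items grows, can be sketched as below. The data are synthetic, the features are ranked by construction rather than by a feature importance analysis, and a logistic regression stands in for the trained neural network.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n, k = 600, 8
y = rng.integers(0, 3, n)        # 3 toy "countries"
X = rng.random((n, k))
X[:, 0] += 0.8 * y               # most informative feature (by construction)
X[:, 1] += 0.4 * y               # second most informative

# Refit on the top-k features and track multiclass (one-vs-rest) AUC on a
# held-out split as k grows; the ML-CVI cutoff was chosen where the real
# curve flattened (Fig. S3).
aucs = []
for top_k in range(1, k + 1):
    clf = LogisticRegression(max_iter=500).fit(X[:500, :top_k], y[:500])
    proba = clf.predict_proba(X[500:, :top_k])
    aucs.append(roc_auc_score(y[500:], proba, multi_class="ovr"))
```

Plotting `aucs` against `top_k` gives the elbow-style curve used to justify stopping at 60 items.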
Table 1 reports reworded versions of the top 60 original WVS items (so that all items are in a standardized format; see Table S4 for original items). The ML-CVI contained several themes that overlapped with those covered in existing culture scales (e.g. politics, environment, family, interpersonal relations; 3, 4, 5, 45, 46; see SI, p. 5, for coding details) and several themes that are not evident in existing cross-cultural scales, such as the relationship between the government and society, gender roles, marriage, and family, among others (the file “Study 1_MLCVI Values_by_country.xlsx” on the project's OSF repository tabulates the mean values for the 60 ML-CVI items by country).
Table 1.
Reworded versions of the top 60 WVS items included in the ML-CVI.
| Rank | WVS variable | Reworded item | Overlap with existing scales |
|---|---|---|---|
| 1 | e119 (1) | To what extent do you think maintaining order in society is the most important responsibility of the government? | 4 |
| 2 | d033 | To have a successful marriage, how important is it that spouses agree on politics? | |
| 3 | b023 | How serious of a problem is pollution of rivers, lakes, and oceans in the world? | 4 |
| 4 | a025 (1) | To what extent do you think that regardless of what the qualities and faults of one's parents are, one must always love and respect them? | 3, 44 |
| 5 | e038 | Do you think people who are unemployed should have to take any job available, or should have the right to refuse a job they do not want? | |
| 6 | e034 (2) | To what extent do you think that society must be gradually improved by reforms? | |
| 7 | d073 | In your opinion, how important is it for women to work outside home? | |
| 8 | e066 | Do you think your country should aim to be an egalitarian society where the gap between rich and poor is small regardless of individuals' achievement, or a competitive society where wealth is distributed according to one's achievement? | 3, 43 |
| 9 | e008 | To what extent do you think that people should be given more say in important government decisions? | |
| 10 | f103 | To what extent do you think that religious leaders should not influence how people vote in elections? | |
| 11 | a169 (1) | How important is it for you to understand others' preferences? | |
| 12 | a012 | During the past few weeks, how often do you feel proud because someone complimented you? | 44 |
| 13 | f192 (3) | If you feel sad and want to talk to someone, how likely are you to turn to your spouse/significant other? | |
| 14 | c060 (2) | When it comes to how business and industry should be managed, to what extent do you agree that owners and employees should participate in selecting the managers? | |
| 15 | g007_34_b | To what extent do you trust people you meet for the first time? | 5 |
| 16 | a045 (3) | To what extent is determination the most important thing for a child to learn at home? | 5 |
| 17 | f036 | To what extent do you think that the religious institutions in your country are giving adequate answers to the problems of family life? | |
| 18 | f114_01 | Can stealing someone else's property ever be justifiable? | 5 |
| 19 | d066_b | If a woman earns more money than her husband, how likely is it to cause problems? | |
| 20 | g002 (1) | To what extent do you feel that you belong to the locality where you are currently living? | |
| 21 | a026 (1) | To what extent do you think that the parents' duty is to do their best for their children even at the expense of their own well-being? | 3, 44 |
| 22 | e019 | If in the near future, there is more emphasis on your family life, is it a good thing or a bad thing? | |
| 23 | f035 | To what extent do you think that the religious institutions in your country are giving adequate answers to moral problems and needs of the individual? | |
| 24 | a045 (2) | To what extent is obedience the most important thing for a child to learn at home? | 5 |
| 25 | e246 | To what extent do you think that your country's leaders should give high priority to the goal of making a significant improvement in the housing of people living in slums? | |
| 26 | f105 | To what extent do you think that religious leaders should influence government? | |
| 27 | e206 | To what extent do you believe that free and fair elections will reduce terrorism? | 4 |
| 28 | c009 (3) | If you were looking for a job, how important is it to you to work with people whom you like? | 43 |
| 29 | d070 | In your opinion, is being religious an important trait in a woman? | |
| 30 | e061 | To what extent do you agree that political reform in your country is moving too rapidly? | |
| 31 | e062 (3) | Do you think that goods made in other countries can be imported and sold here if people want to buy them, or that there should be stricter limits on selling foreign goods here to protect the jobs of people in this country? | |
| 32 | d063_b | To what extent do you think that having a job is the best way for a woman to be an independent person? | |
| 33 | a018 | During the past few weeks, to what extent did you feel that things were going your way? | |
| 34 | e215 | To what extent do you think that it is necessary to fight terrorism by military means? | |
| 35 | e238 (2) | To what extent do you think that discrimination against girls is the most serious problem of the world? | |
| 36 | d080 | To what extent do you decide your goals in life by yourself? | 4 |
| 37 | e046 | Do you think that ideas that have stood the test of time are better, or that new ideas are better? | |
| 38 | e005 (3) | What counts more, ideas or money? | |
| 39 | f124 | How justifiable is getting intoxicated? | 5 |
| 40 | i002 | How important is it for you to know about science in your daily life? | |
| 41 | e069_51 | How much confidence do you have in religious leaders? | |
| 42 | i001 | To what extent do you think that one of the bad effects of science is that it breaks down people's ideas of right and wrong? | |
| 43 | a124_02 | Would you like or dislike having people of a different race as neighbors? | |
| 44 | d037 | For a successful marriage, how important is sharing household chores? | |
| 45 | a025 (2) | To what extent do you think that one has the duty to respect and love parents who have not earned it by their behavior and attitudes? | 3, 44 |
| 46 | f192 (2) | If you feel sad and want to talk to someone, how likely are you to turn to friends? | |
| 47 | a010 | During the past few weeks, to what extent did you feel particularly excited or interested in something? | |
| 48 | e034 (1) | To what extent do you think that society must be radically changed? | |
| 49 | a199 | How important is it to you to do something for the good of society? | |
| 50 | d006 | How important is it for you to share political views with your spouse/partner/significant other? | |
| 51 | c016 | When it comes to aspects of a job, how important is it to have an opportunity to initiate things independently? | |
| 52 | e067 | Do you think your country should be a society with extensive social welfare but high taxes, or a society with low taxes in which individuals take responsibility for themselves? | |
| 53 | a124_43 | Would you like or dislike people who speak a different language as neighbors? | |
| 54 | a045 (1) | To what extent is thrift the most important thing for a child to learn at home? | 5 |
| 55 | e214 | To what extent do you think that Western democracy is the best political system for your country? | |
| 56 | d013 | How important is it for you to share political attitudes with your parents? | |
| 57 | b022 | How serious of a problem is the loss of plant or animal species or biodiversity? | 4 |
| 58 | d068 | To what extent do you think that being a good mother is an important trait in a woman? | |
| 59 | e034 (3) | To what extent do you think that society must be valiantly defended? | |
| 60 | d010 | How important is it for you to share attitudes about religion with your parents? |
Numbers in parentheses indicate the response option of one-hot coded items.
Discussion
When presented with the values of a new set of individuals, our neural network could identify which of 98 countries they belonged to with 90% accuracy. This finding indicates that people's values contain unique signatures of their country of origin. Upon examining the top values per the permutation-based feature analysis, we found several values consistent with existing models of cultural values. However, we also found several themes not covered in models of culture, such as the relationship between the government and society, and others.
As the organizers of WVS asked different questions in different countries and waves, we had to impute a significant proportion of the data. As imputation is inherently noisy, it would likely make it more difficult for our neural network to differentiate people from various countries. Importantly, if the values identified from the neural network trained on this imperfect, imputed dataset continue to differentiate people from different countries with high accuracy in external studies, it would indicate that the model was able to uncover important cultural values despite the many limitations of the WVS. Thus, the imperfections in the WVS were a key motivation for conducting the subsequent follow-up studies to validate the utility and generalizability of the ML-CVI.
Study 2: comparing the ML-CVI against five commonly used culture scales
In this study, we administered our 60-item ML-CVI along with five commonly used and highly cited cross-cultural scales in six countries and tested whether the ML-CVI distinguishes people from these countries better than commonly used scales. Although the other scales may or may not have been designed specifically to distinguish people from countries, we submit that logically, if a scale cannot differentiate countries (e.g. it has identical scores for a set of countries), then it also cannot explain cultural variation in outcomes among those countries. To be usable, any culture scale needs to be able to distinguish people from different countries.c
Method
To avoid introducing noise due to translation, we sampled six countries in which English is commonly used in education and business. We collected data from 764 respondents (101 from Australia, 103 from India, 190 from Singapore, 103 from South Africa, 66 from the UK, and 201 from the United States) using Amazon Mechanical Turk, Cloud Research, Qualtrics Panels, and a Singapore university-based participant pool. Only participants who passed an attention check were allowed to proceed to the survey (see the survey uploaded to the project's OSF data repository for the exact questions; see Table S6 for the demographic details of the valid participants).
Participants were asked to respond to the ML-CVI, PVQ5X value inventory (19 subscales; 4), vertical/horizontal–individualism/collectivism scale (four subscales; 3), Hofstede’s value survey module (six subscales; 45), interdependent–independent scale (two subscales; 46), and tightness–looseness scale (6). We used a 7-point response scale for all items (see “Study 2_Survey.pdf” on OSF for the response scale anchors for each item).
We tested the extent to which the items belonging to each scale could differentiate people from the six countries using random forest models. To ensure an apples-to-apples comparison, we compared the five traditional scales with an equivalent number of top-ranked items from the ML-CVI. We switched from neural networks to random forests in this study because the number of predictors was dramatically lower (594 in study 1 vs. 57 or fewer in study 2), which substantially reduces the computational time needed to train a random forest model. Importantly, switching the method allows us to verify whether an inventory developed using one model (i.e. a neural network) can still differentiate countries when tested using a different model (i.e. a random forest). To obtain a CI for the model's accuracy, we performed this analysis 30 times on different subsets of the data using a resampling technique designed to compare multiple models built on the same data (using 10-fold cross-validation repeated three times; 47; see SI p.19 for details).
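The resampling comparison described above can be sketched as follows. This is a minimal illustration of 10-fold cross-validation repeated three times with a random forest classifier, not the study's actual code; the toy data and all variable names are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
# Toy stand-in data: 300 respondents, 24 items on a 1-7 scale, 6 countries
X = pd.DataFrame(rng.integers(1, 8, size=(300, 24)),
                 columns=[f"item_{i}" for i in range(24)])
y = rng.integers(0, 6, size=300)  # country labels

# 10-fold cross-validation repeated 3 times yields 30 accuracy estimates
# per scale, from which a 95% CI for classification accuracy is computed
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
ci = np.percentile(scores, [2.5, 97.5])
```

Running the same resampling scheme for each scale's item set keeps the folds comparable across models, which is the point of the procedure cited in the text.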
Results
Figure 3 presents the results. As expected, the top 57 items of the ML-CVI classified people's countries more accurately than the 57-item PVQ5X value inventory (4); the top 28 ML-CVI items outperformed the 28-item vertical–horizontal individualism–collectivism scale (3); the top 24 ML-CVI items outperformed both the 24-item interdependence–independence scale (46) and the 24-item Hofstede Value Survey Module (45); and the top six ML-CVI items outperformed the six-item tightness–looseness scale (6). Thus, study 2 found that the ML-CVI distinguishes people from different countries better than other commonly used culture scales. Although classification is not a goal of existing cultural value inventories, it is a natural outcome of the manner in which existing cultural scales are designed—they were designed to capture differences across cultures. A cultural dimension with similar values across many different countries would be unable to explain cultural differences in outcomes across those countries.
Fig. 3.
Dot-plot depicting the accuracy with which the five preexisting cross-cultural scales and the ML-CVI identified which of the six countries respondents belonged to (study 2). Note. Error bars depict the 95% CI of the value across the 30 runs.
To verify whether ML-CVI items show greater between-country variance than other culture scales, we computed the means for each of the 60 ML-CVI items and the existing culture scales across the six countries sampled in this study. As Table S7 (pp. 21–22) indicates, the variation across countries is larger for most ML-CVI items than for existing cross-cultural scales. This suggests that the ML-CVI may capture cross-cultural differences better than existing culture scales.
Study 3a: explaining cultural differences in COVID-19 lockdown violations
The next set of studies assessed whether the ML-CVI can generate explanations for cross-cultural differences in behavior. We tested whether ML-CVI explains cultural differences in a consequential behavior—the extent to which people followed lockdown orders during the COVID-19 pandemic—above and beyond existing scales. We compared the United States and Mexico in this study because, given their geographical proximity, the COVID-19 pandemic arrived in these two countries at about the same time. Yet, the two countries are culturally distinct (e.g. 48).
The following studies were designed to bridge the gap between machine learning and the OLS-based methods that are commonly used in the field to test underlying mechanisms. Thus, we used OLS-based methods to assess mediation in this study. We note that the model's performance in study 2 was based on the collective predictive power of all 60 items, including main effects and interactions and linear and nonlinear effects. As such, there is no guarantee that individual ML-CVI items would serve as linear mediators for cultural differences in outcomes.
Method
In July 2020, we posted surveys seeking 100 participants each from the United States and Mexico on Prolific Academic, which would give us 80% power to detect Cohen's d = 0.40 with α = 0.05 (two-tailed). In response, 99 US residents (53 women, 43 men, and 3 nonbinary; Mage = 35.35 years) and 102 Mexico residents (35 women, 66 men, and 1 nonbinary; Mage = 27.56 years) completed the survey. No participants were excluded. We ran the survey in English in both countries as, in our experience, most Mexican participants on Prolific are fluent in English.
To measure the extent to which participants stayed at home during COVID-19 lockdowns in their locality, we asked them: “Think about the time when you were living under a lockdown, that is, when people were prohibited from leaving their home except for essential items (e.g. food and medicine).” We then administered six items (e.g. “During the lockdown, how often did you leave your home to relieve your boredom?”) on a 7-point scale with response options ranging from “Never” to “Multiple times a day” (αUS = 0.87; αMexico = 0.77). Thereafter, we administered the five cross-cultural scales used in study 2, along with the ML-CVId.
Results
Mexicans indicated that they were less likely to violate stay-at-home orders, M = 1.67, SD = 0.64, 95% CI [1.54, 1.79], than Americans, M = 2.14, SD = 1.04, 95% CI [1.93, 2.34]; t(199) = 3.87, P < 0.001, Cohen's d = 0.55, 95% CI [0.26, 0.83].
Our first step was to assess the explanatory power of the existing culture scales vis-à-vis the ML-CVI. We first tested whether scale or subscale averages for each of the five existing culture scales would mediate the United States–Mexico difference identified above. As ML-CVI does not have a clear factor structure, we tested whether each of the 60 items would mediate the United States–Mexico difference. In total, we conducted 92 mediation analyses (60 items of ML-CVI, 19 Schwartz values (4), 6 Hofstede dimensions (45), 4 Triandis subscales (3), 2 Singelis subscales (46), and the Gelfand (6) tightness–looseness scale) with country as the independent measure, violation of COVID-19 stay-at-home orders as the dependent measure, and the respective item, subscale average, or scale average as the mediator. As the PROCESS indirect effect analysis does not provide P-values for mediation effects, we conducted the traditional Sobel test for mediation for this purpose.
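The Sobel test mentioned above can be computed directly from the two regression paths. The following is a hedged sketch, where a is the country-to-mediator path, b is the mediator-to-outcome path controlling for country, and sa/sb are their standard errors; the numeric values are purely illustrative, not the study's estimates.

```python
import math
from scipy.stats import norm

def sobel_test(a, sa, b, sb):
    """Return the Sobel z statistic and its two-tailed P-value."""
    # Standard error of the product a*b (first-order delta method)
    se_ab = math.sqrt(b**2 * sa**2 + a**2 * sb**2)
    z = (a * b) / se_ab
    return z, 2 * norm.sf(abs(z))

# Illustrative paths: country lowers the mediator (a < 0), and the
# mediator positively predicts the outcome (b > 0)
z, p = sobel_test(a=-1.18, sa=0.22, b=0.21, sb=0.04)
```

The resulting P-values, one per candidate mediator, are what feed into the false discovery rate correction described next.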
To minimize the risk of spurious findings, we used the Benjamini and Hochberg (49) procedure to ensure that the overall false discovery rate (FDR) is capped at 1% (P ≤ 0.01). We input the P-values for all the individual mediation analyses into an FDR tool (https://tools.carbocation.com/FDR), which revealed only one significant predictor: ML-CVI item 26 (“To what extent do you think that religious leaders should influence government?”). None of the other ML-CVI items or any of the other existing culture scales or subscales met this threshold. We then proceeded to test the indirect effect through ML-CVI item 26 using PROCESS (50). A bootstrapped analysis with 20,000 samples revealed a significant indirect effect of culture on people's willingness to adhere to COVID-19 lockdown restrictions through ML-CVI item 26 (ab = −0.25, SE = 0.08, 95% CI [−0.44, −0.11]), which is consistent with mediation. Specifically, we found that Mexican participants (M = 1.42, SD = 1.08, 95% CI [1.21, 1.63]) were less likely to agree that religious leaders should influence government than Americans (M = 2.60, SD = 1.89, 95% CI [2.22, 2.97]; t(199) = 5.41, P < 0.001, d = 0.76, 95% CI [0.48, 1.05]), and this explained their reduced violation of COVID-19 stay-at-home orders compared to Americans.
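The Benjamini–Hochberg step-up procedure applied here can be illustrated as follows; the P-values in the example are made up for demonstration and do not correspond to the study's 92 mediation analyses.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.01):
    """Return a boolean array marking hypotheses rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Find the largest k with p_(k) <= (k/m) * q, then reject tests 1..k
    below = p[order] <= (np.arange(1, m + 1) / m) * q
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        rejected[order[: k + 1]] = True
    return rejected

rejected = benjamini_hochberg([0.00004, 0.02, 0.03, 0.5, 0.8])
# only the smallest P-value survives the 1% FDR threshold in this example
```

With 92 tests at q = 0.01, the smallest P-value must fall below 0.01/92 ≈ 0.0001 for at least one rejection, which is why only a very strong mediator survives.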
Furthermore, given that COVID-19 lockdown adherence may be influenced by demographic factors, we tested whether the mediation relationship holds after controlling for participants' age, gender, education, social class, household income, number of members in the household, the density of the town or locality they were in at the time of the survey, whether the individual's occupation was deemed essential and, thus, required them to step out of the home during COVID-19 lockdowns, and their political ideology. The indirect effect through ML-CVI item 26 was significant even after controlling for these variables (ab = −0.24, SE = 0.10, 95% CI [−0.46, −0.08]; see SI, p. 26, for additional details). In sum, ML-CVI item 26 mediated United States–Mexico differences in self-reported preventive behavior, whereas none of the existing culture scales or subscales did.
Study 3b: verifying ML-CVI-based explanation of cultural differences in COVID-19 lockdown violations
As study 3a conducted an exploratory analysis, study 3b sought to conduct a confirmatory test of the hypothesis generated from study 3a using a multi-item measure. The content of the item identified as a mediator in study 3a is close to the construct of Christian nationalism, which refers to a preference for religion as a driving force behind governmental policies and advocates fusing civic life with Christian identity and culture (51). Although the ML-CVI item can apply to any religion, over 80% of Mexicans and 70% of Americans identify as Christians (52, 53); thus, we reasoned that if Mexicans and Americans believe that religion should have a role in government, they refer to the role of Christianity rather than that of other religions. Study 3b tested whether Christian nationalism mediates the United States–Mexico difference in residents' self-reported willingness to violate COVID-19 stay-at-home orders.
At first glance, it may seem odd that Christian nationalism explains differences in COVID-19 lockdown adherence. However, in the initial waves of COVID-19, before vaccines were widely available, many religious individuals and groups protested state governments' ban on religious gatherings, and the US Supreme Court agreed with them, prioritizing religion over public health (54). More Americans than Mexicans likely used religion to circumvent COVID-19 social distancing orders, which may explain, in part, Americans' greater willingness to violate social distancing orders compared to Mexicans'. Importantly, this explanation could only be uncovered through the ML-CVI, not through existing culture scales, as the relationship between religion and government does not feature in other common culture scales.
Method
This study was conducted in March 2023. Because of the delay between the COVID-19 lockdowns and the time this study was conducted, we expected a smaller effect size. Thus, for the power analysis, we assumed d = 0.27, half the effect found in study 3a (d = 0.55). A power analysis using this effect size, α = 0.05 (two-tailed), and power = 90% yielded a sample size of 580. We rounded up and posted surveys seeking 300 participants each from the United States and Mexico on Prolific. In response, 303 US residents (148 women, 152 men, and 3 nonbinary; Mage = 44.24 years) and 302 Mexico residents (147 women, 151 men, and 4 nonbinary; Mage = 28.28 years) completed the study.
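The sample-size calculation above can be approximated with the standard normal-approximation formula for a two-sample comparison. This is a sketch, not the software the authors used; exact noncentral-t calculations (as in standard power software) give a slightly larger answer of roughly 290 per group, i.e. 580 in total.

```python
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.90):
    """Approximate per-group n for an independent-samples comparison."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-tailed critical value, ~1.96
    z_beta = norm.ppf(power)           # ~1.28 for 90% power
    return 2 * ((z_alpha + z_beta) / d) ** 2

n = n_per_group(d=0.27)  # ~288 per group, ~577 in total
```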
Participants first responded to the outcome measure used in study 3a, assessing their adherence to COVID-19 lockdown orders. They then responded to the 5-item Christian nationalism measuree (e.g. “The federal government should advocate Christian values;” 51) using a 4-point scale ranging from “strongly disagree (1)” to “strongly agree (4),” with an additional fifth option labeled “undecided” (following 56). Finally, as part of the demographics, we asked participants to indicate their political ideology using two items (“strongly liberal” to “strongly conservative” and “strongly left” to “strongly right”f), both on 7-point bipolar scales.
Results
We averaged the six items used to measure the extent to which participants violated lockdown orders (αUS = 0.85; αMexico = 0.81), which formed the dependent measure. Replicating the results in study 3a, an independent samples t-test found that Mexicans reported being less likely to violate stay-at-home orders, M = 1.84, SD = 0.74, 95% CI [1.75, 1.92], than Americans, M = 2.08, SD = 1.01, 95% CI [1.96, 2.19]; t(603) = 3.33, P < 0.001, Cohen's d = 0.27, 95% CI [0.11, 0.43]. As expected, the effect size was lower than in study 3a, which was conducted in close proximity to COVID-19 lockdowns, possibly because there is greater noise in people's recollections from over a year ago than from the past few weeks.
For each of the Christian nationalism items, if participants selected “undecided,” we treated their response as a missing value. Overall, only nine participants (out of 605) selected “undecided” on any of the five items of the scale. We averaged the items of the scale to form the mediator measure (αUS = 0.92; αMexico = 0.79). Replicating the results from study 3a, Mexican participants (M = 1.57, SD = 0.70, 95% CI [1.49, 1.65]) were less likely to agree that the church should influence the government than American participants (M = 2.12, SD = 1.08, 95% CI [1.99, 2.24]; t(594) = 7.29, P < 0.001, Cohen's d = 0.60, 95% CI [0.43, 0.76]).
Next, we conducted an indirect effect analysis using PROCESS Model 4 (20,000 bootstrap samples; 50) with the country (United States = 0, Mexico = 1) as the independent measure, violation of COVID-19 stay-at-home orders as the dependent measure, and Christian nationalism as the mediator. This analysis revealed a significant negative indirect effect of culture on greater violation of COVID-19 lockdown instructions through lower endorsement of Christian nationalism, ab = −0.11, SE = 0.03, 95% CI [−0.17, −0.06]. These significant results are consistent with the assumptions of a mediation model. Importantly, the indirect effect remained statistically significant after controlling for sociodemographic factors (e.g. age, gender, household income, number of members in the household, whether the individual's occupation was deemed essential, etc., all included together in the bootstrapped mediation analysis; ab = −0.09, SE = 0.04, 95% CI [−0.17, −0.03]; see SI, p. 29, for detailed results).
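The logic of the percentile-bootstrap indirect-effect test implemented by PROCESS Model 4 can be sketched on simulated stand-in data. All coefficients and variable names below are illustrative, not the study's estimates, and the resample count is reduced from the paper's 20,000 for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600
country = rng.integers(0, 2, n).astype(float)            # 0 = US, 1 = Mexico
mediator = 2.1 - 0.55 * country + rng.normal(0, 1, n)    # a path built in
outcome = 1.7 + 0.20 * mediator + rng.normal(0, 0.8, n)  # b path built in

def indirect_effect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                        # a: X -> M slope
    design = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(design, y, rcond=None)[0][2]  # b: M -> Y given X
    return a * b

# Percentile bootstrap: resample respondents with replacement and
# recompute the product of paths each time
boot = np.empty(2000)
for j in range(2000):
    idx = rng.integers(0, n, n)
    boot[j] = indirect_effect(country[idx], mediator[idx], outcome[idx])
ci = np.percentile(boot, [2.5, 97.5])
# mediation is supported if the 95% CI excludes zero
```

Because the indirect effect a*b is a product of coefficients, its sampling distribution is skewed, which is why a bootstrap CI is preferred over a symmetric normal-theory interval.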
At first glance, it may seem as if Christian nationalism may simply be a proxy for political ideology. However, Christian nationalism and political conservatism were only moderately correlated (r = 0.49, 95% CI [0.42, 0.54], P < 0.001); the five Christian nationalism items and the two political ideology items loaded on two distinct factors, and the indirect effect through Christian nationalism remained significant (ab = −0.11, SE = 0.03, 95% CI [−0.18, −0.05]) even when political ideology was controlled for (b = −0.0022, SE = 0.03, 95% CI [−0.06, 0.05]; see SI, p. 29, for details).
Discussion
In study 3a, we found that among five prominent existing culture scales and the 60 items of the ML-CVI, only ML-CVI item 26 was able to explain the difference between Americans' and Mexicans' adherence to COVID-19 lockdown orders. Building on these findings, study 3b identified Americans' stronger belief, compared to Mexicans', that religion should play a role in government as an explanation for why Americans reported violating COVID-19 social distancing orders more than Mexicans. As COVID-19 lockdown orders varied by state in the United States, with some states enforcing strict directives and others not, we conducted hierarchical regressions with participants nested within states; the results were virtually identical (see SI, pp. 27 and 30, for the results).
We acknowledge that adherence to COVID-19 lockdowns is influenced by a wide range of within- and between-country ecological and structural variables. This is true for virtually all cross-cultural research that compares people across countries that differ on several dimensions apart from the cultural values that the researcher is interested in. However, if adherence to lockdowns were primarily driven by ecological and structural factors, we would be unable to detect any relationship between individuals' values and lockdown-related behaviors. Although COVID-19 lockdown behaviors are multiply determined and noisy, the ML-CVI was able to explain these differences by uncovering Christian nationalism as a key explanation (a construct not covered by existing culture scales). We submit that if the ML-CVI could not capture explanations of cultural differences, it would have been unlikely to yield an elegant theoretical explanation for cultural differences in people's adherence to COVID-19 lockdown orders that was verified by a follow-up study.
Study 4a: explaining cultural differences in proenvironmental behavior
This study was designed to further test the ML-CVI's utility in explaining cross-cultural differences in another consequential outcome of interest—proenvironmental behavior.
Method
We posted surveys seeking 100 participants each from the United States and India on CloudResearch and Besample, respectively, which would give us 80% power to detect Cohen's d = 0.40 with α = 0.05 (two-tailed). In response, 103 US residents (48 women, 54 men, and 1 unreported; Mage = 46.39 years) and 105 Indian residents (52 women, 52 men, and 1 unreported; Mage = 31.88 years) completed the survey. We ran the survey in English in both countries, as English is commonly used in India (see SI, pp. 28–29, for additional analysis similar to studies 3a and 3b).
To measure the extent to which participants engaged in proenvironmental behavior, we administered a 6-item scale from (57) (e.g. “How often do you turn off the lights when leaving a room?”) on a 5-point scale from “Never” to “Always” (please see SI, p. 31, for all items). These measures reflect the environment-first strategy (58), which argues that “environmentally significant behavior can reasonably be defined by its impact: the extent to which it changes the availability of materials or energy from the environment” (59). However, these behaviors can also help people save electricity, and, therefore, costs. Thereafter, similar to study 3a, we administered the five cross-cultural scales, along with the ML-CVI. The only difference was that we used the validated scales from (60) to measure Hofstede's dimensions.
Results
We averaged the six items used to measure the extent of proenvironmental behaviors (αUS = 0.74; αIndia = 0.68). Indians indicated that they engaged in proenvironmental behavior more often, M = 4.09, SD = 0.64, 95% CI [3.97, 4.21], than Americans, M = 3.68, SD = 0.68, 95% CI [3.55, 3.81]; t(206) = 4.47, P < 0.001, Cohen's d = 0.62, 95% CI [0.34, 0.90].
Similar to study 3a, we tested whether scale or subscale averages for each of the five existing culture scales or the 60 items of the ML-CVI would mediate the United States–India difference identified above. In total, we conducted 91g Sobel test-based mediation analyses with the country as the independent measure, proenvironmental behavior as the dependent measure, and the respective item, subscale average, or scale average as the mediator.
Similar to study 3a, to avoid significant results simply due to chance, we input the P-values for all 91 individual mediation analyses (60 items of ML-CVI, and the other culture scales and subscales) in the FDR tool (49). An FDR of 1% (equivalent to P ≤ 0.01) revealed only one significant predictor: ML-CVI item 16 (“To what extent is determination the most important thing for a child to learn at home?”). We then proceeded to test the indirect effect through ML-CVI item 16 using PROCESS (50). A bootstrapped analysis with 10,000 samples revealed a significant indirect effect of culture (United States = 0, India = 1) on people's proenvironmental behaviors through ML-CVI item 16 (ab = 0.21, SE = 0.06, 95% CI [0.10, 0.32]), which is consistent with mediation. Specifically, we found that Indian participants (M = 5.68, SD = 1.37, 95% CI [5.42, 5.95]) were more likely to believe that determination is the most important quality for a child to learn than Americans (M = 4.42, SD = 1.53, 95% CI [4.11, 4.71]; t(205) = 6.27, P < 0.001, Cohen's d = 0.87, 95% CI [0.59, 1.16]), and this explained their greater proenvironmental behavior than Americans. Similar to the previous studies, controlling for sociodemographic factors (i.e. age, gender, education, social classh) did not affect the results in any meaningful way (ab = 0.18, SE = 0.06, 95% CI [0.08, 0.30]; see SI, p. 31, for details).
Study 4b: verifying ML-CVI-based explanation of cultural differences in proenvironmental behavior
Similar to study 3b, this study provided a confirmatory test of the mediator identified in study 4a using a multi-item scale. We found that the content of the mediator item was close to the construct of perseverance captured by the grit scale (61), which refers to an individual's ability to sustain effort in the face of adversity (e.g. “I have overcome setbacks to conquer an important challenge”).
Pilot study
A pilot study with 200 US residents (112 women, 87 men, 1 nonbinary; Mage = 47.4 years) validated that perseverance is a suitable construct for capturing the belief that determination is an important quality for a child to learn (see SI, p. 33, for details). Past research has found that the perseverance dimension of grit is more strongly associated with achievement and subjective well-being in Asian academic contexts than in Western contexts (62, 63). Furthermore, past correlational research has found that perseverance predicts proenvironmental awareness and behaviors (64). Building on this work, study 4b tested whether perseverance mediates the United States–India difference in self-reported proenvironmental behaviors.
Method
This study was preregistered at https://osf.io/xwg52. As preregistered, we first posted the study for 200 participants each from the United States and India on MTurk and Besample, respectively. In response, 200 US residents and 207 Indian residents completed the study. As preregistered, we excluded 13 participants from the United States and 60 participants from India who wrote gibberish or irrelevant responses to an open-ended question asking them to describe the task they had just completed. This exclusion led to a valid sample size of 187 in the United States and 147 in India. However, we had preregistered that if the valid sample size in a country fell below 170, we would extend the data collection. As preregistered, we recruited additional participants in India to achieve the required sample size, resulting in a final sample of 187 participants from the United States (90 women, 95 men, 1 nonbinary, and 1 unreported; Mage = 45.22 years) and 169 from India (49 women, 118 men, 1 nonbinary, and 1 unreported; Mage = 31.36 years; see SI, p. 33, for additional analysis on English fluency).
Participants first responded to the proenvironmental behavior measure used in study 4a on a 7-point scale from “Never” to “All the time.” They then responded to a 6-item measure of perseverance adapted from an established scale (e.g. “I try to overcome setbacks to conquer important challenges;” 61) using a 7-point scale ranging from “Not like me at all” to “Extremely like me.” Finally, participants were asked to answer demographic questions.
Results
We averaged the six items used to measure participants' proenvironmental behaviors (αUS = 0.71; αIndia = 0.73) to form the dependent measure. Replicating the results in study 4a, Indians reported engaging more in proenvironmental behavior, M = 5.70, SD = 0.92, 95% CI [5.56, 5.83], than Americans, M = 4.94, SD = 0.98, 95% CI [4.80, 5.08]; t(354) = 7.48, P < 0.001, Cohen's d = 0.79, 95% CI [0.58, 1.00]. Similar to study 4a, controlling for demographic factors (e.g. age, gender, education, household income, etc.) did not change the results in any meaningful way (see SI, p. 33, for details).
We averaged the items of the perseverance scale to form the mediator measure (αUS = 0.91; αIndia = 0.82). As hypothesized, Indian participants (M = 5.67, SD = 0.91, 95% CI [5.53, 5.81]) were more likely to agree that they persevered in the face of challenges than Americans (M = 5.46, SD = 1.06, 95% CI [5.31, 5.61]; t(354) = 2.02, P = 0.044, Cohen's d = 0.21, 95% CI [0.01, 0.42]).
Next, we conducted an indirect effect analysis using PROCESS Model 4 (10,000 bootstrap samples; 50) with the country (United States = 0, India = 1) as the independent measure, proenvironmental behavior as the dependent measure, and perseverance as the mediator. This analysis revealed a significant positive indirect effect of culture on greater proenvironmental behavior through greater perseverance, ab = 0.072, SE = 0.04, 95% CI [0.0033, 0.15]. These significant results are consistent with the assumptions of a mediation model. Importantly, given that demographic factors (e.g. age, gender, education, household income, etc.) may influence people's proenvironmental behaviors, we controlled for these factors in additional analysis. Controlling for these demographic variables (all together) strengthened the mediation results through perseverance even further (ab = 0.17, SE = 0.05, 95% CI [0.071, 0.28]; see SI, p. 33, for detailed results).
Discussion
Studies 4a and 4b showed the utility of the ML-CVI in identifying perseverance as an explanation for India–United States differences in environmental behavior. Therefore, these two studies provide additional evidence for the utility of the ML-CVI as a tool that can provide culture scientists with not only elegant but also verifiable explanations for cultural differences in important outcomes. Studies 3a to 4b provide a step-by-step blueprint for how cultural psychologists may use the ML-CVI to uncover explanations for cultural differences that may be overlooked if the investigation is limited to existing cultural scales.
General discussion
The present research used a machine learning-based abductive approach to uncover important cultural values while combining prediction and explanation. Based on the machine learning analysis, we created the ML-CVI, a tool to help future researchers uncover specific cultural values that explain cultural differences in outcomes of interest. Four follow-up studies demonstrated ML-CVI's theoretical and practical relevance to broaden the range of explanatory frameworks that culture scientists can use.
It may be argued that the goal of existing cultural scales was not to maximize differences between cultures, whereas the ML-CVI was developed to maximize differences across countries. However, the goal of any cultural scale is to identify values that differ by culture. Indeed, past research has provided tables of country-level means showing how values differ across countries (e.g. 6, 65). If a set of values varies across countries, then, by definition, they differentiate countries. Classification per se is not a goal of existing cultural value inventories, but it is a natural outcome of the manner in which existing cultural value inventories were designed—they were designed to feature values that differ across countries. Moreover, although culture does not equal country, existing culture scales (e.g. 6, 45, 46) have been extensively used in the psychology and business literature to explain variations in outcomes across countries (e.g. (66); for a review, see Ref. 67).
A limitation of the ML-CVI is that, although the items fall under related themes, they do not coalesce into latent factors. This is because the individual items were selected for prediction (i.e. to be diagnostic of individuals' country of origin), not for the elegance of structure (8). However, studies 3a–4a indicated that these individual items can still be more effective in accounting for cultural differences in diverse consequential outcomes than subscale averages obtained from multi-item measures, thus suggesting that the latent variable conceptualization of psychological instruments likely sacrifices a substantial amount of explanatory power (68). Furthermore, we addressed this limitation in studies 3b and 4b by verifying whether the key finding of studies 3a–4a replicates with established multi-item measures that tap the construct of interest identified from the ML-CVI.
Another limitation of the ML-CVI is that we did not control for cross-national differences in economic, sociological, and ecological factors when identifying the set of values that best distinguish people from various countries. It is possible that some of the values that featured in the ML-CVI reflect economic, sociological, and ecological differences across countries rather than cultural differences once the other factors are held constant. Additional research is needed to identify the set of values that best distinguish people from different countries while accounting for economic, sociological, and ecological differences.
A limitation of studies 3a–3b and studies 4a–4b is that they do not establish causality. Instead, they generate and provide initial validation for hypotheses about values that mediate cross-cultural differences in behavior, but additional research is needed to verify the causal effects of the relevant values. Moreover, unlike the WVS, the participant samples in these studies were not representative of their respective countries but were instead convenience samples that could differ on any number of sociodemographic factors. First, this limitation is not unique to the current research but applies to virtually all cross-cultural research conducted with convenience or student samples. Second, controlling for several sociodemographic factors did not affect the results in any meaningful way. Importantly, though, if our convenience samples from the United States, Mexico, and India were vastly different from the representative samples used in the WVS, then the ML-CVI would have performed badly when seeking to explain United States–Mexico and United States–India differences in these convenience samples. Nevertheless, the ML-CVI outperformed all other scales in explaining cross-cultural differences in studies 3a and 4a, respectively, and the explanations suggested by the ML-CVI were validated by subsequent studies (studies 3b and 4b).
Whereas previous research in psychology has found that models can accurately predict an individual's personality, demographic characteristics, and other attributes based on publicly available data (19, 20), we humbly submit that a key limitation of this body of research is that it focuses on prediction rather than explanation (8). Although we also built a predictive model, we used it as a stepping stone to build a tool that future researchers can use. We ensured that our model was interpretable and conducted feature importance analyses to provide an explanation for the model's predictions. We then used these explanatory insights to construct a new tool (i.e. the ML-CVI) that future researchers can use in their own research to uncover novel theoretical explanations for cross-cultural variations in behavior. Given that psychology as a discipline prioritizes explanation over prediction, the true value of our research lies not only in predicting outcomes but also in explaining our model and using these explanations to create a general-purpose inventory. Similar to other culture scales, the ML-CVI was developed agnostic of dependent measures, so researchers can use it to explain cultural variation in a wide variety of dependent measures.
We acknowledge that the model used to create the ML-CVI inherited the limitations of the WVS dataset. It is possible that some important cultural values were not covered in the WVS or certain thematic domains (e.g. politics) were over-represented. Nevertheless, to the best of our knowledge, the WVS represents the largest survey of values (nearly 600 per our analysis) around the world (covering 98 countries at the time of our analysis, representing 94.5% of the world's population; 34); thus, although it is not a perfect dataset, it is the best survey dataset of values across cultures that is currently available. Moreover, between-country distances computed from WVS values correlate with between-country distances computed from Facebook posts (23). In sum, while the values included in the ML-CVI may not be the most important values distinguishing humans across all time and space, given that it is the largest available survey of values across cultures at the time of the study, the ML-CVI represents our best understanding of what values (of the ones measured) best distinguish the countries sampled in the relevant dataset. Culture scientists can seek to conduct other, more balanced, and less biased surveys, add additional items to future WVS waves, and integrate complementary data sources, such as the Afrobarometer and Latinobarómetro, to capture under-represented culture-specific items in the WVS.
It may appear that, by chance alone, one of the 60 ML-CVI items is more likely to mediate cultural differences in an outcome of interest than one of the 33 (study 3a) or 31 (study 4a) other subscales against which we compared the ML-CVI. However, the other subscales were each computed from multiple items (range: 3 to 12), whereas the ML-CVI items were single items that were not averaged. Thus, although the competing subscales were fewer in number, they had higher reliability. Moreover, these subscales have been refined over decades of research. For these reasons, a priori, we expected the existing subscales to mediate cultural differences better than the items identified by our neural network in study 1.
The 92 (study 3a) or 91 (study 4a) mediation analyses that we conducted may raise concerns about spurious correlations. This would be a valid concern if we had either examined uncorrected P-values or had not conducted follow-up studies. However, we did not use uncorrected P-values; we constrained the overall FDR at 1% (P < 0.01) in both these studies using the Benjamini and Hochberg (49) method. In addition, we verified the significant item resulting from each of the analyses in studies 3a and 4a in subsequent studies. The combination of the 1% FDR in studies 3a and 4a and the standard 5% FDR (P < 0.05) in studies 3b and 4b means that our overall error rate was 1% × 5% = 0.05% (P < 0.0005), a much more stringent threshold than that used in the vast majority of social science research.
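The Benjamini–Hochberg (49) step-up procedure referenced above can be sketched as follows. The P-values below are made up for illustration; only the procedure itself (sort the P-values, find the largest rank k with P(k) ≤ (k/m)·q, and reject all hypotheses up to that rank) reflects the method the studies used.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.01):
    """Return a boolean mask of discoveries at FDR level q (BH step-up)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)          # ranks from smallest to largest P-value
    ranked = p[order]
    # BH criterion: P_(k) <= (k / m) * q for rank k = 1..m.
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0][-1]   # largest rank satisfying the criterion
        reject[order[: k + 1]] = True  # reject every hypothesis up to rank k
    return reject

# Hypothetical P-values from parallel mediation tests (illustrative only).
pvals = [0.0001, 0.0004, 0.02, 0.3, 0.6, 0.8]
print(benjamini_hochberg(pvals, q=0.01))
```

Note that the step-up rule can reject a hypothesis whose raw P-value exceeds its own per-rank threshold, as long as a larger rank satisfies the criterion; this is what makes BH less conservative than a Bonferroni correction at the same nominal level.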
Studies 2–4b also helped us validate the model's trustworthiness despite the large volume of imputed data. If the imputed dataset were biased, then the ML-CVI items derived from a neural network built on this imputed dataset would not have outperformed traditional culture scales in the follow-up studies. Despite the volume of imputed data, the ML-CVI items were better able to differentiate cultures (study 2) and to explain cultural differences in outcomes of interest (studies 3a and 4a) than existing culture scales.
Finally, the WVS did not sample the same set of countries, with similar sample sizes, across all waves, and some regions of the world were missing from some waves; thus, the values that distinguish one set of countries in one wave may differ from those in another wave. Moreover, an additional wave of the WVS has been released since study 1 was conducted, so future research can assess whether the ML-CVI can distinguish people from the countries included in Wave 7. Despite this inconsistent sampling of countries, we expect the ML-CVI to be sufficiently robust to differentiate the countries sampled in future waves. Future research can continue to test the predictive accuracy of the ML-CVI as new WVS data are released.
Over the past decade, researchers in the natural sciences have started using machine learning methods to make scientific discoveries (13). However, social scientists have primarily used machine learning methods for prediction. The current research is one of the first in the social sciences to use machine learning methods to generate a novel product—a cultural values inventory—that identifies theoretically novel dimensions along which cultures vary and can be used to uncover novel explanations for cultural differences.
Acknowledgments
We thank Shinobu Kitayama and Michael Frese for helpful feedback on previous drafts, and Sylvia Chin, Andrea Low, Dayana Bulchand, Samantha Seah, Tiffany Tan, Xin Yi Lim, Nandani Agarwal, Zhennan Xu, Khyati Gupta, Shuyan Chen, and Ying Chen for invaluable research assistance.
Notes
Following (1), whenever we refer to “culture,” we mean “national culture”; we do not mean cultures defined by region, language, religion, ethnicity, social class, and so on.
By “explain,” we mean “mediate cultural differences,” which is what the vast majority of cross-cultural research seeks to do in practice.
The reverse is not true, though—a scale could differentiate many countries but not explain cultural variation in outcomes. Studies 3a, 3b, 4a, and 4b assess whether the ML-CVI can explain cultural variation in outcomes.
We also assessed participants’ English fluency in this study and all subsequent studies. The responses showed that the participants were sufficiently fluent in English for the purpose of our studies. Importantly, the results are virtually indistinguishable if we exclude any participants who reported that they were not comfortable in English (see SI, pp. 23–30).
We did not include one reverse-coded item in this scale as reverse-scored items can affect the factor structure of the scale (55).
American participants also answered a third item (“strongly Democrat” to “strongly Republican”) but there was no equivalent item for the Mexican participants, so this item was not used in the analyses.
This study had one fewer mediation analysis than study 3a because the cultural values scale (60) that we used in this study did not include a subscale measuring Hofstede’s most recent dimension, indulgence vs. restraint.
Due to an error, we could not measure household income in this study for Indian participants. We corrected this error in the following study (study 4b).
Contributor Information
Abhishek Sheetal, Faculty of Business, The Hong Kong Polytechnic University, Li Ka Shing Tower, Hung Hom, Kowloon, Hong Kong.
Shilpa Madan, Lee Kong Chian School of Business, Singapore Management University, 50 Stamford Road, Singapore 178899.
Rui Ling Lee, Nanyang Business School, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798.
Krishna Savani, Faculty of Business, The Hong Kong Polytechnic University, Li Ka Shing Tower, Hung Hom, Kowloon, Hong Kong.
Supplementary Material
Supplementary material is available at PNAS Nexus online.
Funding
This research was supported by start-up funds provided by the Hong Kong Polytechnic University to A.S. and K.S., by Nanyang Technological University to K.S., and by Singapore Management University (ASEAN Business Research Initiative grant G17C20450) to S.M.
Data Availability
The WVS data are available at www.worldvaluessurvey.org; the machine learning code, and all survey materials, data, and analytic code are available at https://osf.io/fmduv.
References
- 1. Hofstede G. Culture's consequences: international differences in work-related values. Sage, Newbury Park, CA, 1980. [Google Scholar]
- 2. Markus HR, Kitayama S. 1991. Culture and the self: implications for cognition, emotion, and motivation. Psychol Rev. 98(2):224–253. [Google Scholar]
- 3. Triandis HC. 1996. The psychological measurement of cultural syndromes. Am Psychol. 51(4):407–415. [Google Scholar]
- 4. Schwartz SH, et al. 2012. Refining the theory of basic individual values. J Pers Soc Psychol. 103(4):663–688. [DOI] [PubMed] [Google Scholar]
- 5. Inglehart R, Baker WE. 2000. Modernization, cultural change, and the persistence of traditional values. Am Sociol Rev. 65(1):19–51. [Google Scholar]
- 6. Gelfand MJ, et al. 2011. Differences between tight and loose cultures: a 33-nation study. Science. 332(6033):1100–1104. [DOI] [PubMed] [Google Scholar]
- 7. Cohen D. 2001. Cultural variation: considerations and implications. Psychol Bull. 127(4):451–471. [DOI] [PubMed] [Google Scholar]
- 8. Yarkoni T, Westfall J. 2017. Choosing prediction over explanation in psychology: lessons from machine learning. Perspect Psychol Sci. 12(6):1100–1122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Miller JW, Kulpa T. 2022. Econometrics and archival data: reflections for purchasing and supply management (PSM) research. J Purch Supply Manag. 28(3):100780. [Google Scholar]
- 10. Grossmann I, et al. 2023. AI and the transformation of social science research. Science. 380(6650):1108–1109. [DOI] [PubMed] [Google Scholar]
- 11. Muthukrishna M, et al. 2020. Beyond western, educated, industrial, rich, and democratic (WEIRD) psychology: measuring and mapping scales of cultural and psychological distance. Psychol Sci. 31(6):678–701. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Friederich P, Krenn M, Tamblyn I, Aspuru-Guzik A. 2021. Scientific intuition inspired by machine learning-generated hypotheses. Mach Learn Sci Technol. 2(2):025027. [Google Scholar]
- 13. Gil Y, Greaves M, Hendler J, Hirsh H. 2014. Amplify scientific discovery with artificial intelligence. Science. 346(6206):171–172. [DOI] [PubMed] [Google Scholar]
- 14. Debrouwere S, Rosseel Y. 2022. The conceptual, cunning, and conclusive experiment in psychology. Perspect Psychol Sci. 17(3):852–862. [DOI] [PubMed] [Google Scholar]
- 15. Stokes JM, et al. 2020. A deep learning approach to antibiotic discovery. Cell. 180(4):688–702.e13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. van Rooij I, Baggio G. 2021. Theory before the test: how to build high-verisimilitude explanatory theories in psychological science. Perspect Psychol Sci. 16(4):682–697. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Peirce CS. Abduction and induction. 1955. In: Buchler J, editor. Philosophical writings of Peirce. New York: Dover Books. p. 150–156. [Google Scholar]
- 18. Bleidorn W, Hopwood CJ. 2019. Using machine learning to advance personality assessment and theory. Pers Soc Psychol Rev. 23(2):190–203. [DOI] [PubMed] [Google Scholar]
- 19. Kosinski M, Stillwell D, Graepel T. 2013. Private traits and attributes are predictable from digital records of human behavior. Proc Natl Acad Sci U S A. 110(15):5802–5805. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Youyou W, Kosinski M, Stillwell D. 2015. Computer-based personality judgments are more accurate than those made by humans. Proc Natl Acad Sci U S A. 112(4):1036–1040. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Joel S, Eastwick PW, Finkel EJ. 2017. Is romantic desire predictable? Machine learning applied to initial romantic attraction. Psychol Sci. 28(10):1478–1489. [DOI] [PubMed] [Google Scholar]
- 22. Liew K, Hamamura T, Uchida Y. 2025. Machine learning culture: cultural membership classification as an exploratory approach to cross-cultural psychology. Pers Soc Psychol Bull. 10.1177/01461672251339313. [DOI] [PubMed] [Google Scholar]
- 23. Obradovich N, et al. 2022. Expanding the measurement of culture with a sample of two billion humans. J R Soc Interface. 19(190):20220085. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Sheetal A, Feng Z, Savani K. 2020. Using machine learning to generate novel hypotheses: increasing optimism about COVID-19 makes people less willing to justify unethical behaviors. Psychol Sci. 31(10):1222–1235. [DOI] [PubMed] [Google Scholar]
- 25. Sheetal A, Savani K. 2021. A machine learning model of cultural change: the role of prosociality, political attitudes, and protestant work ethic. Am Psychol. 76(6):997–1012. [DOI] [PubMed] [Google Scholar]
- 26. Madan S, Basu S, Ng S, Savani K. 2022. The breadth of normative standards: antecedents and consequences for individuals and organizations. Organ Behav Hum Decis Process. 172:104181. [Google Scholar]
- 27. Catterberg G, Moreno A. 2006. The individual bases of political trust: trends in new and established democracies. Int J Public Opin Res. 18(1):31–48. [Google Scholar]
- 28. Desmet K, Ortuño-Ortín I, Wacziarg R. 2017. Culture, ethnicity, and diversity. Am Econ Rev. 107(9):2479–2513. [Google Scholar]
- 29. Desmet K, Ortuño-Ortín I, Wacziarg R. Latent polarization; 2024 Oct 3 [accessed 2025 Aug 4]. https://uc3nomics.uc3m.es/latent-polarization/.
- 30. Flanagan SC, Lee AR. 2003. The new politics, culture wars, and the authoritarian-libertarian value change in advanced industrial democracies. Comp Polit Stud. 36(3):235–270. [Google Scholar]
- 31. Li LMW, Bond MH. 2010. Value change: analyzing national change in citizen secularism across four time periods in the world values survey. Soc Sci J. 47(2):294–306. [Google Scholar]
- 32. Minkov M, Hofstede G. 2012. Hofstede's fifth dimension: new evidence from the world values survey. J Cross Cult Psychol. 43(1):3–14. [Google Scholar]
- 33. Inglehart R, et al. Data from “World values survey: all rounds—country-pooled data- file (Version V4)”; 2014 [accessed 2025 Aug 4]. https://www.worldvaluessurvey.org/WVSDocumentationWVL.jsp.
- 34. World Values Survey [WVS] . Data from “Who We Are”; 2020 [accessed 2025 Aug 4]. https://www.worldvaluessurvey.org/WVSContents.jsp.
- 35. Abrol A, et al. 2021. Deep learning encodes robust discriminative neuroimaging representations to outperform standard machine learning. Nat Commun. 12(1):353. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Mayer M, Mayer MM. Package ‘missRanger’. R Package, 2019 [accessed 2025 Aug 4]. http://download.nust.na/pub3/cran/web/packages/missRanger/missRanger.pdf.
- 37. Wright MN, Ziegler A. 2017. Ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 77(1):1–17. [Google Scholar]
- 38. Stekhoven DJ, Bühlmann P. 2012. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 28(1):112–118. [DOI] [PubMed] [Google Scholar]
- 39. Singh H, Kaur A, Kaur H. 2022. Performance analysis of missing data imputation methods. Int J Sci Res Eng Trends. 8:442–447. [Google Scholar]
- 40. Smith BI, Chimedza C, Bührmann JH. Random forest missing data imputation methods: implications for predicting at-risk students. 2021. In: Abraham A, Siarry P, Ma K, Kaklauskas A, editors. Intelligent systems design and applications: 19th international conference on intelligent systems design and applications (ISDA 2019). Springer International Publishing. p. 298–308.
- 41. Sheetal A, Chaudhury SH, Savani K. 2022. A deep learning model identifies emphasis on hard work as an important predictor of income inequality. Sci Rep. 12(1):9845. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Sheetal A, Ma A, Infurna F. 2024. Psychological predictors of socioeconomic resilience amidst the COVID-19 pandemic: evidence from machine learning. Am Psychol. 79(8):1139–1154. [DOI] [PubMed] [Google Scholar]
- 43. Shanker M, Hu MY, Hung MS. 1996. Effect of data standardization on neural network training. Omega (Westport). 24(4):385–397. [Google Scholar]
- 44. Murtagh F, Legendre P. 2014. Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion? J Classif. 31(3):274–295. [Google Scholar]
- 45. Hofstede G, Minkov M. Values survey module 2013; 2013 Aug 25 [accessed 2025 Aug 4]. http://geerthofstede.com/research-and-vsm/vsm-2013/.
- 46. Singelis TM. 1994. The measurement of independent and interdependent self-construals. Pers Soc Psychol Bull. 20(5):580–591. [Google Scholar]
- 47. Kuhn M, Johnson K. Applied predictive modeling. Springer, New York, 2013. [Google Scholar]
- 48. Sanchez-Burks J, Nisbett RE, Ybarra O. 2000. Cultural styles, relationship schemas, and prejudice against out-groups. J Pers Soc Psychol. 79(2):174–189. [DOI] [PubMed] [Google Scholar]
- 49. Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol. 57(1):289–300. [Google Scholar]
- 50. Hayes AF. Introduction to mediation, moderation, and conditional process analysis: a regression-based approach. Guilford Publications, New York, NY, 2017. [Google Scholar]
- 51. Whitehead AL, Perry SL, Baker JO. 2018. Make America Christian again: Christian nationalism and voting for Donald Trump in the 2016 presidential election. Sociol Relig. 79(2):147–171. [Google Scholar]
- 52. Facts about Mexico . Mexican religion; 2022 Jun 25 [accessed 2025 Aug 4]. https://www.facts-about-mexico.com/mexican-religion.html.
- 53. Public Religion Research Institute . The 2020 census of American religion; 2021 Jul 8 [accessed 2025 Aug 4]. https://www.prri.org/research/2020-census-of-american-religion/.
- 54. Liptak A. Splitting 5 to 4, Supreme Court backs religious challenge to Cuomo's Virus shutdown order. The New York Times; 2020 [accessed 2025 Aug 4]. https://www.nytimes.com/2020/11/26/us/supreme-court-coronavirus-religion-new-york.html.
- 55. Weijters B, Baumgartner H. 2012. Misresponse to reversed and negated items in surveys: a review. J Mark Res. 49(5):737–747. [Google Scholar]
- 56. Davis NT. 2023. The psychometric properties of the Christian nationalism scale. Politics Relig. 16(1):1–26. [Google Scholar]
- 57. Markle GL. 2013. Pro-environmental behavior: does it matter how it's measured? Development and validation of the pro-environmental behavior scale (PEBS). Hum Ecol. 41(6):905–914. [Google Scholar]
- 58. Stern PC, Dietz T, Ruttan VW, Socolow RH, Sweeney JL. Strategies for setting research priorities. 1997. In: Stern P, editor. Environmentally significant consumption: research directions. Washington: National Academy Press. p. 124–137. [Google Scholar]
- 59. Stern PC. 2000. New environmental theories: toward a coherent theory of environmentally significant behavior. J Soc Issues. 56(3):407–424. [Google Scholar]
- 60. Yoo B, Donthu N, Lenartowicz T. 2011. Measuring Hofstede's five dimensions of cultural values at the individual level: development and validation of CVSCALE. J Int Consum Mark. 23(3-4):193–210. [Google Scholar]
- 61. Duckworth AL, Peterson C, Matthews MD, Kelly DR. 2007. Grit: perseverance and passion for long-term goals. J Pers Soc Psychol. 92(6):1087–1101. [DOI] [PubMed] [Google Scholar]
- 62. Datu JAD, Valdez JPM, King RB. 2016. Perseverance counts but consistency does not! validating the short grit scale in a collectivist setting. Curr Psychol. 35(1):121–130. [Google Scholar]
- 63. Xu KM, et al. 2023. A cross-cultural investigation on perseverance, self-regulated learning, motivation, and achievement. Compare. 53(3):361–379. [Google Scholar]
- 64. Datu JAD, Buenconsejo JU. 2021. The ecological benefits of staying gritty: grit dimensions are associated with pro-environmental passion, awareness, and behaviours. Aust J Psychol. 73(4):416–425. [Google Scholar]
- 65. Hofstede G, Hofstede GJ, Minkov M. Cultures and organizations: software of the mind. 3rd ed. McGraw Hill, New York, NY, 2010. [Google Scholar]
- 66. Madan S, Savani K, Katsikeas CS. 2023. Privacy please: power distance and people’s responses to data breaches across countries. J Int Bus Stud. 54:731–754. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67. Kirkman BL, Lowe KB, Gibson CB. 2006. A quarter century of culture's consequences: a review of empirical research incorporating Hofstede's cultural values framework. J Int Bus Stud. 37(3):285–320. [Google Scholar]
- 68. Borsboom D, Mellenbergh GJ, Van Heerden J. 2004. The concept of validity. Psychol Rev. 111(4):1061–1071. [DOI] [PubMed] [Google Scholar]