Public Opinion Quarterly
2017 May 19;81(2):447–472. doi: 10.1093/poq/nfx009

Necessary but Insufficient

Why Measurement Invariance Tests Need Online Probing as a Complementary Tool

Katharina Meitinger 1,*
PMCID: PMC5452432  PMID: 28579643

Abstract

Cross-national data production in social science research has increased dramatically in recent decades. Assessing the comparability of data is necessary before drawing substantive conclusions that are based on cross-national data. Researchers assessing data comparability typically use either quantitative methods such as multigroup confirmatory factor analysis or qualitative methods such as online probing. Because both methods have complementary strengths and weaknesses, this study applies both multigroup confirmatory factor analysis and online probing in a mixed-methods approach to assess the comparability of constructive patriotism and nationalism, two important concepts in the study of national identity. Previous measurement invariance tests failed to achieve scalar measurement invariance, which prohibits a cross-national comparison of latent means (Davidov 2009). The arrival of the 2013 ISSP Module on National Identity has encouraged a reassessment of both constructs and a push to understand why scalar invariance cannot be achieved. Using the example of constructive patriotism and nationalism, this study demonstrates how the combination of multigroup confirmatory factor analysis and online probing can uncover and explain issues related to cross-national comparability.


With the proliferation of cross-national surveys, such as the International Social Survey Program (ISSP) and the European Social Survey (ESS), cross-national datasets have become far more accessible. One precondition for the analysis of such data is the assessment of their cross-national comparability. Two research traditions can be distinguished in this context. In the quantitative tradition, comparability is often assessed with measurement invariance tests that use multigroup confirmatory factor analysis (MGCFA) (Jöreskog 1971). This approach can test the cross-national comparability of numerous countries at once. The testing strategy is implemented in standard analysis software, such as Mplus, making it a handy control instrument for researchers analyzing secondary data. However, if these tests fail to establish cross-national comparability, this approach struggles to explain the existing noninvariance. By contrast, researchers in the qualitative tradition will most likely conduct cognitive interviews (CIs) (Miller et al. 2011) or online probing (OP) (Braun et al. 2014). These methods primarily seek to uncover the causes of missing comparability, and they often reveal unexpected reasons. The drawbacks of these methods are the necessity of collecting new data and the work-intensive analysis (Meitinger and Behr 2016), which limit the analysis to a small set of countries. Much can be learned from a combined approach that draws on both perspectives.

This study closes a research gap by simultaneously applying MGCFA and OP to assess the comparability of the constructs of constructive patriotism and nationalism. It will introduce MGCFA and OP, followed by a short discussion of constructive patriotism and nationalism. The constructs’ comparability will first be assessed with MGCFA and then with OP. Previous research has indicated that the items that measure constructive patriotism are particularly problematic (Latcheva 2011); as such, this study will focus on these items with regard to OP. Finally, the conclusions from both methods will be compared, and optimal research strategies that combine the two methods’ respective strengths will be presented.

The Quantitative Approach: Tests of Measurement Invariance

Most researchers conduct three tests of comparability when applying MGCFA: configural, metric, and scalar invariance tests (Vandenberg and Lance 2000). The three tests are nested, with configural invariance providing a test for the lowest and scalar invariance for the highest level of invariance.

Configural invariance concerns whether all countries have the same factor structure (Horn and McArdle 1992). If configural invariance is established, the latent concept can be meaningfully discussed in all countries (Davidov et al. 2014). However, the respondents can still answer items differently because factor loadings may vary (Steenkamp and Baumgartner 1998).

The test for metric invariance addresses this issue by requiring equal factor loadings across countries (Rock, Werts, and Flaugher 1978). If metric invariance is supported, exploring cross-national structural relationships with other constructs is possible (Steenkamp and Baumgartner 1998). If metric invariance tests fail, researchers can still opt for partial metric invariance. Cross-national comparisons are acceptable if all constructs are measured with at least two items with equal factor loadings (Byrne, Shavelson, and Muthén 1989).

However, many cross-national researchers aim to compare mean values across countries. As a systematic bias might affect the mean values (Meredith 1993), testing for scalar invariance is necessary. Scalar invariance tests additionally require equal intercepts. If full scalar invariance does not apply, opting for partial scalar invariance is another possibility (Steenkamp and Baumgartner 1998).
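The nested hierarchy described above can be summarized as a small lookup. This is a minimal didactic sketch (names and wording are illustrative, not taken from any SEM package): each level adds a constraint to the previous one and licenses an additional kind of cross-national comparison.

```python
# Sketch: which cross-national comparisons each fully established
# invariance level licenses, per the hierarchy described above.
# Levels are nested: each adds constraints on top of the previous one.

INVARIANCE_LEVELS = [
    # (level, added constraint, comparison it licenses)
    ("configural", "same factor structure",
     "meaningful discussion of the latent concept in all countries"),
    ("metric", "equal factor loadings",
     "structural relationships with other constructs across countries"),
    ("scalar", "equal intercepts",
     "latent means across countries"),
]

def licensed_comparisons(level: str) -> list[str]:
    """Return everything licensed at `level`, including all lower levels."""
    out = []
    for name, _constraint, licensed in INVARIANCE_LEVELS:
        out.append(licensed)
        if name == level:
            return out
    raise ValueError(f"unknown level: {level}")
```

Under this scheme, establishing only metric invariance licenses correlational comparisons but not mean comparisons, which is exactly the situation reported for these constructs below.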

Goodness-of-fit (GOF) indices, such as the root mean square error of approximation (RMSEA) (Browne and Cudeck 1992) and the comparative fit index (CFI) (Bentler 1990), assess the model fit of the baseline (configural) model. Hu and Bentler (1999) suggest CFI values of at least .95 and RMSEA values below .06 for a good model fit and RMSEA values below .08 for an acceptable model fit. If configural invariance is achieved, the baseline model can be compared with more restricted models1 by calculating the difference in the CFI and RMSEA values of the different test levels, ΔCFI and ΔRMSEA. A change of more than .01 for CFI and .015 for RMSEA indicates problematic values (Chen 2007).
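Chen's (2007) cutoffs amount to a simple check on the change in fit between adjacent test levels. A minimal sketch (the function name is illustrative, not from any SEM package):

```python
# Sketch: flag a more restricted invariance model as problematic when the
# deterioration in fit relative to the less restricted model exceeds
# Chen's (2007) cutoffs: a CFI decrease of more than .01 or an RMSEA
# increase of more than .015.

def invariance_step_ok(cfi_base, rmsea_base, cfi_restricted, rmsea_restricted,
                       max_cfi_drop=0.01, max_rmsea_rise=0.015):
    delta_cfi = cfi_base - cfi_restricted        # positive = fit got worse
    delta_rmsea = rmsea_restricted - rmsea_base  # positive = fit got worse
    return delta_cfi <= max_cfi_drop and delta_rmsea <= max_rmsea_rise

# Applied to the fit measures this study reports later (table 3):
# configural -> metric: CFI .982 -> .979, RMSEA .061 -> .053  => acceptable
# metric -> scalar:     CFI .979 -> .710, RMSEA .053 -> .169  => fails
```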

If the GOF values are unsatisfactory, MGCFA provides modification indices (MIs) (Steenkamp and Baumgartner 1998), which help the researcher determine which parameters to free to improve the model fit. As the MIs are sensitive to sample size (Cheung and Rensvold 2002), Saris, Satorra, and Van der Veld (2009) suggested also considering the power of the MI test and the expected parameter change (EPC), which estimates the degree of the parameter’s misspecification (Brown 2014). Large values suggest a model respecification. With Jrule2 software for Mplus, these values are easily obtainable (Oberski 2014).
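The Saris, Satorra, and Van der Veld (2009) decision logic that Jrule implements can be sketched roughly as follows. The thresholds here (MI significance at 3.84, power at .80, |EPC| at .10) are common illustrative choices, not the only possible ones, and the function is a hypothetical simplification, not Jrule's actual interface:

```python
# Sketch of the Saris-Satorra-Van der Veld (2009) decision rule: judge a
# constrained parameter by combining the modification index (MI), the
# power of the MI test, and the expected parameter change (EPC).

def judge_parameter(mi, power, epc,
                    mi_crit=3.84, power_crit=0.80, epc_crit=0.10):
    significant = mi >= mi_crit
    high_power = power >= power_crit
    if significant and high_power:
        # The test is sensitive, so rely on the size of the expected change.
        return "misspecified" if abs(epc) >= epc_crit else "not relevant"
    if significant and not high_power:
        return "misspecified"
    if not significant and high_power:
        return "no misspecification"
    return "inconclusive"

# Example using values reported later in table 2 (Mexico, error
# correlation V28 WITH V25): MI = 26.144, power = .998, EPC = .212.
```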

Although MGCFA indicates problematic items, it does not explain why items are problematic in cross-national comparisons. Furthermore, the provision of MIs and EPCs might tempt researchers to provide substantive ad hoc explanations for noninvariance. However, when using this approach, determining whether measurement invariance is missing because of a methodological artifact or because of different realities is impossible.

Several quantitative approaches exist that aim to reveal the sources of noninvariance. On the micro level, the multiple indicators multiple causes (MIMIC) model tests whether the item is affected by individual variables (e.g., gender) and controls for this differential item functioning (Davidov et al. 2014). On the macro level, multilevel structural equation models (MLSEMs) explain noninvariance by introducing conceptual predictor variables in a multilevel analysis (Davidov et al. 2012). However, for an accurate estimation, samples should exceed 50 countries (Meuleman and Billiet 2009); therefore, such estimation is unavailable for studies that compare few countries. Additionally, the MIMIC and MLSEM approaches need a priori hypotheses about cultural differences or the reasons for bias (Van de Vijver 2011). The study will only be as good as the researchers’ capabilities to discover the correct explanations for noninvariance to include in the analysis and, of course, the availability of the corresponding data. However, previously unknown and surprising causes might exist. Finding these unexpected causes is one of the major advantages of qualitative approaches.

The Qualitative Approach: Online Probing

Different qualitative approaches can evaluate the cross-national comparability of items. During traditional CIs, survey questions are administered to respondents “while collecting additional verbal information about survey responses, which is used to evaluate the quality of the response or to help determine whether the question is generating the information that its author intends” (Beatty and Willis 2007, 287). Interviewers ask follow-up questions called “probes” to retrieve additional information. In CI, probes can either be asked during the interview (embedded probing) or as a set after the interviewee has responded to the entire questionnaire (retrospective probing) (Willis 2005). Probing can also detect instances of silent misinterpretation, where respondents are unaware that they have misunderstood the item (DeMaio and Rothgeb 1996). Traditional CI usually uses small sample sizes (e.g., 5–15 respondents), but the questionnaire is often tested in iterative rounds where the improved questionnaires are retested before fielding (Willis 2005).

A second approach is OP, which applies CI probing techniques in web surveys. Although embedded probing (the implementation of probes within the survey) dates back to Schuman (1966) and Converse and Presser (1986), its implementation within web surveys is rather recent (Braun et al. 2014). OP combines qualitative insights from CIs with large sample sizes in several countries. For example, the sample of one OP study consisted of 3,695 respondents in six countries (Behr et al. 2014). As all respondents receive the same probe, the procedure is highly standardized (Braun et al. 2014) and therefore circumvents harmonization issues, a challenge in cross-national CIs due to, for example, varying levels of interviewer skill (Gray and Blake 2015). The large sample size increases the generalizability of results, allows for an evaluation of the prevalence of problems or themes, and can explain the response patterns of specific subpopulations (Braun et al. 2014). However, the qualitative nature of the approach limits the analysis to a small subset of items because the motivating effect of an interviewer is missing in the web implementation (Meitinger and Behr 2016). Moreover, the application of OP is restricted to a few countries because a coding schema must be developed for each probe and the answers must be coded. If several languages are used, probe answers must also be translated (Behr 2015). Because OP is implemented within web surveys, it is currently impossible to record spontaneous comments or to react as flexibly to respondents’ answers as in CI (Meitinger and Behr 2016). In contrast to CI, OP is not conducted in iterative rounds. Most OP studies have used the method after official data collection to follow up on problematic items. Nevertheless, the method can potentially also be used as a pretesting device or serve as a control tool during the actual data collection (Braun et al. 2014).

Research Objectives

The main goal of this article is to demonstrate how MGCFA measurement invariance tests and OP can be combined to assess the cross-national comparability of survey items. The study first evaluates the cross-national comparability of nationalism and constructive patriotism using MGCFA. Then, the OP results for the items measuring constructive patriotism are presented because respondents struggled most with these items due to the question formulation and problematic key terms (Latcheva 2011).

The Concepts of Constructive Patriotism and Nationalism

Constructive patriotism and nationalism are two dimensions of national identity (Blank and Schmidt 2003). Nationalists idealize their nation, display feelings of national superiority, uncritically accept national authorities, and suppress ambivalent attitudes toward their nation. Nationalists define their group based on descent, race, or culture, and denigrate groups that they do not consider part of the nation. By contrast, constructive patriots reject an idealization of the nation. Their support for the nation depends on its alignment with humanistic and democratic principles. They value an advanced social system, are open to criticism, and reject an uncritical acceptance of state authorities (see also Davidov [2009]).

EVALUATION OF CONSTRUCTIVE PATRIOTISM AND NATIONALISM

Previous quantitative findings:

Davidov (2009) developed a cross-national measure of both concepts using five items from the 2003 ISSP Module on National Identity (see the appendix for item wordings). Nationalism was measured with two five-point Likert items that assess feelings of national superiority. Three items evaluating one’s pride in the country’s democracy, its social security system, and the fair and equal treatment of all groups in society measured constructive patriotism (using a four-point scale ranging from “very proud” to “not proud at all”). As both constructs are measured with a total of five items, measurement invariance tests must simultaneously assess constructive patriotism and nationalism. If tested separately, the unconstrained models cannot be fully estimated because the models are either simply identified (constructive patriotism) or not identified (nationalism).
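The identification problem noted above follows from simple parameter counting. A sketch, assuming a single-factor CFA with the factor variance fixed to 1 for scaling, so that the free parameters are the loadings and the error variances:

```python
# Sketch: degrees-of-freedom arithmetic behind the identification issue.
# A single-factor CFA with p indicators (factor variance fixed to 1)
# estimates p loadings and p error variances, while the data supply
# p * (p + 1) / 2 observed variances and covariances.

def one_factor_df(p: int) -> int:
    observed_moments = p * (p + 1) // 2
    free_parameters = 2 * p  # p loadings + p error variances
    return observed_moments - free_parameters

# Constructive patriotism alone (3 items): df = 0, just identified,
# so the unconstrained model cannot be tested.
# Nationalism alone (2 items): df = -1, not identified.
# Hence both constructs must be assessed in one joint model.
```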

Davidov (2009) tested the constructs for measurement invariance with 34 countries. He established metric invariance of both constructs, which allowed for a comparison of the constructs’ correlates but not their means. Additionally, Davidov (2011) investigated whether both ISSP measures are invariant over time, using data from the 1995 ISSP and 2003 ISSP. He could confirm partial scalar invariance for 21 countries, thus supporting a comparison of the constructs’ correlates and means over time.

Previous qualitative findings:

Two CI studies in Austria (Fleiß, Höllinger, and Kuzmics 2009; Latcheva 2011) and one CI and OP study in Germany (Meitinger and Behr 2016) tested these items and uncovered several issues. First, respondents in both countries had issues with the word “pride” (Latcheva 2011; Meitinger and Behr 2016). Second, German respondents struggled with the high complexity of terms, such as “social security system” (Meitinger and Behr 2016). Third, several key terms (e.g., “all groups of society,” “democracy”) were unclear and respondents adopted different perspectives when answering these items (Fleiß, Höllinger, and Kuzmics 2009; Latcheva 2011). Although these studies uncovered several issues with the tested items, they did not adopt a cross-national comparative perspective.

Methods and Data

MULTIGROUP CONFIRMATORY FACTOR ANALYSIS

The multigroup confirmatory factor analysis (MGCFA) was based on the 2013 ISSP Module on National Identity (ISSP Research Group 2015), with analysis limited to the five countries in the web survey: Germany (N = 1,717), Great Britain (N = 904), the United States (N = 1,274), Mexico (N = 1,062), and Spain (N = 1,225). Constructive patriotism and nationalism were measured with the same five items following Davidov (2009; see figure 1 and the appendix for item wording).

Figure 1.

CFA of the Constructs of Nationalism (NAT) and Constructive Patriotism (COP).

ONLINE PROBING

The OP results derived from a web survey conducted with 2,685 respondents in May 2014. The survey participants from Germany, Great Britain, Mexico, the United States, and Spain were drawn from a nonprobability online panel with quotas for age (18–30, 31–50, and 51–65), gender, and education (low and high).3 The survey replicated questions from the ISSP Module on National Identity. As probing increases the response burden for respondents, the sample was randomly split into five groups of approximately 500 respondents each (approximately 100 respondents per country).4 All respondents answered each closed item on a separate screen. For each item, one-fifth of the respondents received an additional probe on a separate screen.

After the “democracy” item, a category selection probe (Prüfer and Rexroth 2005) inquired why a certain answer category had been chosen, and a specific probe that asked for additional information on a detail of the question (Willis 2005) followed the “social security” and “fair and equal” items.5

Based on the probe answers, a separate coding scheme was developed for each item. Multiple coding applied for all probes. A researcher coded all probe responses, and student assistants repeated the coding. The intercoder reliability was high (Holsti’s coefficient for “democracy”: .94; “social security”: .97; “fair and equal”: .98).

Because quota-based sampling was used, regression analyses that controlled for demographic factors (age, gender, and education) were also run for each code before assessing between-country differences. Reassuringly, the differences between countries for relevant code variables stayed significant after controlling for demographic confounds.6

On average, respondents were willing to answer the probes (substantive respondents: “democracy”: 95 percent; “fair and equal”: 84 percent). However, several respondents gave mismatching answers to the “social security” item (e.g., they responded to a category selection probe instead of a specific probe), which resulted in a total of 72 percent substantive respondents (9 percent nonresponse and 19 percent mismatches).
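The breakdown for the “social security” probe is straightforward accounting: substantive answers are what remains after nonresponse and mismatching answers.

```python
# Arithmetic check on the "social security" probe response breakdown
# reported above.
nonresponse_pct = 9
mismatch_pct = 19
substantive_pct = 100 - nonresponse_pct - mismatch_pct
assert substantive_pct == 72
```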

Substantive respondents wrote short but efficient answers (number of characters7 “democracy”: M = 90, SD = 70; “social security”: M = 89, SD = 77; “fair and equal”: M = 47, SD = 44), with US respondents providing the shortest answers and Mexican and Spanish respondents providing the longest. Most respondents mentioned one or two reasons or themes (“democracy”: M = 1.2, SD = 1.2; “social security”: M = 1.4, SD = .7; “fair and equal”: M = 1.7, SD = 1.0). For the “democracy” and “social security” item, respondents spent an average of approximately one minute per probe (“democracy”: M = 57s, SD = 57s; “social security”: M = 57s, SD = 64s), but response times were shorter for the “fair and equal” item (M = 43s, SD = 39s).

Results of Measurement Invariance Tests: MGCFA

Measurement invariance of constructive patriotism and nationalism in the 2013 ISSP data was tested with the Mplus 7.31 software package (Muthén and Muthén 2015). As the nonresponse rate was high in some countries (e.g., “fair and equal” item: 13.2 percent in Germany), the analysis utilizes full information maximum likelihood (FIML) estimators with raw data, which accounts well for missing data (Brown 2014).8

SINGLE-COUNTRY ANALYSIS

A preliminary step for measurement invariance tests involves establishing that the model fits well in each country (Byrne and Van de Vijver 2010). The analysis began with a separate CFA run in each country. The standardized factor loadings were sufficiently high in all countries, reaching at least .50 (see table OS2 in the supplementary materials online). The moderate correlation between the two latent factors supported a two-factor solution (see table 1). CFI values above .95 indicated a very good model fit in all countries. The RMSEA values suggested a very good model fit for Germany, Great Britain, and Spain but only an acceptable fit for the United States and Mexico, so the single-country model was accepted for all countries without any modifications. However, given the elevated RMSEA values, we still inspected the MIs, the power of the MI test, and the EPC values for the United States and Mexico in Jrule (see table 2). For both countries, Jrule suggested a cross-loading of the “fair and equal” item on the nationalism factor: whereas some respondents might perceive this item as expressing patriotic values, for others it reflects nationalistic attitudes. For Mexico, Jrule additionally recommended freeing an error correlation between the “democracy” and “social security” variables. Apparently, a cause other than the constructive patriotism factor explains part of the correlation between these two items.

Table 1.

Single-Country Analyses: RMSEA, CFI, and Correlations between Nationalism and Constructive Patriotism (standard errors in parentheses)

Country RMSEA CFI Correlation (SE)
1. Germany 0.063 [0.044–0.084] 0.982 N↔ CP: .36 (.03)
2. Great Britain 0.000 [0.000–0.044] 1.000 N↔ CP: .38 (.05)
3. Mexico 0.074 [0.049–0.101] 0.982 N↔ CP: .54 (.04)
4. Spain 0.055 [0.032–0.082] 0.987 N↔ CP: .68 (.03)
5. United States 0.075 [0.052–0.100] 0.952 N↔ CP: .49 (.05)

Table 2.

Misspecifications Indicated by Jrule: Suggestions, Affected Parameter, Modification Indices (MI), Expected Parameter Changes (EPC), the Power of the MI-Test, and Non-Centrality Parameter (NCP) for Single-Country Analysis and Measurement Invariance Tests

Suggestion Parameter Group MI EPC Power NCP
Single country analysis
 Add error correlation V28 WITH V25 Mexico 26.144 .212 .998 23.27
 Add cross loading NAT BY V34 Mexico 26.152 .229 .994 19.95
NAT BY V34 United States 19.534 .403 .992 19.24
Configural
 Add error correlation V28 WITH V25 Mexico 26.094 .212 .998 23.22
 Add cross loading NAT BY V34 Mexico 26.157 .229 .994 19.95
NAT BY V34 United States 19.559 .403 .593 4.82
NAT BY V28 Spain 14.486 –.216 .941 12.42
NAT BY V28 United States 12.604 –.283 .709 6.30
Metric
 Set parameter free COP BY V25 Spain 3.952 .141 .562 4.47
COP BY V28 Spain 8.533 –.169 .737 6.72
COP BY V28 Mexico 15.735 .159 .963 14.00
 Add cross loading NAT BY V28 Spain 20.165 –.150 .994 20.17
NAT BY V34 United States 17.877 .198 .893 10.26
Scalar (selection of most extreme parameters)
 Set parameter free V20 United States. 135.552 –.781 .609 5.00
V25 United States 262.883 –.494 .998 24.24
V25 Spain 451.877 .739 .991 18.62
V28 United States 290.906 .476 1.000 28.89
V28 Great Britain 130.758 .311 1.000 30.42
V28 Spain 273.636 –.522 .997 22.60
V28 Mexico 90.643 .333 .990 18.39

MEASUREMENT INVARIANCE TESTS

To begin, the cross-national configural invariance (test for equal factor structure) was assessed and confirmed by the GOF values. RMSEA and CFI indicated a very good fit (see table 3). Therefore, a meaningful discussion of constructive patriotism and nationalism across countries was possible (Davidov et al. 2014). Similar to the single-country analysis, MI and EPC values in Jrule (see table 2) suggested adding an error correlation between the “democracy” and “social security” variables for Mexico, a cross-loading of the “fair and equal” item on the nationalism factor (the United States; Mexico), and a cross-loading of the “social security” item on the nationalism factor for Spain and the United States. Given the very good model fit (GOF indices), we refrained from respecifying the model.

Table 3.

MGCFA: Fit Measures of the Measurement Invariance Test

Model Chi2 df ΔRMSEA RMSEA ΔCFI CFI
1. Configural 112.9 20 0.061 [0.051–0.073] 0.982
2. Full metric 144.4 32 –.008 0.053 [0.045–0.062] –.003 0.979
3. Scalar invariance 1582.4 42 +.116 0.169 [0.162–0.176] –.269 0.710
3a. Partial scalar: [V25] 831.0 40 +.074 0.127 [0.119–0.134] –.128 0.851
3b. Partial scalar: [V28] 742.8 40 +.067 0.120 [0.112–0.127] –.112 0.867
3c. Partial scalar: [V34] 1265.0 40 +.105 0.158 [0.150–0.165] –.210 0.769

Metric invariance (test for equal factor loadings) could also be established because the ΔCFI did not exceed .01 and the ΔRMSEA was below .015. Therefore, exploring the structural relationships with other constructs across countries was possible (Steenkamp and Baumgartner 1998). The MI and EPC values for the metric invariance test (table 2) suggested freeing the factor loading of the “democracy” item for Spain and the “social security” item for Spain and Mexico. Apparently, the factor loadings in these countries differ from the remaining countries. However, the model was not respecified given the good model fits reflected in the GOF indices.

Unfortunately, the test for full scalar invariance (test for equal intercepts) failed because ΔCFI and ΔRMSEA clearly exceeded their critical values (ΔRMSEA: .116, ΔCFI: .269). However, partial scalar invariance (Steenkamp and Baumgartner 1998) is viable only for constructive patriotism because it is measured by three indicators. Therefore, following Davidov (2009), three additional models were estimated in which the intercepts of one indicator of constructive patriotism were separately freed for all countries. Although the GOF indices improved, particularly when the intercept of the “social security” item was freed (flagging it as the most problematic item), the partial scalar invariance tests still failed. Thus, comparing the means of the latent constructs is impossible.

The parameter values in Jrule reflect this finding (see table 2): Elevated MI and EPC values for the intercept of Spain and the United States at the “democracy” item and of four countries at the “social security” item (all except Germany) indicate that the item intercepts for these two items partially differ across countries. Although MI and EPC values are particularly high for the “democracy” item in Spain, the most misspecifications were found for the “social security” item, indicating that this item is particularly biased.

Results: Online Probing

CATEGORY SELECTION PROBE FOR “DEMOCRACY”

Although cross-national studies about democracy abound, several studies indicate that respondents conceptualize democracy in different ways (Canache, Mondak, and Seligson 2001; Baviskar and Malone 2004). However, Behr and Braun (2015) note that the very fact that respondents in all countries consider various dimensions of democracy makes this item cross-nationally comparable. This study used an adapted version of Behr and Braun’s code schema. Respondents’ acceptance of the democratic system is an important precondition for the item’s cross-national comparability. Low pride values should reflect that the system does not meet the respondents’ expectations; it should not reflect the respondents’ preferences for a more authoritarian form of government.

The coding scheme for the “democracy” item:

The coding scheme for the category selection probe captures positive and negative evaluations of different aspects of democracy. Some respondents thought about the actions of democratic authorities, which might have entailed an evaluation of the living conditions in a country or specific policy domains (e.g., positive: tolerant society, nuclear phase-out in Germany; negative: growing insecurity, tax increases). Other respondents evaluated whether the governance (e.g., politicians) was working according to democratic ideals and rules (e.g., positive: low level of lobbying and corruption; negative: high level of lobbying and corruption). Respondents also perceived the political system, the institutions, or the constitutional arrangements as (un)democratic (e.g., democratic: free elections, working rule of law; undemocratic: malfunctioning of checks and balances, no freedom of speech). Respondents who considered other political systems superior to the democratic system would be coded here. A few respondents disapproved of the lack of citizens’ support in upholding democracy (e.g., low voter turnout) or evaluated the democratic situation in a more general sense. They mentioned that a working democracy existed, that the democracy needed some improvement, or that no democracy existed. Some respondents also compared their country with other countries. Other respondents were proud of their country independent of its democratic situation, and some identified problems with the question—two potentially problematic codes for cross-national comparability. All remaining answers were coded as other.

Probe results for the “democracy” item:

Regarding the actions of authorities, between 7 and 12 percent of the respondents in all countries focused on negative aspects, mostly complaining about the growing inequality or about recent government policies (see table 4). A more detailed content analysis of this code (1N) might indicate that Mexican participants differ in this context from the other respondents because they refer to the problematic security situation in their country (e.g., “because of the existing criminality”), an issue that respondents from the other countries did not mention.

Table 4.

Percentages of Respondents Mentioning Codes for the “Democracy” Item (substantive responses)

Code Germany (%) Great Britain (%) United States (%) Mexico (%) Spain (%)
1P: Actions of authorities: Positive evaluation 1 1 4 0 0
1N: Actions of authorities: Negative evaluation 12 11 7 12 12
2P: Governance: Positive evaluation 1 1 1 1 0
2N: Governance: Negative evaluation 18 17 16 34 45
3P: Political system, institutions, and constitutional arrangements  perceived as democratic 14 13 13 3 8
3N: Political system, institutions, and constitutional arrangements  perceived as undemocratic 15 18 12 28 28
4N: Lack of citizens’ support in upholding democracy 3 7 2 1 3
5: A working democracy exists 15 16 7 0 1
6: Democracy can be improved 4 10 8 5 7
7: No democracy exists 9 4 4 20 11
8: Comparison with other countries 11 15 15 2 5
9: Pride judgment independent of democratic situation 0 3 6 0 0
10: Problems with the question 2 5 1 1 1
11: Other 14 11 13 8 6
Nrespondents 94 100 96 117 110
Ncodings 111 133 104 134 139

Note.—Percentages refer to the number of respondents mentioning each code. Multiple responses were possible.

Although respondents from all countries mostly evaluated the governance aspect of democracy negatively (Germans: lobbying; other countries: corrupt politicians), more Spaniards and Mexicans mentioned negative aspects than those in other countries, which reflects the countries’ diverging realities (e.g., Mexico: elevated corruption; Freedom House 2015).

The cross-national differences also hold for the evaluation of the general setup of the political system, institutions, and constitutional arrangements, although respondents additionally associated positive features, such as freedom of speech (all countries) and free elections (all but Mexico), with this aspect of democracy. Meanwhile, several respondents perceived the general setup of the democratic system as undemocratic (e.g., Germany: no direct democracy; Great Britain: first-past-the-post voting). Spanish and Mexican respondents mentioned negative aspects here most often (e.g., Spain: insufficient separation of powers; Mexico: judiciary failure, no free elections), which reflects the current state of Mexico’s democracy, downgraded from “free” to “partly free” in Freedom House’s 2014 annual report (Freedom House 2015). Interestingly, no respondents rejected the idea of democracy altogether in favor of a more authoritarian system. This finding is reassuring because a condition for this item’s cross-national comparability is that low pride values reflect discontent with the democratic system rather than an endorsement of more authoritarian alternatives. Fortunately, only a few US respondents were proud of their country independent of the democratic situation, and respondents rarely mentioned problems with the question.

SPECIFIC PROBE FOR “SOCIAL SECURITY”

The coding scheme for the “social security” item:

A second coding scheme for the responses to the specific probe for “social security” included welfare benefits (e.g., housing benefits, food stamps), unemployment benefits (e.g., jobseeker allowances), health-related benefits (e.g., health insurance), or retirement benefits. Family benefits (e.g., maternity/paternity leave, child tax credits) and support for immigrants or refugees were also mentioned. Unexpectedly, Mexican respondents referred to the security situation in their country (e.g., violence, crime). Several respondents referred to all benefits or wrote ambiguous answers that mentioned agencies that deliver various benefits (e.g., Mexico: Instituto Mexicano del Seguro Social) without further explanation. All remaining answers were coded as other.

Probe results for the “social security” item:

Overall, German and British respondents mentioned a wider variety of social security benefits (e.g., unemployment benefits, family benefits) than US, Mexican, and Spanish respondents (see table 5). Most US respondents (68 percent) referred to retirement benefits, and 72 percent of Spanish respondents mentioned health-related benefits. Given the different types of welfare states in the five countries, it might not be surprising that respondents associated different benefits with their social security system. However, two critical points remain that indicate other influences on the response behavior of Americans, Spaniards, and Mexicans.

Table 5.

Percentages of Respondents Mentioning Codes for the “Social Security” Item (substantive responses)

Code                                  Germany (%)   Great Britain (%)   United States (%)   Mexico (%)   Spain (%)
Welfare benefits                      28            19                  7                   4            0
Unemployment benefits                 51            27                  3                   4            5
Health-related benefits               42            38                  21                  39           72
Retirement benefits                   25            11                  68                  4            11
Family benefits                       23            9                   0                   1            4
Support for immigrants and refugees   7             11                  0                   0            2
Security                              0             0                   0                   39           0
All benefits                          3             14                  7                   8            9
Other                                 19            12                  3                   6            5
Ambiguous answers                     1             11                  7                   21           19
N respondents                         75            64                  71                  96           86
N codings                             148           97                  82                  121          111

Note.—Percentages refer to the number of respondents mentioning each code. Multiple responses were possible.

First, the range of perceived benefits varies across countries because of the translation of the term “social security benefits” (see the appendix for all language versions). For example, the US system offers more than retirement benefits (e.g., health: Medicaid; welfare: public housing). The closed item, which was also used in the ISSP study, asked, “How proud are you of America with regard to its social security system?” In the United States, the agency responsible for retirement benefits is called the Social Security Administration, and social security taxes contribute to the retirement system. In many probe responses that were coded as retirement benefits, US respondents made no distinction between social security benefits and retirement benefits:

“Social security administration. Money Americans are supposed to receive when they retire.”

“Social security for seniors.”

The Spanish system also provides several social security benefits, but many respondents were thinking only about healthcare. For some respondents, “seguridad social,” the Spanish translation of “social security benefits,” is seemingly the equivalent of the healthcare system in Spain:

“The basic benefits. There are not enough doctors … in the hospitals. A lot of specialists and drugs are not covered by the social security system [la seguridad social].”

These translations threaten comparability: in the German version, the term “social security benefits” was translated as “sozialstaatliche Leistungen,” which can refer to any social security benefit (e.g., unemployment benefits and family benefits). The German respondents were therefore answering a question with a broader lexical scope than the one posed in the American and Spanish versions.

The varying range of perceived social security benefits affects the item’s cross-national comparability. As the MGCFA results have revealed, a cross-national comparison of the latent means of nationalism and constructive patriotism is impossible. In the test for partial scalar invariance, the intercept of each item measuring constructive patriotism was freed in turn. Although partial scalar invariance remained unachievable, freeing the intercept of the “social security” item would have yielded the greatest model improvement: the decrease in CFI was smaller than in the solutions in which the intercept of the “fair and equal” or “democracy” item was freed (“social security”: ΔCFI: –.112; “democracy”: –.128; “fair and equal”: –.210). Therefore, the “social security” item is potentially the most problematic item in a cross-national comparison, and its varying lexical scope might partially explain this outcome.
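The selection logic just described can be sketched in a few lines; the ΔCFI values are those reported above, and the rule (the smallest CFI decrease marks the intercept whose release most improves fit) is only an illustrative restatement, not code from the study.

```python
# Delta-CFI when the intercept of each constructive-patriotism item is
# freed in the partial scalar invariance test (values as reported in the
# text). A less negative value means a smaller deterioration in fit,
# i.e., freeing that intercept relieves the most misfit.
delta_cfi = {
    "social security": -0.112,
    "democracy": -0.128,
    "fair and equal": -0.210,
}

# The item whose freed intercept yields the smallest CFI decrease is the
# most problematic one under the cross-national equality constraint.
most_problematic = max(delta_cfi, key=delta_cfi.get)
print(most_problematic)  # social security
```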

The second issue regarding the “social security” item is the high proportion of Mexican respondents (39 percent) who mentioned the security situation in Mexico instead of its social security system. Mexican probe responses indicate that the respondents were thinking about violence and crime. The second respondent even noted that he was uncertain about the intended meaning of the question:

“Is there such a thing? Crime, drug trafficking, attacks, robberies, police brutality, etc. Social security? That’s a myth.”

“Which social security? If we speak about crimes, it gets worse each day, and if we speak about health insurance, the service is bad, inefficient; they make fun of us.”

Given the problematic security situation in Mexico, these respondents might be more inclined than respondents from other countries to perceive violence and criminality in their surroundings. For example, between 2006 and 2011, Mexico registered 47,515 crime-related deaths (Gonzalez 2012). A substantial share of the Mexican respondents misunderstood the intended meaning of the question, which again reduces this item’s cross-national comparability. Interestingly, the nonresponse rate of the closed item was particularly low for Mexican respondents. These responses are a classic example of “silent misinterpretation,” where respondents are unaware that they have misunderstood the item’s intended meaning (DeMaio and Rothgeb 1996).

Silent misinterpretation can also explain one of the MGCFA findings. In the Mexican single-country analysis, Jrule suggested allowing for an error correlation between the “democracy” and “social security” items. As we have seen, probe responses for the “democracy” item revealed that some Mexicans worry about the security situation in their country (code “actions of authorities: negative evaluation”). For the “social security” item, 39 percent of the Mexican respondents likewise expressed concern about the security situation in Mexico. In addition to the constructive patriotism factor, concern about the general security situation thus influences both items, which might explain the correlation between the error terms of the “democracy” and “social security” items.
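Why an unmodeled shared concern produces correlated error terms can be illustrated with a small simulation; the loadings and “security concern” coefficients below are invented for illustration and are not estimates from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: both items load on the modeled
# constructive-patriotism factor, but an unmodeled concern about
# security also feeds into both (all coefficients are invented).
patriotism = rng.normal(size=n)
security_concern = rng.normal(size=n)

democracy = 0.7 * patriotism + 0.4 * security_concern + rng.normal(scale=0.5, size=n)
social_security = 0.6 * patriotism + 0.5 * security_concern + rng.normal(scale=0.5, size=n)

# Remove the modeled factor; the residuals stay correlated because the
# security concern is left out of the model. This leftover correlation
# is exactly what an error correlation between the two items absorbs.
resid_democracy = democracy - 0.7 * patriotism
resid_social = social_security - 0.6 * patriotism
print(np.corrcoef(resid_democracy, resid_social)[0, 1])  # clearly positive (about 0.44 in expectation)
```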

The probe results for the “social security” item elucidate several problematic issues for cross-national comparability. The varying lexical scope of the term “social security system” and its silent misinterpretation by several Mexican respondents can potentially explain why the scalar measurement invariance tests failed.

SPECIFIC PROBE FOR “FAIR AND EQUAL”

Finally, following the “fair and equal” item was a specific probe about the social group that the respondents had in mind. Although previous articles have considered the vague formulation of the term “social groups in society” problematic (Fleiß, Höllinger, and Kuzmics 2009; Latcheva 2011), the “fair and equal” item was the least problematic indicator in the measurement invariance tests: freeing the item’s intercept would have yielded the smallest model improvement when testing for partial scalar invariance (“social security”: ΔCFI: –.112; “democracy”: –.128; “fair and equal”: –.210). Because social realities vary in the five countries, respondents may adopt different perspectives when they answer this item. Most importantly, the answer should reflect the respondent’s stance on constructive patriotism, particularly the value of equality.

The coding scheme for the “fair and equal” item:

The coding scheme distinguishes between foreigners in an abstract sense (e.g., “immigrants”), specific nationalities (e.g., “Indian”), and specific races or ethnicities (e.g., “Hispanics”). The respondents also mentioned the vertical division in society (e.g., “the rich, the poor”), religion (e.g., “Muslims”), gender, and sexual orientation (e.g., “homosexuals”). Some participants thought about ill people and older citizens. All these groups are potential targets of mistreatment. However, respondents also switched perspectives and referred to groups that are perceived as having a detrimental impact on society (e.g., politicians, bankers). Some respondents thought about the majority group. This code is a potential indicator that the item might reflect nationalistic instead of patriotic attitudes when respondents display in-group favoritism, for example, when they complain about an overabundance of help for foreigners and fear the reduction of the majority group’s benefits. A high proportion of US or Mexican respondents mentioning this code would confirm the CFA finding of an additional cross-loading of this item on the nationalism factor, which Jrule suggested as appropriate. Several respondents referred to either all groups or no specific group. All remaining answers were coded as other.

Probe results for the “fair and equal” item:

In all countries, the respondents thought about many different groups (see table 6). However, the most frequently considered groups varied by country. For example, German respondents more frequently mentioned foreigners, whereas US respondents thought more frequently about different races or ethnicities. Mexican and Spanish respondents were particularly concerned about their countries’ vertical division (e.g., the poor versus the rich).

Table 6.

Percentages of Respondents Mentioning Codes for the “Fair and Equal” Item (substantive responses)

Code                                              Germany (%)   Great Britain (%)   United States (%)   Mexico (%)   Spain (%)
Foreigners                                        37            18                  2                   0            29
Specific nationalities                            8             12                  11                  0            12
Race and ethnicity                                6             34                  56                  37           11
Vertical division: top – bottom                   56            26                  14                  71           60
Religion                                          9             27                  14                  1            6
Gender                                            11            13                  9                   13           25
Sexual orientation                                22            20                  18                  6            11
Ill people                                        17            26                  3                   5            3
Older citizens                                    20            9                   0                   8            3
Groups exerting a detrimental impact on society   12            6                   6                   18           35
Majority                                          3             6                   1                   1            12
All groups                                        6             9                   11                  10           4
No specific group                                 1             3                   13                  0            1
Other                                             27            19                  13                  29           21
N respondents                                     89            94                  87                  84           99
N codings                                         210           213                 150                 167          232

Note.—Percentages refer to the number of respondents mentioning each code. Multiple responses were possible.

Some respondents switched their perspectives when answering the question and referred to those responsible for unfair and unequal treatment (e.g., politicians, bankers), particularly in Spain (35 percent). However, the respondents who mentioned this category were still concerned about democratic values; therefore, this change in perspective does not diminish the item’s utility as an indicator of constructive patriotism.

Moreover, some respondents referred to the majority group in their country. This code could reflect nationalistic rather than patriotic values. Fortunately, it was rarely mentioned. Interestingly, only one US and one Mexican respondent referred to this category, which contradicts the results of the single-country analysis (Jrule recommendation: cross-loading on the nationalism factor). In addition, there were no indications of nationalistic attitudes in the category selection probe used for this item.9 The OP results thus confirm the decision to refrain from respecifying the US and Mexican models. The decision to modify a model should never be driven purely by reliance on MIs and EPCs; it should be guided by substantive theory instead (Brown 2014). OP can also help discern between cases in which a model respecification would have been appropriate (Mexico: error correlation) and those in which it would have been inappropriate (United States and Mexico: cross-loading). Spaniards referred to the majority more frequently (12 percent) than respondents of the other nationalities did. However, most respondents in this category mentioned the majority group when complaining about those responsible for the poor economic situation in their country:

“About politicians and bankers, it does not seem that they care about the judiciary—about the majority of society, who has to deal with eviction notices.”

Respondents from all countries adopted multiple perspectives when answering the “fair and equal” item, but these differences reflect the countries’ complex social realities. The OP results provide reassurance that the item is a good indicator of egalitarian attitudes, which was also reflected in the tests for partial scalar invariance. However, the OP results did not support the model modification suggested by the CFA single-country analyses for the United States and Mexico.

Discussion

EVALUATION OF RESEARCH QUESTIONS

This study sought to compare MGCFA and OP and, using the example of constructive patriotism, to test whether the two methods arrive at similar conclusions. MGCFA confirmed metric invariance, but the (partial) scalar invariance tests failed, allowing for a comparison of structural relationships but not of latent means. The partial scalar invariance tests and the parameter values in Jrule identified the “social security” item as the most problematic item for cross-national comparison. GOF indices and the MIs and EPCs in Jrule suggested model modifications in the single-country analyses. For the United States and Mexico, Jrule recommended letting the “fair and equal” item also load onto the nationalism factor. For Mexico, Jrule advised allowing for an error correlation between the “democracy” and “social security” items.

Indeed, the OP results did partly clarify the MGCFA results. In particular, the probe for the “social security” item uncovered several problematic issues that were mirrored in MGCFA findings. First, OP explained that the suggested error correlation between the “democracy” and “social security” items in Mexico was driven by a silent misinterpretation of the term “social security system” as referring to “security” instead of “social security.” The Mexican security situation was also an issue for the “democracy” item. Therefore, concerns about general security affected part of the correlation between the two items in Mexico. Second, MGCFA indicated that freeing the intercepts of the “social security” item would yield the largest model improvements. OP revealed that the range of perceived benefits varied across countries. The different translations triggered either references to specific benefits, such as “retirement” in the United States and “health” in Spain, or to a wide variety of possible benefits (Germany). These country-specific components explain why the test for scalar invariance failed and why means should not be compared across countries.

By contrast, OP could not confirm the initial speculation on the “fair and equal” item. Jrule also suggested letting this item load onto the nationalism factor for the United States and Mexico. Neither the specific probe nor the category selection probe was able to validate this suggestion. OP is a handy tool for evaluating the appropriateness of such ad hoc hypotheses and a guide for model modifications.

LIMITATIONS OF OP AND AVENUES FOR FUTURE RESEARCH

Several issues for future research remain. First, data analysis in OP is rather work intensive because data need to be coded and sample sizes are large. Future research should investigate the potential of (semi-)automated coding of probes (Schonlau and Couper 2016), which may facilitate data analysis. In a similar vein, future research could also investigate optimal sample sizes for coding and evaluate whether a sample of 100 or more respondents per country is necessary to achieve a saturation of results. Second, due to the increased response burden of open-ended answers, OP is potentially affected by probe nonresponse. Current research has already developed a tool that detects and reduces probe nonresponse by automatically triggering a follow-up probe alongside a motivational statement (Kaczmirek, Meitinger, and Behr 2015). Finally, the study was conducted with a nonrandom online access panel because cross-national probability-based web panels did not exist at the time of data collection. Fortunately, several cross-national probability-based web panels are being established (e.g., in the European Social Survey; Synergies for Europe’s Research Infrastructures in the Social Sciences [SERISS] 2015). Implementing OP in the ESS or other probability-based web panels could improve the generalizability of OP results.
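As a toy illustration of what (semi-)automated pre-coding of probe responses could look like, the sketch below assigns candidate codes via keyword matching; the codes and keyword lists are invented for this example and are not the study’s coding scheme, and a real semi-automated approach would learn from manually coded responses.

```python
# Toy keyword-based pre-coding of open-ended probe responses. The codes
# and keyword lists are illustrative only; a production approach would
# use supervised learning trained on manually coded answers.
KEYWORDS = {
    "retirement benefits": ["retire", "pension", "seniors"],
    "health-related benefits": ["health", "doctor", "hospital", "insurance"],
    "security": ["crime", "violence", "robber", "drug trafficking"],
}

def precode(response: str) -> set:
    """Return the set of candidate codes whose keywords appear in the response."""
    text = response.lower()
    return {code for code, words in KEYWORDS.items()
            if any(word in text for word in words)}

print(precode("Money Americans are supposed to receive when they retire."))
# {'retirement benefits'}
```

A human coder would then only need to confirm or correct the suggested codes rather than code every response from scratch.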

OPTIMAL APPLICATIONS

The previous results demonstrate that much can be learned through a combined approach of MGCFA and OP. Depending on the research situation, two strategies for combining these methods are proposed. In an exploratory research situation, developing new items is necessary. OP could be implemented in the pretest stage to guide cross-national item development. Because OP is limited to a small number of countries, the developed items could be tested in a second step with a larger number of countries using MGCFA. By contrast, in a research situation in which more established survey items are used with a large number of countries, first conducting the different measurement invariance tests with MGCFA might be advisable. If some of the tests detect noninvariance, MIs and EPCs could help identify the items and countries that should be examined in an OP study. The probes can then elucidate the reasons for the lack of cross-national comparability of these items or countries.

Supplementary Data

Supplementary data are freely available at Public Opinion Quarterly online.

Supplementary Material

poq_15_0227_File003

Appendix

Items measuring nationalism and constructive patriotism in ISSP 2013:

Nationalism

V19 (“More like us”): “The world would be a better place if people from other countries were more like the [COUNTRY NATIONALITY].”

V20 (“Better country”): “Generally speaking, [COUNTRY] is a better country than most other countries.”

Constructive Patriotism

V25 (“Democracy”): “How proud are you of [COUNTRY] with regard to the way democracy works?”

V28 (“Social security”): “How proud are you of [COUNTRY] with regard to its social security system?”

V34 (“Fair & equal”): “How proud are you of [COUNTRY] with regard to its fair and equal treatment of all groups in society?”

Translations of the “social security” item (as used in 2013 ISSP):

Great Britain & the United States: “How proud are you of [Britain/America] with regard to its social security system?”

Germany: “Wie stolz sind Sie auf Deutschland hinsichtlich der sozialstaatlichen Leistungen?”

Mexico: “¿Qué tan orgulloso/a está Ud. de México con respecto a su sistema de seguridad social?”

Spain: “¿Hasta qué punto está Ud. orgulloso/a de España en lo que se refiere a su sistema de seguridad social?”

1

Previous studies have often used the chi-square difference test for this purpose. Due to its sensitivity to large sample sizes, ΔCFI and ΔRMSEA are preferable to assess model fit (Cheung and Rensvold 2002; Davidov 2011).
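A minimal sketch of how a ΔCFI criterion is applied between nested invariance models, assuming the commonly cited convention that a CFI decrease of more than .01 signals meaningful deterioration (Cheung and Rensvold 2002); the cutoff value and the function itself are illustrative, not taken from this article.

```python
# Illustrative Delta-CFI decision rule for nested invariance models.
# The -0.01 cutoff is a widely cited convention (Cheung and Rensvold
# 2002), not a value reported in this article.
def invariance_supported(cfi_less_constrained: float,
                         cfi_more_constrained: float,
                         cutoff: float = -0.01) -> bool:
    """True if adding equality constraints does not worsen CFI beyond the cutoff."""
    delta_cfi = cfi_more_constrained - cfi_less_constrained
    return delta_cfi >= cutoff

print(invariance_supported(0.952, 0.948))  # True: a CFI drop of .004 is tolerable
print(invariance_supported(0.952, 0.930))  # False: a drop of .022 exceeds the cutoff
```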

2

Jrule is a software procedure for the automatic detection of misspecifications in structural equation models, taking the power, fit indices, and selection criteria into account.

3

The online section contains a table comparing the means, standard deviations, and nonresponse rates of the constructive patriotism items for the 2013 ISSP and the web survey (see table OS1).

4

Unfortunately, this split condition prevented us from conducting the measurement invariance tests with our web survey, because the nationalism items and the constructive patriotism items were answered by different respondents.

5

Screenshots of the three different probes and how they were implemented in the web survey can be found in the online section (figures OS1–OS3).

6

The results of these logistic regressions are available from the author upon request.

7

The calculation of response length, number of themes, and average response time controlled for outliers by excluding responses that exceeded values above two standard deviations from the mean value.

8

The FIML estimator requires continuous variables. However, data derived from Likert scales are ordinal. Davidov et al. (2011) demonstrated that using Likert scales for measurement invariance tests with MGCFA is justifiable. The model was reestimated with the weighted least squares means- and variance-adjusted (WLSMV) approach, an estimator for ordinal data (Flora and Curran 2004). Both estimators arrive at similar conclusions. The online section contains the WLSMV results (tables OS3–OS5).

9

Results and coding schema are available from the author upon request.

References

  1. Baviskar Siddhartha, Malone Mary Fran. 2004. “What Democracy Means to Citizens—And Why It Matters.” Revista Europea de Estudios Latinoamericanos y del Caribe (European Review of Latin American & Caribbean Studies) 76:3–23. [Google Scholar]
  2. Beatty Paul, Willis Gordon. 2007. “Research Synthesis: The Practice of Cognitive Interviewing.” Public Opinion Quarterly 71:287–311. [Google Scholar]
  3. Behr Dorothée. 2015. “Translating Answers to Open-Ended Survey Questions in Cross-Cultural Research: A Case Study on the Interplay between Translation, Coding, and Analysis.” Field Methods 27:284–99. [Google Scholar]
  4. Behr Dorothée, Bandilla Wolfgang, Kaczmirek Lars, Braun Michael. 2014. “Cognitive Probes in Web Surveys: On the Effect of Different Text Box Size and Probing Exposure on Response Quality.” Social Science Computer Review 32:524–33. [Google Scholar]
  5. Behr Dorothée, Braun Michael. 2015. “Satisfaction with the Way Democracy Works: How Respondents across Countries Understand the Question.” In Hopes and Anxieties: Six Waves of the European Social Survey, edited by Paweł Sztabinski, Henryk Domanski, Franciszek Sztabinski, 121–38. Frankfurt am Main: Peter Lang. [Google Scholar]
  6. Bentler Peter. 1990. “Comparative Fit Indexes in Structural Models.” Psychological Bulletin 107:238–46. [DOI] [PubMed] [Google Scholar]
  7. Blank Thomas, Schmidt Peter. 2003. “National Identity in a United Germany: Nationalism or Patriotism? An Empirical Test with Representative Data.” Political Psychology 24:289–311. [Google Scholar]
  8. Braun Michael, Behr Dorothée, Kaczmirek Lars, Bandilla Wolfgang. 2014. “Evaluating Cross-National Item Equivalence with Probing Questions in Web Surveys.” In Improving Survey Methods: Lessons from Recent Research, edited by Uwe Engel, Ben Jann, Peter Lynn, Annette Scherpenzeel, Patrick Sturgis, 184–200. New York: Routledge. [Google Scholar]
  9. Brown Timothy. 2014. Confirmatory Factor Analysis for Applied Research. New York: Guilford Publications. [Google Scholar]
  10. Browne Michael, Cudeck Robert. 1992. “Alternative Ways of Assessing Model Fit.” Sociological Methods & Research 21:230–58. [Google Scholar]
  11. Byrne Barbara, van de Vijver Fons. 2010. “Testing for Measurement and Structural Equivalence in Large-Scale Cross-Cultural Studies: Addressing the Issue of Nonequivalence.” International Journal of Testing 10:107–32. [Google Scholar]
  12. Byrne Barbara, Shavelson Richard, Muthén Bengt. 1989. “Testing for the Equivalence of Factor Covariance and Mean Structures: The Issue of Partial Measurement Invariance.” Psychological Bulletin 105:456–66. [Google Scholar]
  13. Canache Damary, Mondak Jeffery, Seligson Mitchell. 2001. “Meaning and Measurement in Cross-National Research on Satisfaction with Democracy.” Public Opinion Quarterly 65:506–28. [Google Scholar]
  14. Chen Fang Fang. 2007. “Sensitivity of Goodness of Fit Indexes to Lack of Measurement Invariance.” Structural Equation Modeling 14:464–504. [Google Scholar]
  15. Cheung Gordon, Rensvold Roger. 2002. “Evaluating Goodness-of-Fit Indexes for Testing Measurement Invariance.” Structural Equation Modeling 9:233–55. [Google Scholar]
  16. Converse Jean, Presser Stanley. 1986. Survey Questions: Handcrafting the Standardized Questionnaire. Beverly Hills, CA: Sage. [Google Scholar]
  17. Davidov Eldad. 2009. “Measurement Equivalence of Nationalism and Constructive Patriotism in the ISSP: 34 Countries in a Comparative Perspective.” Political Analysis 17:64–82. [Google Scholar]
  18. –––. 2011. “Nationalism and Constructive Patriotism: A Longitudinal Test of Comparability in 22 Countries with the ISSP.” International Journal of Public Opinion Research 23:88–103. [Google Scholar]
  19. Davidov Eldad, Datler Georg, Schmidt Peter, Schwartz Shalom. 2011. “Testing the Invariance of Values in the Benelux Countries with the European Social Survey: Accounting for Ordinality.” In Cross-Cultural Analysis: Methods and Applications, edited by Eldad Davidov, Peter Schmidt, Jaak Billiet, 149–68. New York: Routledge. [Google Scholar]
  20. Davidov Eldad, Dülmer Hermann, Schlüter Elmar, Schmidt Peter, Meuleman Bart. 2012. “Using a Multilevel Structural Equation Modeling Approach to Explain Cross-Cultural Measurement Noninvariance.” Journal of Cross-Cultural Psychology 43:558–75. [Google Scholar]
  21. Davidov Eldad, Meuleman Bart, Cieciuch Jan, Schmidt Peter, Billiet Jaak. 2014. “Measurement Equivalence in Cross-National Research.” Annual Review of Sociology 40:55–75. [Google Scholar]
  22. DeMaio Theresa, Rothgeb Jennifer. 1996. “Cognitive Interviewing Techniques: In the Lab and in the Field.” In Answering Questions: Methodology for Determining Cognitive and Communicative Processes in Survey Research, edited by Norbert Schwarz and Seymour Sudman, 177–95. San Francisco: Jossey-Bass. [Google Scholar]
  23. Fleiß Jürgen, Höllinger Franz, Kuzmics Helmut. 2009. “Nationalstolz zwischen Patriotismus und Nationalismus?” Berliner Journal für Soziologie 19:409–34. [Google Scholar]
  24. Flora David, Curran Patrick. 2004. “An Empirical Evaluation of Alternative Methods of Estimation for Confirmatory Factor Analysis with Ordinal Data.” Psychological Methods 9:466–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Freedom House. 2015. Freedom in the World 2014: The Annual Survey of Political Rights and Civil Liberties, edited by Arch Puddington, Aili Piano, Jennifer Dunham, Bret Nelson, Tyler Roylance. New York: Rowman & Littlefield Publishing Group. [Google Scholar]
  26. Gonzalez Francisco. 2012. Freedom House—Countries at the Crossroads 2012: Mexico. Washington, DC: Freedom House; Available at https://freedomhouse.org/sites/default/files/Mexico - FINAL.pdf Retrieved May 20, 2015. [Google Scholar]
  27. Gray Michelle, Blake Margaret. 2015. “Cross-National, Cross-Cultural and Multilingual Cognitive Interviewing.” In Cognitive Interviewing Practice, edited by Debbie Collins, 220–42. London: Sage Publications. [Google Scholar]
  28. Horn John, McArdle Jack. 1992. “A Practical and Theoretical Guide to Measurement Invariance in Aging Research.” Experimental Aging Research 18:117–44. [DOI] [PubMed] [Google Scholar]
  29. Hu Li-tze, Bentler Peter. 1999. “Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria versus New Alternatives.” Structural Equation Modeling 6:1–55. [Google Scholar]
  30. ISSP Research Group. 2015. “International Social Survey Programme: National Identity III—ISSP 2013.” GESIS Data Archive, Cologne. ZA5950 Data file Version 1.0.0. [Google Scholar]
  31. Jöreskog Karl. 1971. “Simultaneous Factor Analysis in Several Populations.” Psychometrika 36:409–26. [Google Scholar]
  32. Kaczmirek Lars, Meitinger Katharina, Behr Dorothée. 2015. “Item Nonresponse in Open-Ended Questions: Identification and Reduction in Web Surveys.” Paper presented at the Sixth Conference of the European Survey Research Association (ESRA), Reykjavik, Iceland. [Google Scholar]
  33. Latcheva Rossalina. 2011. “Cognitive Interviewing and Factor-Analytic Techniques: A Mixed Method Approach to Validity of Survey Items Measuring National Identity.” Quality & Quantity 45:1175–99. [Google Scholar]
  34. Meitinger Katharina, Behr Dorothée. 2016. “Comparing Cognitive Interviewing and Online Probing: Do They Find Similar Results?” Field Methods. Online First. [Google Scholar]
  35. Meredith William. 1993. “Measurement Invariance, Factor Analysis and Factorial Invariance.” Psychometrika 58:525–43. [Google Scholar]
  36. Meuleman Bart, Billiet Jaak. 2009. “A Monte Carlo Sample Size Study: How Many Countries Are Needed for Accurate Multilevel SEM?” Survey Research Methods 3:45–58. [Google Scholar]
  37. Miller Krisen, Mont Daniel, Maitland Aaron, Altman Barbara, Madans Jennifer. 2011. “Results of a Cross-National Structured Cognitive Interviewing Protocol to Test Measures of Disability.” Quality & Quantity 45:801–15. [Google Scholar]
  38. Muthén Linda, Muthén Bengt. 2015. MPLUS 7.31. Los Angeles: Muthén & Muthén [Google Scholar]
  39. Oberski Daniel. 2014. Jrule for Mplus: A Program for Post-Hoc Power Evaluation of Structural Equation Models Estimated by Mplus. Zenodo. [Google Scholar]
  40. Prüfer Peter, Rexroth Margrit. 2005. Kognitive Interviews. ZUMA How-to-Reihe 15. Retrieved October 20, 2015. [Google Scholar]
  41. Rock Donald, Werts Charles, Flaugher Ronald. 1978. “The Use of Analysis of Covariance Structures for Comparing the Psychometric Properties of Multiple Variables across Populations.” Multivariate Behavioral Research 13:403–18. [DOI] [PubMed] [Google Scholar]
  42. Saris Willem, Satorra Albert, Van der Veld William. 2009. “Testing Structural Equation Models or Detection of Misspecifications?” Structural Equation Modeling 16:561–82. [Google Scholar]
  43. Schonlau Matthias, Couper Mick. 2016. “Semi-Automated Categorization of Open-Ended Questions.” Survey Research Methods 10:143–52. [Google Scholar]
  44. Schuman Howard. 1966. “The Random Probe: A Technique for Evaluating the Validity of Closed Questions.” American Sociological Review 31:218–22. [PubMed] [Google Scholar]
  45. SERISS. 2015. “WP7: A Survey Future Online.” London: European Social Survey; Available at http://seriss.eu/about-seriss/work-packages/wp7-a-survey-future-online/ Retrieved November 20, 2015. [Google Scholar]
  46. Steenkamp Jan-Benedict, Baumgartner Hans. 1998. “Assessing Measurement Invariance in Cross-National Consumer Research.” Journal of Consumer Research 25:78–107. [Google Scholar]
  47. Van de Vijver Fons. 2011. “Capturing Bias in Structural Equation Modeling.” In Cross- Cultural Analysis: Methods and Applications, edited by Eldad Davidov, Peter Schmidt, Jaak Billiet, 3–34. New York: Routledge. [Google Scholar]
  48. Vandenberg Robert, Lance Charles. 2000. “A Review and Synthesis of the Measurement Invariance Literature: Suggestions, Practices, and Recommendations for Organizational Research.” Organizational Research Methods 3:4–70. [Google Scholar]
  49. Willis Gordon. 2005. Cognitive Interviewing: A Tool for Improving Questionnaire Design. Thousand Oaks, CA: Sage. [Google Scholar]


Articles from Public Opinion Quarterly are provided here courtesy of Oxford University Press
