Author manuscript; available in PMC: 2026 Feb 27.
Published in final edited form as: Qual Quant. 2023 Mar 29;58(1):471–495. doi: 10.1007/s11135-023-01650-7

An Ounce of Prevention: Using Conversational Interviewing and Avoiding Agreement Response Scales to Prevent Acquiescence

Rachel Davis 1,*, Frederick G Conrad 2, Shaohua Dong 2, Anna Mesa 1, Sunghee Lee 2, Timothy P Johnson 3
PMCID: PMC12945380  NIHMSID: NIHMS2145836  PMID: 41769680

Abstract

Acquiescent response style (ARS), the tendency for survey respondents to agree with survey items, is of particular concern for increasing measurement error in surveys with populations who are more likely to acquiesce, such as Latino respondents in the U.S. In order to develop methods for reducing ARS, this study addressed two questions: (1) Does administering a questionnaire using conversational interviewing (CI) yield less ARS than standardized interviewing (SI)? (2) Do bipolar disagree/agree (DA) response scales lead to higher ARS than unipolar response scales that do not assess agreement (non-AG)? A total of 891 Latino telephone survey respondents were screened for ARS and randomly assigned to four experimental groups determined by crossing interviewing technique (CI or SI) and response format (non-AG or DA): (1) SI/non-AG (n=301); (2) SI/DA (n=295); (3) CI/non-AG (n=149); and (4) CI/DA (n=146). CI yielded lower ARS than SI (p<0.001), but there was no difference in ARS between DA and non-AG response scales. A subset of coded interview recordings indicated that the CI interviewers reduced ARS by clarifying questions even in the absence of evidence of respondent confusion and helping with response mapping. These results suggest that difficulty answering questions associated with cognitive decline and cultural norms may have prompted higher use of ARS, but that conversational interviewers were able to mitigate these difficulties and cultural tendencies. Findings from this study suggest that using CI to administer survey questions may decrease ARS and improve data quality among survey respondents who are more likely to engage in ARS.

1. Introduction

Acquiescent response style (ARS) refers to the tendency for survey respondents to agree with survey items, irrespective of question content (Baumgartner and Steenkamp 2001; Krosnick 1991). ARS is most commonly observed for questions measuring opinions and other subjective phenomena using response options that are clearly positive or negative, such as “yes/no,” “true/false,” and bipolar response scales (e.g., “Strongly Agree” to “Strongly Disagree;” Saris et al. 2010). There are several potential reasons why respondents engage in ARS. First, respondents may feel that positive responses are more polite, minimize offense, and enable them to present themselves more favorably than negative responses. These tendencies may consciously or unconsciously lead to a pattern of selecting acquiescent responses representing a positively skewed form of socially desirable responding. Second, it may sometimes be easier to select an option arbitrarily than to carefully think through one’s answer. In this situation, selecting a positive answer may be a less conspicuous indicator of reduced effort than selecting a negative response, since disagreement may be more unexpected due to social norms for politeness, thereby allowing respondents to reduce their effort without calling attention to the reduced effort. Such a shortcut may be a form of the more general effort-reduction phenomenon known as survey satisficing (e.g., Krosnick 1991; Roberts et al. 2019). Thus, while stemming from a different root cause, this pattern of responding may also appear similar to the positively skewed form of socially desirable responding described above. Third, ARS may be a means of coping with difficulty in responding to survey items, regardless of the source of difficulty. 
For example, a respondent might be confused by the statement “When I am hungry, it’s hard to avoid fast food” because they always find it difficult to avoid fast foods, not just when they are hungry, and so might be unsure what it means to agree or disagree with the statement. Rather than asking for help, a respondent might pick a positive option such as “Strongly Agree” because they assume that this type of response will be less controversial than “Strongly Disagree” and, thus, less likely to reveal their potentially embarrassing confusion.

ARS appears to be associated with several sociodemographic characteristics. For example, ARS appears to be inversely associated with education (Davis et al. 2019; Meisenberg and Williams 2008; Narayan and Krosnick 1996; Weijters et al. 2010). As respondents with limited education may be less certain about their answers, they may be more susceptible to strategies that reduce their effort and the appearance of ambivalence (Rammstedt et al. 2017). ARS has also been positively associated with age (Davis et al. 2019; De Beuckelaer et al. 2010; Meisenberg and Williams 2008; Rammstedt et al. 2017; Ross and Mirowsky 1984; Weijters et al. 2010). Since aging tends to reduce cognitive ability (Salthouse 1996, 1999), respondents with reduced ability may consciously or unconsciously provide positive answers to camouflage difficulty processing items. Within the U.S., Latino populations tend to acquiesce more than non-Latino whites (Aday et al. 1980; Davis et al. 2019; Liu et al. 2017; Marín et al. 1992; Ross and Mirowsky 1984; Warnecke et al. 1997), potentially due to Latino cultural communication norms (Davis et al. 2019; Davis et al. 2011).

Although it is sometimes possible to control for ARS-induced error after data collection through data analysis (for summaries of methods for adjusting for ARS during analysis, see Lee et al. 2022 and Van Vaerenbergh and Thomas 2013), it is preferable to reduce the occurrence of ARS during data collection, as this should produce more valid raw responses and reduce the need for statistical adjustments. We report the results of a 2×2 experiment conducted with acquiescent Latino telephone survey respondents to address two research questions: (1) Does administering a questionnaire using conversational interviewing (CI), in which interviewers are trained to help respondents interpret questions and map their answers to response scales, mitigate ARS during data collection compared to standardized interviewing (SI)? (2) Does the use of unipolar response scales that do not assess agreement (non-AG) reduce ARS when compared to bipolar, disagree/agree (DA) scales?

1.1. Conversational vs. Standardized Interviewing

In a traditional SI approach (Fowler and Mangione 1989), interviewers are required to read survey questions exactly as worded and, if a respondent provides something other than an acceptable answer, issue a “neutral” probe (e.g., “Let me repeat the question,” “Whatever it means to you”) to obtain an acceptable response. The primary goal of SI is to hold question wording constant across respondents so that different answers to the same question can be attributed to actual differences between respondents rather than to differences in question wording. By minimizing variation in how questions are administered, SI is assumed to reduce measurement error.

In contrast, CI requires interviewers to provide as-needed clarification to assist respondents in comprehending questions as they are intended to be understood. As in SI, conversational interviewers are required to first read questions exactly as worded; however, they are instructed to subsequently judge whether respondents need clarification and to say what they believe is necessary to convey the intended meaning (Conrad and Schober 2000, 2020; Schober and Conrad 1997). A critical assumption behind CI is that pretesting cannot eliminate all of the problems that may arise when respondents map questions to their unique circumstances. For example, Conrad and Schober (2000) demonstrated that asking about expenditures for “inside maintenance and home repair” required explaining to some respondents which costs should be included/excluded but that it would be highly inefficient to explain all criteria to all respondents, as most criteria would be irrelevant to most respondents’ circumstances. For these reasons, conversational interviewers are trained to provide help to respondents based on each individual’s needs. The wording that interviewers use to do this is allowed to vary across respondents. The CI approach therefore attempts to standardize respondents’ interpretation of each question rather than the question wording.

CI has been shown to increase response quality for both behavioral (Conrad and Schober 2000; Schober and Conrad 1997) and opinion (Hubbard et al. 2020) items, usually without increasing interviewer variance (West et al. 2018). CI research on opinion questions has focused more on question stem comprehension than the comprehension and use of response scales (Conrad and Schober 2021). However, it is possible that if interviewers are allowed to help respondents understand not only what is being asked of them, but also how to interpret and select the response options that best match their answers, CI may lessen ARS.

There are several ways in which CI could reduce ARS. For one, conversational interviewers have more opportunities to address the issues that lead to ARS than questionnaire designers, as global wording changes are unlikely to be as effective as interviewers’ individualized interventions with respondents. For example, if a respondent has misunderstood how to use a response scale, a conversational interviewer can provide a clearer explanation. While pretesting items for a survey with Mexican American adults in a previous unrelated study, we observed an exchange in which a respondent was asked to rate a statement on a scale ranging from 1 (“Not like me”) to 5 (“A lot like me”). The respondent initially answered, “that’s not like me.” When pressed for a numerical response, they answered 4 (a high degree of similarity), demonstrating confusion about the direction of the response scale. The interviewer – who was not trained in CI but was not obliged to follow SI practices – then provided assistance with response mapping:

Respondent: Okay, I’m not understanding the question.

Interviewer: I’m sorry.

Interviewer rereads the question.

Respondent: No, that’s not like me.

Interviewer: So, what number would we put?

Respondent: 4.

Interviewer: So, it’s not like you? One is “Not like me,” 5 would be “A lot like me.”

Respondent: Oh, see, I have it backwards. I’m sorry.

Interviewer: No, that’s alright.

Respondent: I’m doing 1 being highest. It’s the other way around.

Interviewer: Yeah, 5 would be “A lot like me,” and 1 “Not like me,” but you can always pick in between, depending.

Respondent: 2.

CI may also reduce ARS by helping interviewers to convey the importance of providing candid, thoughtful answers. For instance, in the pretest mentioned above, we asked participants why they thought Mexican Americans might be more likely to engage in ARS. One participant explained: “(W)e have been programmed that, ‘Well, I guess they just wanna hear the positive, and they don’t want to hear what we really have to say’ ... if you ain’t got nothing good to say, don’t say it ...” This comment suggests that if interviewers emphasize the importance of providing honest answers when, in their judgment, the respondent may not have been entirely candid, respondents may feel more comfortable choosing negative responses. For these reasons, we hypothesized that using CI to administer a questionnaire would yield less ARS than using SI.

1.2. Non-Agreement vs. Disagree/Agree Response Scales

It is possible that questionnaire designers can also reduce ARS during data collection by avoiding bipolar DA scales. Although bipolar DA scales are widely used in the social sciences, there are several reasons to believe they may increase ARS (Saris et al. 2010). First, DA options explicitly instruct respondents to think about agreement when formulating their responses. Respondents may therefore feel that to disagree is impolite. This may motivate respondents to select responses from the agree end of the scale to convey deference, express politeness, or avoid embarrassment associated with problems understanding the question or producing an answer. Second, DA scales may be associated with increased cognitive burden. For example, as compared to item-specific response scales, in which response labels more closely match the underlying construct assessed in each item, DA scales require respondents to engage in an extra mental step of mapping the answers they produce onto an agreement dimension, which, in most cases, is not the dimension underlying the item. For instance, “When I am hungry, it’s hard to avoid fast food” has more to do with difficulty avoiding fast food than with agreement. Since item-specific response scales have been associated with better internal validity than DA scales (Saris et al. 2010), response scales that assess the same concepts queried in question stems may similarly mitigate ARS by reducing response mapping difficulty. Third, the traditional DA scale endpoint labels, “Strongly Disagree” and “Strongly Agree,” accentuate the opposite ends of a bipolar continuum that requires respondents to select a response from the negative or positive end of the scale. In contrast, response scales that assess dimensions other than agreement are often unipolar, so their values are not as clearly labeled as positive or negative.
For example, the label “Believe this very much,” which indicates the presence of a belief, is not necessarily more positive than the label “Don’t believe this at all,” which indicates the absence of a belief. It is therefore plausible that unipolar response scales that do not assess agreement (non-AG scales) will elicit less ARS than bipolar DA scales. As such, we hypothesized that non-AG response scales would result in less ARS than DA scales.

2. Method

2.1. Respondents

Because our goal was to mitigate ARS within a population for whom this response style may have a more substantial effect on data quality, our experiment required a sufficient prevalence of ARS to detect reductions attributable to our experimental conditions. We therefore targeted a lower-income, limited-education, and less acculturated Latino population that we expected to be highly acquiescent (Davis et al. 2019; Liu et al. 2017; Nair et al. 2008; Ross and Mirowsky 1984; Weijters et al. 2010). We enrolled 913 participants in the study, which, after excluding 22 respondents due to missing data on key variables, yielded an analytic sample of 891. Almost half of the participants (n=425) were recruited because they had exhibited ARS in a telephone survey conducted several months earlier. The remaining participants (n=466) were newly recruited. Respondents in both groups were randomly selected from a commercial sample vendor list of landline and mobile telephone numbers associated with addresses in large Latino markets in the mainland U.S. or Puerto Rico, lower-income households ($25,000 or less), and individuals with limited education (12 years or less). This method has been shown to be efficient for locating participants with particular attributes (Pasek et al. 2014; Valliant et al. 2014; West et al. 2015). Respondent education and income were also directly measured in the survey.

Respondents were eligible to participate in the study if they were between the ages of 18 and 90, spoke English or Spanish, self-identified with one of three Latino heritage groups (Mexican American, Cuban American, Puerto Rican), and exhibited a threshold level of ARS, which was assessed using a 20-item ARS screener from a previous study with Latino adults (Lee et al. 2022). Respondents met the ARS threshold if they answered 6 or 7 on a scale ranging from 1=“Strongly Disagree” to 7=“Strongly Agree” to one or more items in both the positively and negatively worded item groups comprising two balanced scales: the Rosenberg Self-Esteem Scale (Rosenberg 1965) and the Perceived Stress Scale (Cohen et al. 1983). Interview language was determined during eligibility screening using three items adapted from the National Latino and Asian American Study (Alegria and Takeuchi 2002), and language preference was confirmed before administering the questionnaire. A $30 gift card was mailed to participants after they completed the interview. All study procedures were reviewed by a university-affiliated Institutional Review Board.
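As a concrete illustration, the screening rule above can be sketched in a few lines of Python (a hypothetical helper for exposition only, not the authors’ screening instrument; the function name and list-based interface are assumptions):

```python
def meets_ars_threshold(positive_items, negative_items):
    """ARS screening rule as described: eligibility requires answering 6 or 7
    (on the 1="Strongly Disagree" to 7="Strongly Agree" scale) to at least one
    item in BOTH the positively and the negatively worded item groups."""
    agrees = lambda responses: any(r >= 6 for r in responses)
    return agrees(positive_items) and agrees(negative_items)
```

Strong endorsement of both item groups is the telltale sign: because the two groups are worded in opposite directions, a respondent answering on content alone should not strongly agree with items in both.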

2.2. Experimental Design

This study used a between-subjects design that fully crossed interviewing technique (CI vs. SI) with response scale format (unipolar non-AG vs. bipolar DA). Within each Latino subgroup, participants were randomly assigned to four experimental groups: (1) SI/non-AG response options (n=301); (2) SI/DA response options (n=295); (3) CI/non-AG response options (n=149); and (4) CI/DA response options (n=146). More respondents were assigned to SI (n=596) than CI (n=295) because the CI technique, as implemented here, involved additional interviewer training, more monitoring, and longer interview administration times (36–59 minutes for CI vs. 29–54 minutes for SI). As a result, the conversational interviews were more expensive than the standardized interviews (see Conrad and Schober 2021 for a discussion of the relative costs of SI and CI), leading to different sample sizes for the two techniques.

2.3. Interviewers and Interviewing Technique

Twenty-eight professional interviewers with SI experience administered the questionnaires: six interviewers were assigned to use CI, and 22 interviewers were assigned to use SI. More interviewers were assigned to use SI due to the larger number of standardized interviews. Four interviewers administered the majority of the conversational interviews (285 out of 295). The average number of interviews administered was 42.1 (SD=37.4) for CI and 24.8 (SD=38.3) for SI. To avoid contamination across experimental conditions, the two sets of interviewers were physically located in different call centers, and each group was unaware of the other group’s protocol.

Each set of interviewers was trained to administer the DA and non-AG questionnaires using their assigned interviewing technique. Three-hour training sessions were conducted by the call center supervisors, as per standard procedures for the interviewing company, and by two of the authors [FC, AM], who provided additional study-specific training, particularly for the CI methods, as this approach had not been previously used at the interviewing company. Both trainings were held in person, with the interviewers and their supervisors in the same room. Since the study-specific training for the SI condition was relatively brief, consisting primarily of reinforcing the need to adhere to the SI methods that the call center supervisors covered in the training, the study authors participated in their portion of the SI training by phone. Both trainings were conducted in English, with a bilingual instructor and supervisor providing Spanish explanations and translations as needed. The CI training focused on teaching interviewers the intended meanings of the questions, which generally centered on key concepts and question stem wording for which the researchers had developed definitions a priori. Each conversational interviewer was provided with a binder containing these definitions for all 132 questions, including the 27 core ARS items (see Appendix A). The CI training also taught interviewers how the researchers intended for respondents to use the response scales. The conversational interviewers were instructed to first read each question as worded and then say whatever they considered necessary, if anything, to ensure that respondents understood the intended question meanings and how to use the response scales. The training included role-playing exercises, with the instructors acting as respondents, until the instructors determined that the interviewers were competent in the CI techniques.
Since the SI interviewers were already experienced with SI, the SI training focused on reinforcing behaviors that the interviewers presumably already practiced: reading questions exactly as worded, re-reading response options as needed, and offering neutral probes if respondents produced anything other than acceptable responses (e.g., “Let me repeat the question”). The SI interviewers were encouraged to implement strict SI procedures that were as close as possible to the idealized SI method recommended by Fowler and Mangione (1989). The concepts covered in the two interviewer trainings were reinforced during data collection by the call center supervisors, who regularly monitored interviewers’ calls as part of the company’s standard operating procedures.

2.4. Questionnaire

The experiment was embedded in a longer interview consisting of 132 items querying vegetable consumption, attitudes and beliefs related to vegetable consumption, and sociodemographics. The questionnaires were administered in English or Spanish and were otherwise identical across the four experimental conditions with two exceptions: (1) the DA/non-AG wording variations; and (2) the CI questionnaires included additional introductory wording that underscored the importance of asking the interviewers for help (Appendix A). Spanish translations were conducted by a professional translation company, with translations subsequently reviewed by bilingual members of the study team. Question order was randomized within question sets. Item wording is included in Appendix B. The items used to create the primary ARS variable (ARS-HI) were pretested via a telephone survey with 300 Latino respondents, while items used to create the secondary ARS variable (ARS-VRS) were pretested via 82 cognitive interviews. In order to obtain adequate representation among the target population, the telephone survey and cognitive interview samples were stratified by Latino heritage (Mexican American, Cuban American, Puerto Rican), language use (English/Spanish), and gender (male/female). A new sample was recruited for the present study.

2.4.1. ARS Heterogeneous Items (ARS-HI).

The primary ARS variable, which was used in most analyses, was constructed using responses to 27 questions regarding beliefs about unrelated topics that were purposefully selected to represent a heterogeneous group of items for measuring ARS. These items were intentionally selected or written to be difficult to interpret (e.g., “I trust social movements”), query fictional topics (e.g., “The passage of Lambert’s Law would dramatically reduce school shootings in the United States”), or be vague statements that sound socially desirable if one does not think too hard about them (e.g., “A wise person forgives but does not forget”), as these types of items have no actual correct answers and capture the range of potential drivers of ARS mentioned earlier. Each item was formatted as a statement (e.g., “The United States spends too much money on scientific research”). The DA scale ranged from 1=“Strongly Disagree” to 7=“Strongly Agree.” The non-AG scale ranged from 1=“Don’t Believe This At All” to 7=“Believe This Very Much.” The response scale labels did not vary across ARS-HI items, as all items assessed the same underlying dimension of belief. We calculated an ARS-HI score for each respondent following the method introduced by Baumgartner and Steenkamp (2001) by recoding negatively valenced responses (1, 2, 3, 4) to 0 and positively valenced responses (5, 6, 7) to 1, 2, and 3, respectively. The dependent variable in most analyses was the average of these recoded values across the 27 ARS-HI items. Although the frequency distribution of ARS-HI was slightly skewed toward positive responses, as was expected, a quantile-quantile (Q-Q) plot indicated that the ARS-HI variable had a sufficiently normal distribution for regression analyses. ARS-HI values ranged from 0 to 3, with larger values indicating greater ARS.
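The recoding scheme above can be expressed as a short Python sketch (illustrative only; the function name and list-based interface are assumptions, not the authors’ analysis code):

```python
def ars_hi_score(responses):
    """Acquiescence index following Baumgartner and Steenkamp (2001), as
    described above: responses 1-4 are recoded to 0, and responses 5, 6,
    and 7 are recoded to 1, 2, and 3; the score is the mean of the recoded
    values across the 27 ARS-HI items."""
    recoded = [max(0, r - 4) for r in responses]  # 1-4 -> 0; 5, 6, 7 -> 1, 2, 3
    return sum(recoded) / len(recoded)            # ranges from 0 to 3

print(ars_hi_score([7] * 27))  # 3.0 (maximal acquiescence)
print(ars_hi_score([4] * 27))  # 0.0 (no positively valenced responses)
```

Note that the index weights strong agreement more heavily than mild agreement, while all disagreement and midpoint responses contribute zero.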

2.4.2. ARS Varied Response Scales (ARS-VRS).

Responses to 34 items assessing attitudes and beliefs related to vegetable intake were used to create a secondary ARS variable, ARS-VRS, which allowed us to replicate selected analyses for the ARS-HI variable. As shown in Appendix B, the majority of these items were drawn from previous health studies and adapted to fit the requirements of the present experiment. No ARS-HI items were included in the ARS-VRS measure. The ARS-VRS DA items used the same response scale as the ARS-HI DA items. The ARS-VRS non-AG items used the following response scales: ten items concerning motivation (1=“Not Motivate You at All” to 7=“Motivate You a Lot”); ten items concerning likeliness (1=“Not at All Likely” to 7=“Very Likely”); eight items concerning confidence (1=“Not at All Confident” to 7=“Very Confident”); four items concerning frequency of occurrence (1=“Never” to 7=“Almost Always”); one item concerning encouragement (1=“They Don’t Encourage You at All” to 7=“They Encourage You a Lot”); and one item concerning teasing (1=“They Don’t Tease You at All” to 7=“They Tease You a Lot”). The ARS-VRS variable was created using the Baumgartner and Steenkamp (2001) recoding method described above, with scores ranging from 0 to 3 and larger values indicating higher ARS. Due to a lack of normality in the original ARS-VRS variable, a square transformation (X²) was applied, which normalized the distribution of the ARS-VRS variable for use in the regression analysis.
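A minimal sketch of the ARS-VRS construction, assuming the Baumgartner and Steenkamp recoding described in the text (the function name is an assumption, and this is not the authors’ code; note that squaring the 0–3 index yields a transformed variable ranging from 0 to 9):

```python
def ars_vrs_transformed(responses):
    """ARS-VRS with the square transformation described above: the same
    recoding as ARS-HI (1-4 -> 0; 5, 6, 7 -> 1, 2, 3), averaged over the
    34 items, then squared to reduce positive skew before regression."""
    score = sum(max(0, r - 4) for r in responses) / len(responses)  # 0 to 3
    return score ** 2  # square transform; transformed range is 0 to 9
```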

2.4.3. Sociodemographic variables.

Six sociodemographic variables were assessed: Latino heritage; age, which was grouped into quartiles to obtain a more uniform distribution (55 or younger, 56–67, 68–77, 78 or older); gender; education; interview language; and Latino cultural orientation. Cultural orientation was measured using 16 items from an adapted version of the Mexican orientation subscale from the Acculturation Rating Scale for Mexican Americans-II (Cuellar et al. 1995), which queried respondents’ use of Spanish and their identification with, affinity with the culture of, and interaction with their Latino heritage group (α=.79). Scores ranged from 0 to 4, with higher values indicating a stronger Latino cultural orientation. Since 20 respondents had missing values for gender, a missing category was included in order to avoid casewise deletion during the regression analyses.

2.5. Interviewer Behavior Coding

In the event that CI reduced ARS relative to SI, we sought to examine which interviewer behaviors might be responsible. Thus, we identified potentially relevant interviewer behaviors a priori and coded the presence or absence of these behaviors in audio recordings of the telephone interviews. Although all interviews were intended to be recorded, 31% of the recordings were lost by the company conducting the interviews, yielding a total of 616 interviews available for coding (n=223 CI; n=393 SI). For these 616 interviews, two bilingual research assistants coded relevant interviewer behaviors for each administration of the 27 items used to create the ARS-HI measure. As shown in Table 1, seven codes (S1-S7) were developed for SI behaviors and ten codes (C1-C10) were developed for CI behaviors. Multiple codes could be assigned to the same events, as many of the codes occurred simultaneously (e.g., S1 and S7). During the initial coding, C3 and C4 comprised a single, combined code, which we will refer to as C3/C4. After that coding was completed, it became apparent that there could be value in distinguishing between cases in which a conversational interviewer provided help in some way other than providing a definition (C3) and cases in which the interviewer helped with response mapping (C4). Thus, a random subsample of 60 of the 223 conversational interviews (26.9% of the CI audio corpus) was recoded to differentiate C3 from C4. This exercise indicated that the vast majority (93%) of interactions tagged with the C3/C4 code involved interviewers assisting respondents with response mapping (C4). After this coding was completed, we constructed a variable for each code representing its average frequency across all 27 of the ARS-HI items. In some analyses, C3 and C4 are treated as a single code (as originally coded); however, based on our subsequent coding, we interpreted C3/C4 as primarily indicating help with response mapping (C4).

Table 1.

Interview Behavior Codes

Code Description Example
I. Standardized Interviewing Behavior Codes
S1 (neutral probe) After evidence of respondent difficulty, interviewer administers neutral probe “Let me repeat/re-read the question” or “Could you be a little bit more specific?”
S2 (missing neutral probe) After evidence of respondent difficulty, interviewer does not administer a neutral probe Interviewer: “How confident are you that you can eat vegetables when there are foods in your house like chips, cookies, or candy?”
Respondent answers: “... (pause) ... Yes.”
Interviewer: “Okay ...” (*Reads next item*)
S3 (non-neutral assistance) After evidence of respondent difficulty, interviewer administers a non-neutral probe or definition Interviewer: “In some situations, it is more important to be compassionate than fair.”
Respondent: “What do you mean by compassionate?”
Interviewer defines key term in question, even though not instructed to do this: “I would say it means ‘being kind to other people.’”
S4 (incomplete probe) After evidence of respondent difficulty, interviewer paraphrases or administers an incomplete neutral probe Instead of repeating the entire question and response scale, an interviewer only repeats the response scale options, “One is Strongly Disagree, and seven is Strongly Agree.”
S5 (item misread) Interviewer incorrectly reads item, changing its meaning Interviewer says “French fries” instead of “chips”
S6 (incorrect mapping – proceeds) After evidence that respondent incorrectly mapped answer to or otherwise misunderstood the response scale, interviewer proceeds without probing to obtain a more accurate response Interviewer reads item using a 1=“Strongly Disagree” and 7=“Strongly Agree” response scale: “Global warming is a myth.”
Respondent: “Seven, it’s not a myth.”
Interviewer: “Okay...” (*Reads next item*)
S7 (incorrect mapping – neutral probe) After evidence that respondent incorrectly mapped answer to or otherwise misunderstood the response scale, interviewer provides neutral probe Interviewer reads item using a 1=“Strongly Disagree” and 7=“Strongly Agree” response scale: “Global warming is a myth.”
Respondent: “Seven, it’s not a myth.”
Interviewer: “Let me re-read the question, ‘Global warming is a myth.’”
II. Conversational Interviewing Behavior Codes
C1 (definition after difficulty) After evidence of respondent difficulty, interviewer provides a definition Interviewer: “In some situations, it is more important to be compassionate than fair.”
Respondent: “Eh, maybe? I don’t really understand what you’re asking.”
Interviewer: “This question is asking whether you believe that a person should always be fair to other people, no matter what, or whether there are sometimes situations in which it is more important to be caring, thoughtful, considerate, and sympathetic toward other people.”
C2 (definition, no difficulty) Without evidence of respondent difficulty, interviewer provides a definition Interviewer: “In some situations, it is more important to be compassionate than fair, that is, to what extent do you agree or disagree that a person should always be fair to other people, no matter what, or are there sometimes situations in which it is more important to be caring, thoughtful, considerate, and sympathetic toward other people?”
C3/C4 (non-definition assistance) (Combined variable representing C3 and C4, below) (See examples for C3 and C4, below)
C3 (assistance without explicit definition) After evidence of respondent difficulty, interviewer assists respondent without providing an explicit definition Interviewer reads item: “I am confident that I can prepare a meal that contains vegetables when I am very tired.”
Respondent: “Does microwaving a frozen meal with vegetables count?”
Interviewer: “Yes.”
C4 (mapping assistance) Without evidence of respondent difficulty, interviewer provides assistance with response mapping based on what the respondent says Interviewer reads item: “I am confident that I can eat vegetables when I am sad or in a bad mood.”
Respondent: “No, no. When I’m sad or in a bad mood, I don’t want vegetables.”
Interviewer: “Then it sounds like you are more inclined toward a lower number.”
Respondent: “Two.”
C5 (missing assistance) After evidence of respondent difficulty, interviewer does not provide a definition or other assistance Interviewer: “How confident are you that you can eat vegetables when there are foods in your house like chips, cookies, or candy?”
Respondent: “... (pause) ... Yes.”
Interviewer: “Okay ...” (*Reads next item*)
C6 (neutral probe) After evidence of respondent difficulty, interviewer administers neutral probe “Let me repeat/re-read the question” or “Could you be a little bit more specific?”
C7 (inaccurate definition) Interviewer provides an inaccurate definition or assistance by including content not included in the definition text or inaccurately paraphrasing definition Interviewer: “Vegetarianism is harmful to the environment.”
Respondent: “Definitely not true...I disagree...”
Interviewer: “Okay... so the number would be from the high end of the scale, that is, 5, 6, or 7, depending on how much you disagree ...” (The corresponding scale value is actually from the low end of the scale, i.e., 1, 2 or 3)
C8 (item misread) Interviewer incorrectly reads item, changing its meaning Interviewer says “French fries” instead of “chips”
C9 (incorrect mapping – proceeds) After evidence that respondent incorrectly mapped answer to or otherwise misunderstood the response scale, interviewer proceeds without probing or providing assistance to obtain a more accurate response Interviewer reads item using a 1=“Strongly Disagree” and 7=“Strongly Agree” response scale: “Global warming is a myth.”
Respondent: “Seven, it’s not a myth.”
Interviewer: “Okay...” (*Reads next item*)
C10 (incorrect mapping – clarification) After evidence that respondent incorrectly mapped answer to or otherwise misunderstood the response scale, interviewer clarified (or tried to clarify) response mappings Interviewer reads item using a 1=“Strongly Disagree” and 7=“Strongly Agree” response scale:
Interviewer: “Global warming is a myth.”
Respondent: “Mmm...No. It’s the truth.”
Interviewer: “One is you don’t believe it at all, and 7 is you believe it very much...” (Continues to provide assistance)

2.6. Data Analysis

We used SAS® (version 9.4, copyright © 2013, SAS Institute Inc., Cary, NC) and R (version 3.3.4, R Core Team, 2020) to prepare the data set and compute descriptive statistics. We then estimated a series of hierarchical linear regression models to investigate the main effects of interviewing technique, response scale format, and sociodemographic variables on ARS while controlling for the clustering of respondents by interviewers. To test the hypotheses, we first estimated the main effects of response scale format and interviewing technique on ARS-HI (Model 1) and then added an interaction term to test whether the effect of response scale format on ARS-HI depended on interviewing technique (Model 2). Both models controlled for Latino heritage, age, education, gender, cultural orientation, and interview language. In order to ascertain whether these results extended to measures of ARS based on responses to items with more variation in non-AG response option labeling, we re-estimated the two models as hierarchical linear regressions with ARS-VRS as the outcome variable. We then estimated separate ordinary least squares linear regression models for each interviewing technique to investigate the influence of technique-specific interviewer behaviors on ARS. These analyses included the interviewer behavior codes for CI and SI, respectively, while controlling for Latino heritage, age, education, cultural orientation, and interview language.
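The hierarchical structure described above can be sketched as a random-intercept (two-level) regression. The specification below is our illustration of Model 2 (Model 1 omits the interaction term), not the authors' exact code:

```latex
\[
\text{ARS-HI}_{ij} \;=\; \beta_0
  \;+\; \beta_1\,\text{CI}_{ij}
  \;+\; \beta_2\,\text{nonAG}_{ij}
  \;+\; \beta_3\,(\text{CI}_{ij}\times\text{nonAG}_{ij})
  \;+\; \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma}
  \;+\; u_j \;+\; \varepsilon_{ij},
\qquad
u_j \sim N(0,\sigma_u^2),\quad
\varepsilon_{ij} \sim N(0,\sigma_\varepsilon^2)
\]
```

Here \(i\) indexes respondents and \(j\) interviewers; \(\text{CI}_{ij}\) and \(\text{nonAG}_{ij}\) are indicators for the conversational interviewing and non-agreement scale conditions; \(\mathbf{x}_{ij}\) collects the sociodemographic controls (heritage, age, education, gender, cultural orientation, interview language); and the random intercept \(u_j\) absorbs interviewer-level clustering.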

3. Results

3.1. Sample Characteristics

The sample had a mean age of 64.3 years (SD=15.5), was 80.1% female, and comprised similar numbers of Mexican American, Puerto Rican, and Cuban American respondents (Table 2). Most participants (93.1%) completed the survey in Spanish, and 74.7% of respondents had a high school-level education or less. The mean cultural orientation score (3.4 on a 5-point scale) indicated that respondents had a relatively strong Latino cultural orientation.

Table 2.

Sample Characteristics

Total (n=891) CI Only (n=295) SI Only (n=596)
Mean age in years (standard deviation [SD]) 64.3 (15.5) 65.7 (15.1) 63.7 (15.7)
Gender (%):
 Male 19.4 18.6 19.8
 Female 78.3 79.7 77.7
 Missing 2.3 1.7 2.5
Latino heritage group (%):
 Mexican American 34.6 34.2 34.7
 Puerto Rican 32.9 33.2 32.7
 Cuban American 32.6 32.5 32.6
Education (%):
 Less than 7th grade 26.8 28.5 26.0
 7th through 12th grade, no diploma 24.9 23.7 25.5
 High school graduate or equivalent 23.0 20.7 24.2
 Some college or technical/vocational school 12.4 11.9 12.6
 4-year college degree 9.1 10.5 8.4
 Graduate degree 3.8 4.8 3.4
Interview language (%):
 Spanish 93.1 94.2 92.6
 English 6.9 5.8 7.4
Mean Latino cultural orientation (SD) 3.4 (0.4) 3.4 (0.4) 3.4 (0.4)
Mean ARS-HI (SD) 1.8 (0.6) 1.6 (0.5) 1.9 (0.6)
Mean ARS-VRS (SD), before square transformation 2.0 (0.6) 1.9 (0.6) 2.1 (0.6)

3.2. Overall Use of ARS

As expected, the sample was highly acquiescent, which was a necessary precursor for our experiment. The mean ARS-HI score was 1.8 out of a maximum possible score of 3 (Table 2), indicating an average response slightly below 6 on a 7-point scale. The mean of the original ARS-VRS score, before square transformation, was 2.0, or an average of 6 on a 7-point scale. Thus, there was clearly a tendency for respondents to select options from the positive/upper ranges of the response scales.

3.3. Fidelity to the Experimental Interviewing Conditions

An examination of the mean frequencies of the interviewer behavior codes indicated that the interviewers adhered to their assigned interviewing techniques when administering the 27 questions used to measure ARS-HI (Table 3). On average, standardized interviewers provided full (S1) or partial (S4) neutral probes when administering 4% and 16% of these questions, respectively, and non-neutral probes or definitions for less than 1% of question administrations. These behaviors were consistent with the SI training. In contrast, conversational interviewers provided definitions to clarify question meaning both in response to confusion (C1) and without explicit evidence of confusion (C2) for an average of 12% and 34% of question administrations, respectively. The CI interviewers rarely (<1% of the time) failed to provide a definition when there was evidence it would likely have helped (C5), which suggests that they internalized the essential behaviors required for CI. However, the conversational interviewers were not able to entirely refrain from using the SI practices to which they were accustomed, as they administered neutral probes (C6) for an average of 28% of the questions. This use of SI methods did not appear to detract from the effectiveness of CI. On average, the conversational interviewers helped respondents map their answers to the response scale (C3/C4) or corrected their understanding of the response scale (C10) for about 29% and 3% of question administrations, respectively, and rarely (<1% of the time) failed to help respondents with response mapping when assistance was needed (C9). Thus, the conversational interviewers effectively used CI techniques to assist respondents with question clarification and response scale mapping. Overall, both groups of interviewers appear to have faithfully implemented their assigned techniques.

Table 3.

Mean Frequencies1 of Coded Interviewer Behaviors During CI (n=223) and SI (n=393) Interviews

CI Codes Mean (SD) SI Codes Mean (SD)
C1 (definition after difficulty) 0.12 (0.12) S1 (neutral probe) 0.04 (0.08)
C2 (definition, no difficulty) 0.34 (0.26) S2 (missing neutral probe) 0.006 (0.05)
C3/C4 (non-definition assistance) 0.29 (0.33) S3 (non-neutral assistance) 0.005 (0.02)
C5 (missing assistance) 0.005 (0.02) S4 (incomplete probe) 0.16 (0.20)
C6 (neutral probe) 0.28 (0.25) S5 (item misread) 0.002 (0.01)
C7 (inaccurate definition) 0.006 (0.02) S6 (incorrect mapping – proceeds) 0.007 (0.02)
C8 (item misread) <0.001 (0.002) S7 (incorrect mapping – neutral probe) 0.004 (0.01)
C9 (incorrect mapping – proceeds) 0.006 (0.04)
C10 (incorrect mapping – clarification) 0.03 (0.05)
1 Frequency was calculated as the mean frequency with which each code occurred per question per interview, averaged across all questions and interviews.

3.4. Influence of Interviewing Technique and Response Scale Format on ARS

As hypothesized, CI yielded significantly lower ARS than SI (β= −0.32 [SE=0.06], p<0.001; Table 4, Model 1): the mean ARS-HI score was 1.9 for the standardized interviews and 1.6 for the conversational interviews. In contrast, there was no significant difference in ARS-HI between responses to the DA versus non-AG response scales; the mean ARS-HI score was 1.8 for both formats (β= 0.04 [SE=0.04], p=0.22). The interaction between interviewing technique and response scale format was also nonsignificant (Model 2). In both models, ARS-HI was higher among Cuban Americans than Mexican Americans (p=0.003), lower among Puerto Ricans than Mexican Americans (p=0.05), higher among respondents aged 55 years or older (p<0.001), marginally higher among participants with a stronger Latino cultural orientation (p=0.06), and higher among respondents who completed the interview in Spanish (p=0.008). As compared to participants with less than a 7th-grade education, respondents with a 7th-12th grade education but no diploma had marginally lower ARS-HI (p=0.06), while participants with a high school diploma or higher degree had significantly lower ARS-HI (p<0.001). The intraclass correlation coefficient for both ARS-HI models was low (ICC=.02), which suggests that the results observed were not attributable to specific interviewers.
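The reported ICC follows directly from the variance components in Table 4; a minimal sketch of the calculation (the function name is ours, values taken from the table):

```python
def intraclass_correlation(var_intercept: float, var_residual: float) -> float:
    """Proportion of total variance in ARS-HI attributable to interviewers
    (random-intercept variance over total variance)."""
    return var_intercept / (var_intercept + var_residual)

# Variance components reported in Table 4 (intercept ~0.007, residual ~0.29)
icc = intraclass_correlation(0.007, 0.29)
print(round(icc, 2))  # 0.02, matching the ICC reported for both models
```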

Table 4.

Hierarchical Linear Regression Estimates of the Influence of Interviewing Techniques (CI vs. SI) and Response Scale Formats (DA vs. Non-AG) on ARS-HI (n=891)

Model 1 Model 2
Estimate (SE) p-value Estimate (SE) p-value
Interviewing technique (ref=SI) −0.32 (0.06) *** <0.001 −0.31 (0.07) *** <0.001
Response scale format (ref=DA) 0.04 (0.04) 0.22 0.05 (0.04) 0.28
Latino heritage group (ref=Mexican American):
 Cuban American 0.16 (0.05) ** 0.003 0.16 (0.05) ** 0.003
 Puerto Rican −0.09 (0.05) * 0.05 −0.09 (0.05) * 0.05
Age quartiles (ref=Younger than 55):
 55–67 0.23 (0.05) *** <0.001 0.23 (0.05) *** <0.001
 67–77 0.18 (0.05) *** <0.001 0.18 (0.05) *** <0.001
 Older than 77 0.24 (0.06) *** <0.001 0.24 (0.06) *** <0.001
Education (ref=Less than 7th grade):
 7th through 12th grade, no diploma −0.10 (0.05) # 0.06 −0.10 (0.05) # 0.06
 High school graduate or equivalent −0.28 (0.05) *** <0.001 −0.28 (0.05) *** <0.001
 Some college or technical/vocational school −0.30 (0.07) *** <0.001 −0.30 (0.07) *** <0.001
 4-year college degree or higher −0.41 (0.06) *** <0.001 −0.41 (0.06) *** <0.001
Gender (ref=Male):
 Female 0.06 (0.05) 0.18 0.06 (0.05) 0.18
 Missing 0.26 (0.13) * 0.04 0.27 (0.13) * 0.04
Latino cultural orientation 0.10 (0.05) # 0.06 0.10 (0.05) # 0.06
Interview language (ref=English) 0.22 (0.08) ** 0.008 0.22 (0.08) ** 0.008
Interviewing technique x response scale format −0.004 (0.08) 0.96
Random component:
 Variance (intercept) 0.007 (0.08)** 0.005 0.007 (0.08)** 0.006
 Variance (residual) 0.29 (0.53) 0.29 (0.53)
Intraclass Correlation Coefficient (ICC) 0.02 0.02
# p ≤ 0.10; * p ≤ 0.05; ** p ≤ 0.01; *** p ≤ 0.001

The pattern of results observed for ARS-HI was replicated in the ARS-VRS main effects model (Appendix C, Table S2). Compared with the SI interviews, the CI interviews were associated with significantly lower values of the square-transformed ARS-VRS variable (β= −0.56 [SE=0.19], p=0.02). The mean scores for the original, untransformed ARS-VRS variable were 2.1 for SI interviews and 1.9 for CI interviews (β= −0.15 [SE=0.04], p<0.001). As with ARS-HI, there was no significant difference in ARS-VRS between responses to the DA versus non-AG response scales in the square-transformed main effects model. The mean, untransformed ARS-VRS score was 2.0 for the non-AG scale and 2.1 for the DA scale (β= −0.06 [SE=0.04], p=0.12). The interaction between interviewing technique and response scale format was also nonsignificant for the square-transformed ARS-VRS (Appendix C, Table S2). The intraclass correlation coefficient for both ARS-VRS models was low (ICC=.009).

Models testing interactions between interviewing technique and sociodemographic variables were generally nonsignificant; however, the interaction between age and interviewing technique was significant (p<0.001). As shown in Figure 1, ARS-HI scores increased significantly with age in the SI condition (p<0.001), while no significant relationship between ARS-HI and age was observed in the CI condition (p=0.13). As shown in Figure 2, Latino cultural orientation and ARS-HI were positively associated in the SI condition [β=0.19 (SE=0.07), p=0.004; F(1, 594)=8.492, p=0.004], whereas there was no significant change in ARS-HI as Latino cultural orientation increased in the CI condition [β=0.10 (SE=0.08), p=0.23; F(1, 293)=1.476, p=0.23].

Figure 1.

Analysis of Variance Comparing the Relationship between Age and ARS-HI by Interviewing Technique (n=891)

Figure 2.

Linear Regression Estimates (Unadjusted) of the Influence of Latino Cultural Orientation on ARS-HI by Interviewing Technique (n=891)

3.5. Influence of Interviewer Behaviors on ARS

In order to develop a deeper understanding of how interviewing technique influenced ARS, we examined the relationships between the interviewer behavior codes and ARS-HI for the conversational and standardized interviews, respectively (Table 5). These results indicate that the conversational interviewers primarily reduced ARS-HI by providing definitions without explicit evidence of respondent confusion (C2; p=0.006) and helping with response mapping (C3/C4; p<0.001; see Appendix D for an example). Conversely, but consistent with these results, ARS-HI increased when the conversational interviewers failed to help with response mapping when there was evidence that such assistance was needed (C9; p=0.028). None of the standardized interviewer behaviors reduced ARS-HI; however, ARS-HI increased when the standardized interviewers proceeded without probing after evidence that response mapping assistance was needed (S6; p=0.018), i.e., leaving respondents’ misunderstanding of the scale uncorrected.

Table 5.

Linear Regression Estimates of Associations between Interviewer Behavior Codes on ARS-HI by Interviewing Technique

CI (n=223) SI (n=393)
Estimate (SE) p-value Estimate (SE) p-value
C1 (definition after difficulty) −0.49 (0.35) 0.17
C2 (definition, no difficulty) −0.37 (0.14) ** 0.006
C3/C4 (non-definition assistance) −0.50 (0.12) *** <0.001
C5 (missing assistance) −0.14 (1.61) 0.93
C6 (neutral probe) 0.12 (0.17) 0.48
C7 (inaccurate definition) 1.20 (1.82) 0.51
C8 (item misread) −5.44 (13.2) 0.68
C9 (incorrect mapping – proceeds) 2.07 (0.94) * 0.03
C10 (incorrect mapping – clarification) 0.58 (0.75) 0.44
S1 (neutral probe) 0.09 (0.42) 0.83
S2 (missing neutral probe) 0.28 (0.54) 0.60
S3 (non-neutral assistance) −0.58 (1.31) 0.66
S4 (incomplete probe) −0.26 (0.16) 0.12
S5 (item misread) 0.48 (2.81) 0.86
S6 (incorrect mapping – proceeds) 3.56 (1.50) * 0.02
S7 (incorrect mapping – neutral probe) −2.26 (2.12) 0.29
Latino heritage group (ref=Mexican American):
 Cuban American 0.27 (0.10) ** 0.007 0.13 (0.08)# 0.10
 Puerto Rican 0.13 (0.09) 0.14 −0.19 (0.08) * 0.02
Age quartiles (ref=Younger than 55):
 55–67 0.11 (0.10) 0.26 0.31 (0.08) *** <0.001
 67–77 −0.04 (0.10) 0.64 0.35 (0.08) *** <0.001
 Older than 77 −0.03 (0.11) 0.79 0.37 (0.10) *** <0.001
Education (ref=Less than 7th grade):
 7th through 12th grade, no diploma −0.19 (0.09) * 0.04 −0.09 (0.08) 0.27
 High school graduate or equivalent −0.45 (0.10) *** <0.001 −0.34 (0.08) *** <0.001
 Some college or technical/vocational school −0.38 (0.12) ** 0.002 −0.29 (0.10) ** 0.005
 4-year college degree or higher −0.41 (0.11) *** <0.001 −0.51 (0.11) *** <0.001
Latino cultural orientation 0.07 (0.09) 0.43 0.18 (0.08) * 0.03
Interview language (ref=English) 0.03 (0.17) 0.87 0.15 (0.13) 0.24
R squared 0.28 0.25
Model p-value <0.001 <0.001
# p ≤ 0.10; * p ≤ 0.05; ** p ≤ 0.01; *** p ≤ 0.001

3.6. Influence of Respondent Characteristics and Interviewer Behaviors on ARS

In both the CI and SI models (Table 5), respondents with higher education had lower mean ARS-HI (p<0.001). Age was positively associated with ARS-HI in the SI model (p<0.001 for all age categories) but unrelated to ARS-HI in the CI model. Additional analyses (Appendix C) indicated that the conversational interviewers helped older respondents (≥68 years) with response mapping (C4) about twice as often (0.15 times per question) as younger respondents [≤67 years; 0.08 times per question; t(199.76)= −2.1397, p=0.034]. The conversational interviewers also provided more non-definitional assistance after evidence of difficulty (C3) to respondents with less than a 4-year college degree (0.14–0.24 times per question) than to those with a college degree or higher (0.07 times per question; p=0.03), as well as more assistance with response mapping (C4) to respondents with the lowest (less than 7th grade; 0.18 times per question) or highest (4-year college degree or more; 0.14 times per question; p=0.04) education levels. ARS-HI was positively associated with Latino cultural orientation for SI (p=0.032) but not associated with ARS-HI for CI. Further analyses (not shown) indicated that there was no difference in how often the conversational interviewers helped respondents with response mapping (C4) by Latino cultural orientation. Cuban American respondents exhibited higher mean ARS-HI than Mexican Americans (p=0.007) in the conversational interviews, while Puerto Rican respondents had lower ARS-HI than Mexican Americans in the standardized interviews (p=0.015). Interview language was not associated with ARS-HI.

4. Discussion

In contrast to the literature on correcting response style-associated measurement error after data collection (e.g., Billiet and McClendon 2000; Moors 2010; Weijters et al. 2008), this study sought to identify methods that could be applied during data collection to reduce ARS among a highly acquiescent population of Latino survey respondents whose use of ARS is of particular concern for data quality. To this end, a 2×2 experimental design was used to test the efficacy of two proposed methods for reducing ARS among a sample of acquiescent Mexican American, Cuban American, and Puerto Rican telephone survey respondents.

The most impactful finding from this research is that, consistent with our hypothesis, the use of CI during questionnaire administration was associated with significantly less ARS than using SI. This effect was observed in a set of heterogeneous items using the same response scale, a separate set of items using different response scales, and across DA and non-AG response scales (there was no interaction between interviewing technique and response scale format). Thus, this finding appears to be robust. Data from this study also indicated that CI reduced the influence of age and cultural norms on ARS, as older age and a stronger Latino cultural orientation were each positively associated with ARS for the standardized interviews but unassociated with ARS for the conversational interviews.

Our findings from the standardized interviews are consistent with previous research in which ARS has been associated with older age (e.g., Ross and Mirowsky 1984; Weijters et al. 2010), possibly as a result of cognitive decline (Schneider 2017). Presumably, conversational interviewers were able to facilitate the response task for older respondents, making the task more comparable to how it was experienced by younger respondents for whom this kind of help may have been less frequently needed. Conversational interviewers provided more frequent non-definitional assistance to respondents with less education, as well as more assistance with response mapping to older respondents and those with the lowest and highest education levels. These findings are consistent with the premise that older and less educated respondents may encounter greater difficulty with question interpretation and response mapping, whereas highly educated respondents may experience challenges with response mapping, possibly due to overthinking their responses. The observation that a stronger orientation to Latino culture was associated with increased ARS is consistent with prior research indicating that cultural norms serve as drivers of ARS (Davis et al. 2019; Harzing 2006; Johnson et al. 2005; Marín et al. 1992). Together, these findings suggest that while cognitive decline, limited education, and cultural norms may have prompted higher ARS, interviewer assistance provided through the application of CI mitigated these difficulties and cultural tendencies.

Analyses of interviewer behaviors further revealed that the reduction of ARS in the CI condition appeared to be primarily attributable to conversational interviewers’ efforts to clarify terms and provide help with response mapping. When conversational interviewers clarified key terms in the question stem, this reduced ARS, even when respondents provided no evidence of difficulty. Providing clarification in the absence of respondent difficulty has been called a “preemptive strike” (Conrad and Schober 2000; Mittereder et al. 2018). In contrast, providing clarification in response to evidence of respondent difficulty had no impact on ARS in either the CI or SI condition. We can only speculate about the origins of this pattern, but one possibility is that preemptive strikes help to prevent or correct misunderstandings (i.e., interpretations that do not align with researchers’ intentions, but which respondents have no reason to doubt, so there is no evidence of respondent difficulty). In contrast, providing clarification when respondents provide overt evidence of struggling (e.g., disfluent or revised responses, such as investigated by Schober et al. 2012, among others) may help less because respondents’ struggles to make sense of the question consume mental resources needed to integrate new information from the interviewer into their interpretation of the question. Preemptive strikes may be particularly important for respondents with limited education, who have been observed to be more likely to engage in ARS during survey interviews (e.g., Meisenberg and Williams 2008; Weijters et al. 2010) and less likely to ask questions in other settings, such as doctor’s visits (Siminoff et al. 2006), potentially because of feeling embarrassed to reveal difficulty.

The second key CI behavior, helping with response mapping, more directly reduced ARS by clarifying how response scales corresponded to respondents’ internal representations of the underlying constructs. Consistent with this finding, ARS increased when conversational interviewers failed to provide assistance with response mapping and when standardized interviewers proceeded without correcting respondents despite indications that they were confused about the scales. These findings suggest that response mapping assistance was highly effective for mitigating ARS and that, when such assistance was not provided, respondents were more likely to acquiesce. They might also explain why higher ARS has been observed in surveys conducted by more experienced interviewers (Olson and Bilgen 2011), who are presumably more adherent to traditional SI practices (Schober et al. 2004), in which response mapping assistance is not provided. Altogether, these findings suggest that when participants experienced difficulty with question interpretation and response scale mapping, these difficulties may have led them to feel unsure of their responses and to default to acquiescence as a coping mechanism, potentially due to culturally associated communication norms (Davis et al. 2019).

The cost-benefit ratio of reducing ARS using CI methods is likely to vary across survey contexts. For example, since the values for the ARS variables used in this study only ranged from 0–3, the reduction in mean ARS scores from 1.9 for SI to 1.6 for CI that was observed in this study for the ARS-HI variable translates to a 10% reduction in ARS. Researchers may disagree on whether a 10% reduction in ARS is meaningful for particular surveys they plan to conduct. However, as this is the first study to compare the effects of CI and SI methods on ARS, much remains unclear about how CI methods affect ARS. For example, findings from this experiment suggest that even larger reductions in ARS-associated error may be obtained when using CI with specific populations (e.g., older Latino adults). CI may also have a greater impact on ARS when specific interviewer CI behaviors are emphasized, including those identified in this study (e.g., assistance with response mapping) and other interviewer techniques that are yet to be elucidated in future research.
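The 10% figure above follows from dividing the mean difference by the scale range; as a quick arithmetic check (variable names are ours):

```python
# ARS-HI is scored on a 0-3 range; observed means were 1.9 (SI) and 1.6 (CI)
scale_range = 3.0
reduction = (1.9 - 1.6) / scale_range
print(f"{reduction:.0%}")  # 10%
```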

Despite a growing literature documenting the beneficial effects of conversational interviewing on data quality, standardized interviewing has persisted as the predominant interviewing approach in the U.S., likely due to both a lack of awareness of conversational interviewing methods and concerns that departures from standardized interviewing will harm data quality (e.g., Fowler and Mangione 1989). Such departures are well documented, but their harm to survey data quality is uncertain; in fact, in some cases they may improve response quality. Garbarski et al. (2016) demonstrate that some interviewers engage in non-standardized behaviors while establishing rapport with respondents, which is usually assumed to promote the quality of survey responses. For example, when answering a survey question, respondents commonly volunteer information that would make a subsequent question redundant and conversationally awkward, leading the interviewer to depart from strict standardization by verifying that the previously provided answer is correct: “Who is that first person, you said your husband?” Similarly, Schober et al. (2004) report that standardized interviewers sometimes ask respondents to clarify the reasons behind their answers, for example, “Okay, but there were two rooms designed specifically for bedrooms?” Some survey organizations that subscribe to the philosophy of standardized interviewing permit interviewers to “tune” respondents’ answers in unscripted ways. For example, if the respondent answers a frequency question with “not too often,” the interviewer can follow up by offering a choice of options such as “So, would you say seldom, rarely, or never?” (van der Zouwen 2002). Like verifying a previously given answer, this is an exception to extreme standardization (which would call for repeating all response options), made in the interest of maintaining rapport and conversational normality.
Numerous other examples of departures from strict standardization are reported in the survey methodology literature (e.g., Schaeffer et al. 2010).

Contrary to our hypothesis and previous findings by Höhne and Lenzner (2018),3 we found no evidence that unipolar non-AG response scales yielded less ARS than bipolar DA scales. While our primary ARS measure, ARS-HI, only included one response scale format, the items used to construct our secondary ARS measure, ARS-VRS, addressed this weakness by including response scales that were more closely tied to the constructs being assessed (e.g., using a scale ranging from “Not at all confident” to “Very confident” for “I am confident that I can eat vegetables when I am very hungry”). While it is possible that non-AG response scales have no advantage for reducing ARS, it is also possible that the non-AG response scales used in this study, like traditional DA scales, were still oriented on a negative-positive continuum, making it natural for respondents to conceptualize the response options as inherently positive or negative. For instance, “Do not believe at all” may have been interpreted as more negative and “Believe this very much” as more positive. Thus, if respondents were generally predisposed to provide positive-leaning responses, they may have endorsed positive response options on the non-AG response scales similarly to what we observed for the DA scale. Additional research using qualitative methods is needed to identify the conditions under which non-AG scales do and do not reduce ARS.

We are not aware of any previous studies that have conducted an experiment to mitigate ARS with an intentionally acquiescent sample. Our focus on acquiescent Latino respondents prevents us from generalizing to a national population in the U.S., and, as is common in public health research in the U.S. (George et al. 2014; Ryan et al. 2019; Torres et al. 2020; Valdez and Garcia 2021), men were underrepresented in the study sample. However, our experimental design in which respondents were randomly assigned to different conditions suggests that these effects could well generalize to other Latino and mixed-race/ethnicity samples, albeit likely diluted by the lower prevalence of ARS in the general population than in the current study (see Kish 1987 for a discussion of the tension between randomization and representation). Since ARS can inflate correlations, it was also impossible to disentangle ARS from true scores. But, even if the true scores were highly correlated, this would not explain why we observed differences in ARS by interviewing condition. Although the corpus of 616 interviews for coding was substantial, we may have observed different results if the complete set of interview recordings had been available for coding. Future research may strive to implement a coding process that distinguishes among different sub-behaviors within the key interviewer behaviors of providing definitions and assisting with response mapping that appeared to explain the overall differences in ARS by interviewing technique. Other interviewer and respondent behaviors that were not coded here may help to further illuminate why respondents engage in ARS and how this response tendency may be prevented. Future research can also explore how more specific interviewer behaviors impact ARS and whether CI methods are more effective in reducing ARS among respondents who engage in ARS more consistently than respondents with more intermittent tendencies to acquiesce.

Researchers have long observed that some survey respondents tend to engage in ARS more than others, resulting in increased risk of sociodemographically patterned error, especially when conducting surveys with acquiescent populations. Within the U.S., this risk may be particularly great in surveys with Latino populations, potentially leading to the distortion of survey statistics. Post-hoc corrections may appear to be an appealing solution; however, they require certain data set-ups (Liu et al. 2018) and are difficult to implement unless they were planned before data collection and the relevant data were successfully collected. Moreover, in applied research settings, such as health and social services organizations, practitioners may lack the resources to conduct such analyses. At least for interviewer-administered survey questions that require respondents to place their opinions on a continuum such as a DA scale, using conversational interviewing may decrease respondents’ use of ARS during data collection, thereby reducing the need for post-survey adjustments. In this case, an ounce of prevention is almost certainly worth a pound of cure.

Supplementary Material

Appendix A
Appendix B
Appendix C
Appendix D

Acknowledgements

This research was supported by National Cancer Institute of the National Institutes of Health under award number R01 CA172283. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

1. The missing recordings seem to have been lost at random: the demographic characteristics of the 616 respondents for whom recordings are available are indistinguishable from the characteristics of respondents whose recordings were lost (see Appendix C, Table S1).

2. To ensure consistency in the coding, the coders double-coded interview recordings during training until consensus was reached.

3. Of note, at least one study has reported findings similar to those reported here involving extreme response style and item-specific scales: no difference in extreme response style was observed in responses to item-specific vs. DA scales (Liu et al. 2015).

Statements and Declarations

The authors have no relevant financial or non-financial interests to disclose.

References

  1. Aday LA, Chiu GY, Andersen R: Methodological issues in health care surveys of the Spanish heritage population. Am. J. Public Health 70, 367–374 (1980)
  2. Alegria M, Takeuchi D: National Latino and Asian American Study. Retrieved from https://www.massgeneral.org/disparitiesresearch/Research/pastresearch/NLAAS-study.aspx (2002)
  3. Baumgartner H, Steenkamp J-BEM: Response styles in marketing research: A cross-national investigation. J. Mark. Res. 38, 143–156 (2001)
  4. Billiet JB, McClendon MJ: Modeling acquiescence in measurement models for two balanced sets of items. Struct. Equ. Model. 7, 608–628 (2000)
  5. Cohen S, Kamarck T, Mermelstein R: A global measure of perceived stress. J. Health Soc. Behav. 24, 385–396 (1983)
  6. Conrad FG, Schober MF: Clarifying question meaning in a household telephone survey. Public Opin. Q. 64, 1–28 (2000)
  7. Conrad FG, Schober MF: Clarifying question meaning in standardized interviews can improve data quality even though wording may change: A review of the evidence. Int. J. Soc. Res. Methodol. 24, 203–226 (2021)
  8. Cuellar I, Arnold B, Maldonado R: Acculturation rating scale for Mexican Americans-II: A revision of the original ARSMA scale. Hisp. J. Behav. Sci. 17, 275–304 (1995)
  9. Davis RE, Johnson TP, Lee S, Werner C: Why do Latino survey respondents acquiesce? Respondent and interviewer characteristics as determinants of cultural patterns of acquiescence among Latino survey respondents. Cross-Cult. Res. 53, 87–115 (2019)
  10. Davis RE, Resnicow K, Couper MP: Survey response styles, acculturation, and culture among a sample of Mexican American adults. J. Cross Cult. Psychol. 42, 1219–1236 (2011)
  11. De Beuckelaer A, Weijters B, Rutten A: Using ad hoc measures for response styles: A cautionary note. Qual. Quant. 44, 761–775 (2010)
  12. Fowler FJ, Mangione TW: Standardized Survey Interviewing: Minimizing Interviewer-Related Error (Vol. 18). Sage Publications, Inc., Thousand Oaks, CA (1989)
  13. Garbarski D, Schaeffer NC, Dykema J: Interviewing practices, conversational practices, and rapport: Responsiveness and engagement in the standardized survey interview. Sociol. Methodol. 46, 1–38 (2016)
  14. George S, Duran N, Norris K: A systematic review of barriers and facilitators to minority research participation among African Americans, Latinos, Asian Americans, and Pacific Islanders. Am. J. Public Health 104, e16–31 (2014)
  15. Harzing A-W: Response styles in cross-national survey research: A 26-country study. Int. J. Cross Cult. Manag. 6, 243–265 (2006)
  16. Höhne JK, Lenzner T: New insights on the cognitive processing of agree/disagree and item-specific questions. J. Surv. Stat. Methodol. 6, 401–417 (2018)
  17. Hubbard FA, Conrad FG, Antoun C: The benefits of conversational interviews are independent of who asks the questions or the kinds of questions they ask. Surv. Res. Methods 14, 515–531 (2020)
  18. Johnson T, Kulesa P, Cho YI, Shavitt S: The relation between culture and response styles: Evidence from 19 countries. J. Cross Cult. Psychol. 36, 264–277 (2005)
  19. Kish L: Statistical Design for Research. John Wiley & Sons, Inc., New York (1987)
  20. Krosnick JA: Response strategies for coping with the cognitive demands of attitude measures in surveys. Appl. Cogn. Psychol. 5, 213–236 (1991)
  21. Lee S, Alvarado-Leiton F, Yu W, Davis R, Johnson TP: Developing a short screener for acquiescent respondents. Res. Social Adm. Pharm. 18, 2817–2829 (2022)
  22. Liu M, Suzer-Gurtekin ZT, Keusch F, Lee S: Response styles in cross-cultural surveys. In: Johnson TP, Pennell B-E, Stoop IAL, Dorer B (eds.) Advances in Comparative Survey Methods: Multinational, Multiregional, and Multicultural Contexts (3MC), pp. 477–499. Wiley, Hoboken, NJ (2018)
  23. Liu MN, Conrad FG, Lee S: Comparing acquiescent and extreme response styles in face-to-face and web surveys. Qual. Quant. 51, 941–958 (2017)
  24. Marín G, Gamba RJ, Marín BV: Extreme response style and acquiescence among Hispanics: The role of acculturation and education. J. Cross Cult. Psychol. 23, 498–509 (1992)
  25. Meisenberg G, Williams A: Are acquiescent and extreme response styles related to low intelligence and education? Pers. Individ. Differ. 44, 1539–1550 (2008)
  26. Mittereder F, Durow J, West BT, Kreuter F, Conrad FG: Interviewer–respondent interactions in conversational and standardized interviewing. Field Methods 30, 3–21 (2018)
  27. Moors G: Ranking the ratings: A latent-class regression model to control for overall agreement in opinion research. Int. J. Public Opin. Res. 22, 93–119 (2010)
  28. Nair V, Strecher V, Fagerlin A, Ubel P, Resnicow K, Murphy S, Little R, Chakraborty B, Zhang A: Screening experiments and the use of fractional factorial designs in behavioral intervention research. Am. J. Public Health 98, 1354–1359 (2008)
  29. Narayan S, Krosnick JA: Education moderates some response effects in attitude measurement. Public Opin. Q. 60, 58–88 (1996)
  30. Olson K, Bilgen I: The role of interviewer experience on acquiescence. Public Opin. Q. 75, 99–114 (2011)
  31. Pasek J, Jang SM, Cobb CL III, Dennis JM, Disogra C: Can marketing data aid survey research? Examining accuracy and completeness in consumer-file data. Public Opin. Q. 78, 889–916 (2014)
  32. R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org/ (2020)
  33. Rammstedt B, Danner D, Bosnjak M: Acquiescence response styles: A multilevel model explaining individual-level and country-level differences. Pers. Individ. Differ. 107, 190–194 (2017)
  34. Roberts C, Gilbert E, Allum N, Eisner L: Research synthesis: Satisficing in surveys: A systematic review of the literature. Public Opin. Q. 83, 598–626 (2019)
  35. Rosenberg M: Society and the Adolescent Self-Image. Princeton University Press, Princeton, NJ (1965)
  36. Ross CE, Mirowsky J: Socially-desirable response and acquiescence in a cross-cultural survey of mental health. J. Health Soc. Behav. 25, 189–197 (1984)
  37. Ryan J, Lopian L, Le B, Edney S, Van Kessel G, Plotnikoff R, … Maher C: It’s not raining men: A mixed-methods study investigating methods of improving male recruitment to health behaviour research. BMC Public Health 19, 814 (2019)
  38. Salthouse TA: Constraints on theories of cognitive aging. Psychon. Bull. Rev. 3, 287–299 (1996)
  39. Salthouse TA: Processing issues in cognitive aging. In: Schwarz N, Park D, Knauper B, Sudman S (eds.) Cognition, Aging and Self-Reports, pp. 185–198. Psychology Press, Philadelphia, PA (1999)
  40. Saris WE, Revilla M, Krosnick JA, Schaeffer EM: Comparing questions with agree, disagree response options to questions with item-specific response options. Surv. Res. Methods 4, 61–79 (2010)
  41. Schaeffer NC, Dykema J, Maynard DW: Interviewers and interviewing. In: Marsden PV, Wright JD (eds.) Handbook of Survey Research, 2nd ed., pp. 437–471. Emerald Group Publishing Limited, Bingley, UK (2010)
  42. Schneider S: Extracting response style bias from measures of positive and negative affect in aging research. J. Gerontol. B. Psychol. Sci. Soc. Sci. 73, 64–74 (2017)
  43. Schober MF, Conrad FG: Does conversational interviewing reduce survey measurement error? Public Opin. Q. 61, 576–602 (1997)
  44. Schober MF, Conrad FG, Dijkstra W, Ongena YP: Disfluencies and gaze aversion in unreliable responses to survey questions. J. Off. Stat. 28, 555 (2012)
  45. Schober MF, Conrad FG, Fricker SS: Misunderstanding standardized language in research interviews. Appl. Cogn. Psychol. 18, 169–188 (2004)
  46. Siminoff LA, Graham GC, Gordon NH: Cancer communication patterns and the influence of patient characteristics: Disparities in information-giving and affective behaviors. Patient Educ. Counsel. 62, 355–360 (2006)
  47. Torres VN, Williams EC, Ceballos RM, Donovan DM, Duran B, Ornelas IJ: Participant engagement in a community based participatory research study to reduce alcohol use among Latino immigrant men. Health Educ. Res. 35, 627–636 (2020)
  48. Valdez LA, Garcia DO: Hispanic male recruitment into obesity-related research: Evaluating content messaging strategies, experimental findings, and practical implications. Int. Q. Community Health Educ. 42, 85–93 (2021)
  49. Valliant R, Hubbard F, Lee S, Chang C: Efficient use of commercial lists in U.S. household sampling. J. Surv. Stat. Methodol. 2, 182–209 (2014)
  50. van der Zouwen J: Why study interaction in a survey interview? Response from a survey researcher. In: Maynard DW, Houtkoop-Steenstra H, Schaeffer NC, van der Zouwen J (eds.) Standardization and Tacit Knowledge: Interaction and Practice in the Survey Interview, pp. 47–66. Wiley and Sons, New York, NY (2002)
  51. Van Vaerenbergh Y, Thomas TD: Response styles in survey research: A literature review of antecedents, consequences, and remedies. Int. J. Public Opin. Res. 25, 195–217 (2013)
  52. Warnecke RB, Johnson TP, Chavez N, Sudman S, O’Rourke DP, Lacey L, Horm J: Improving question wording in surveys of culturally diverse populations. Ann. Epidemiol. 7, 334–342 (1997)
  53. Weijters B, Geuens M, Schillewaert N: The stability of individual response styles. Psychol. Methods 15, 96–110 (2010)
  54. Weijters B, Schillewaert N, Geuens M: Assessing response styles across modes of data collection. J. Acad. Mark. Sci. 36, 409–422 (2008)
  55. West BT, Conrad FG, Kreuter F, Mittereder F: Can conversational interviewing improve survey response quality without increasing interviewer effects? J. R. Stat. Soc. Series A, 181–203 (2018)
  56. West BT, Wagner J, Hubbard F, Gu H: The utility of alternative commercial data sources for survey operations and estimation: Evidence from the National Survey of Family Growth. J. Surv. Stat. Methodol. 3, 240–264 (2015)
