PLoS One. 2022 Jul 28;17(7):e0271949. doi: 10.1371/journal.pone.0271949

Information loss and bias in Likert survey responses

J. Christopher Westland
Editor: Carlos Andres Trujillo
PMCID: PMC9333316  PMID: 35901102

Abstract

Likert response surveys are widely applied in marketing, public opinion polling, epidemiology and economics. In theory, Likert mappings from real-world beliefs could lose significant amounts of information, because they are discrete categorical metrics. Similarly, the subjective nature of Likert-scale data capture through questionnaires holds the potential to inject researcher biases into the statistical analysis. Arguments and counterexamples are provided to show how this loss and bias can be substantial under extreme polarization or strong beliefs held by the surveyed population, and where the survey instruments are poorly controlled. These theoretical possibilities were tested using a large survey with 14 Likert-scaled questions presented to 125,387 respondents in 442 distinct behavioral-demographic groups. Despite the potential for bias and information loss, the empirical analysis found strong support for an assumption of minimal information loss under Normal beliefs in Likert-scaled surveys. Evidence from this study found that the Normal assumption is a very good fit to the majority of actual responses, the only departure from Normal being slightly platykurtic responses (kurtosis ~ 2), likely due to censoring of beliefs beyond the lower and upper extremes of the Likert mapping. The discussion and conclusions argue that further revisions to survey protocols can assure that information loss and bias in Likert-scaled data are minimal.

Introduction

Likert mappings were named for Rensis Likert, who developed them in his PhD thesis and promoted them for the remainder of his long career [1–3]. They are mappings from responses to simple statements into a discrete scale that is claimed to capture unobservable beliefs, expressing both the direction and strength of preferences (e.g., traits, habits, consumption patterns, political orientation, and so forth). Likert-scaled survey questionnaires provide the evidence for a large portion of the social, corporate and government policies that are guided by surveys. This research investigates information loss and bias in mappings from the unobservable belief distributions of a population into the Likert-scaled responses on a survey instrument, asking the research question:

  • Is the mapping of survey respondents’ preferences into a Likert metric ‘loss-less’ in the sense that: (1) the sample contains all the information in the respondents’ actual preferences; and (2) the mapping does not add information to the sample that is not in the respondents’ actual preferences?

Bias and informativeness of Likert metrics have been at the center of recent questions about census and political polling accuracy, and indeed may be confounded with other survey biases: problems obtaining representative samples in phone surveys [4], improper weighting by education [5], poll objectives [6] and people’s resistance to answering poll questions [4]. Thanks partly to caller ID, the average polling response rate in the US has fallen to only 6% in recent years, from more than half in the 1980s [7]. It is difficult to draw a sample of respondents who resemble their population, and increasing tribal, educational and non-reporting biases now affect marketing, census and social survey research.

Surveys are designed to elicit respondent preferences (e.g., for a car design), opinions (e.g., whether cars are harmful), behavior (e.g., whether brands affect purchasing), or facts (e.g., whether you own a dog). Questionnaires are attractive as cheap and simple interrogative protocols. Within the past decade, survey questionnaires have been heavily automated, through companies such as the cloud-based software-as-a-service SurveyMonkey. Automation has fostered a proliferation of questionnaires by investigators not particularly conversant with the methods and statistics of surveys. This automation and ‘democratization’ of surveys likely exacerbates failures in Likert mapping [8].

Metrics are basically distance functions; in survey research, the points on a Likert scale are assumed equidistant, i.e., the distance between 1 and 2 is the same as the distance between 2 and 3. Were it not, the survey would introduce ‘response bias.’ Survey responses may also be modeled, in a polytomous Rasch model, as interval estimates on a continuum, with Likert points as levels of an attitude or trait, e.g., as might be used in consumer assessment or in the scoring of performances by judges. A good Likert metric is symmetric about a middle category. Symmetric and equidistant Likert scales will behave like interval-level scales [9]. [10] showed that interval-level measurement was better achieved by a visual analogue scale [11–13]. [14] found additional grammatical biases, where balanced Likert metrics become unbalanced in interpretation, for instance when ‘tend to agree’ is not directly opposite ‘tend to disagree.’ Other biases are: (1) central tendency bias; (2) acquiescence bias; and (3) social desirability bias. [15, 16] suggest that cultural, presentation and subject-matter idiosyncrasies also introduce bias. [16] point out that Asian survey responses tend to suffer more from central tendency bias than Western responses.

This research builds and analyzes a model of Likert mapping of respondent beliefs. Section 2 builds an information loss model and applies it to hypothetical situations that reflect bias and resolution in particular settings. Section 3 empirically investigates the results of the normative models in Section 2 by analyzing 129,880 observations of 14 questions across 442 demographic groups. Finally, Section 4 suggests ‘best practices’ to ameliorate loss and bias in survey research.

Materials and methods

There exists a robust, empirically tested literature on preference ordering and expected utility in economics and psychology (see [17] for a summary). Surveys map unobservable human preferences, beliefs and opinions into sample items measured in some representative metric (e.g., Likert) which can be statistically summarized and analyzed. Researchers have established at least three axioms governing the topology of human preference orderings in beliefs:

  1. the axiom of transitivity; A ≿ B and B ≿ C ⇒ A ≿ C.

  2. the axiom of completeness; ∀ A and B we have A ≿ B or B ≿ A or both, and

  3. the axiom of continuity; there are no ‘jumps’ in people’s preferences.

Belief distributions adhering to these axioms are continuous, convex, differentiable and monotonic over each survey respondent’s range (the support) of all possible beliefs and preferences. The survey instrument maps these into Likert metrics distributed in a polytomous Rasch model with k bins (where k is typically 5, 7 or 9).

The geometry of Likert mapping reveals two inherent sources of bias and information loss: (1) binning which creates a discrete approximation of continuous beliefs, and (2) systematic censoring of extreme beliefs in the population.

Balanced, properly scaled Likert mappings will maintain the integrity of respondents’ beliefs, as can be shown in the following example. Let a particular survey instrument of n responses fill the ith of k Likert-scaled bins with probability q_i. If X_i is an indicator variable for choice of the ith bin, then the possible Likert-scaled outcomes are X_1, X_2, …, X_{k−1}, with Fisher information:

$$ I\!\left(\sum_{i=1}^{k-1} X_i\right) = \sum_{i=1}^{k-1} \frac{n}{q_i(1-q_i)} $$

A Likert mapping of Gaussian beliefs N(μ, σ²) which is perfectly balanced would result in q_i such that the belief distribution is the limiting distribution, with Fisher information for n responses of I_n = n/σ². The Gaussian approximation for the Bernoulli mappings to the Likert-scale values has variance σ_i² = q_i(1−q_i)/n, and the Fisher information lost or inserted during the Likert mapping would be:

$$ \sum_{i=1}^{k-1} \frac{n}{q_i(1-q_i)} - \sum_{i=1}^{k-1} \frac{n}{q_i(1-q_i)} = 0 $$
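
These quantities are easy to evaluate numerically. The following sketch (a minimal illustration of the expressions above, assuming a 5-point scale with unit-width bins centered on N(0, 1) beliefs; the cut points are illustrative choices, not taken from the paper) computes the bin probabilities q_i and the summed Bernoulli information:

    # Minimal sketch: bin probabilities and summed Bernoulli information for a
    # balanced 5-point Likert mapping of N(0,1) beliefs (assumed cut points).
    n    <- 1000
    cuts <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)   # five bins of unit width
    q    <- diff(pnorm(cuts))                    # q_i for each Likert bin
    info_bins   <- sum(n / (q * (1 - q)))        # summed Bernoulli information
    info_normal <- n / 1^2                       # I_n = n / sigma^2 for N(0,1) beliefs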

Measuring information loss

Information content in information theory can be thought of as an alternative way of expressing probability, as the ‘entropy’ of a discrete random variable, introduced in [18]. In the current context of continuous beliefs mapped into discrete Likert scales, we need a measure of the difference between two probability distributions that is valid for both continuous and discrete distributions; in this research that measure is the Jeffreys divergence from a reference distribution. Fisher information, self-information, mutual information, Shannon entropy, conditional entropy and cross entropy can all be mathematically derived from Jeffreys divergence. The Jeffreys divergence of a target dataset from a source is the information required to reconstruct the target given the source (i.e., the minimum size of a patch).

Fisher information measures the amount of information that an observable random variable carries about an unknown parameter of a distribution that models that random variable. Formally, it is the variance of the score, or the expected value of the observed information. It always exists because it is based on actual measurements. It can be derived as the Hessian of the relative entropy.

It is essential to differentiate between the theoretical and observed Fisher Information matrices. The negative Hessian evaluated at the MLE corresponds to the observed Fisher information matrix evaluated at the MLE. In contrast, the inverse of the (negative) Hessian is an estimator of the asymptotic theoretical covariance matrix, and the square roots of the diagonal elements are estimators of the standard errors. The theoretical Fisher information matrix is based on the Fisher information metric theorem which proves that KL-divergence is directly related to the Fisher information metric.

Formally, let l(θ) be a log-likelihood function and the theoretical Fisher information matrix I(θ) be a symmetric (p × p) matrix containing the entries

$$ I(\theta)_{ij} = -\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\, l(\theta), \quad 1 \le i, j \le p $$

The Hessian is defined as

$$ H(\theta)_{ij} = \frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\, l(\theta), \quad 1 \le i, j \le p $$

and is the matrix of second derivatives of the likelihood function with respect to the parameters. It follows that if you minimize the negative log-likelihood, the returned Hessian is the equivalent of the observed Fisher information matrix, whereas if you maximize the log-likelihood, then the negative Hessian is the observed information matrix.

The observed Fisher information matrix is I(θ̂_ML), the information matrix evaluated at the maximum likelihood estimates (MLE); it is the second derivative of the log-likelihood evaluated at the MLE [19]. The optimization algorithms used in this research return the Hessian evaluated at the MLE; when the negative log-likelihood is minimized, the negative Hessian is returned. The inverse of the Fisher information matrix is an estimator of the asymptotic covariance matrix, Var(θ̂_ML) = [I(θ̂_ML)]⁻¹, and the estimated standard errors of the MLE are the square roots of the diagonal elements of this covariance matrix.
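
As a concrete illustration, the sketch below (my own minimal example rather than the paper’s code, assuming i.i.d. N(μ, σ) observations) recovers standard errors from the Hessian that R’s optimizer returns when minimizing the negative log-likelihood:

    # Minimal sketch: observed Fisher information from the Hessian at the MLE.
    set.seed(1)
    x <- rnorm(500, mean = 3, sd = 1)
    negll <- function(par) -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
    fit <- optim(c(0, 2), negll, hessian = TRUE)   # Hessian of -logLik = observed information
    se  <- sqrt(diag(solve(fit$hessian)))          # standard errors of mu-hat and sigma-hat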

The main reason to be concerned with singularities in computing Fisher information has to do with the asymptotics: a singularity implies that the usual

$$ \sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{D} N\!\left[0,\, I(\hat{\theta})^{-1}\right] $$

is not valid. Alternative formulations are provided in [20], giving the generalized asymptotic distributions, dependent on a parameter s and its parity (odd/even), where 2s + 1 is the number of derivatives of the likelihood. [20] provides a unified theory for deriving the asymptotic distribution of the MLE and of the likelihood ratio test statistic when the information matrix has rank one less than full and the likelihood is differentiable up to a specific order. This is important since the likelihood ratio test uses the asymptotic distribution.

Kullback–Leibler divergence (KLD) is a ‘divergence’ rather than a true distance metric. The use of the term ‘divergence’, as statistical distances are called, has varied significantly over time, with current usage established in [21]. [22] actually used ‘divergence’ to refer to the symmetrized divergence defined and used in [23], where [23] referred to this as ‘the mean information for discrimination … per observation’, while [24] referred to the asymmetric function as the ‘directed divergence.’

To assure that there is no confusion in the ordering of distributions (i.e., the benchmark standard distributions versus the empirical distributions), this research uses the symmetric Jeffreys divergence, for which the divergence from A to B is the same as that from B to A.
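
For discrete distributions on a common support, the Jeffreys divergence is simply the sum of the two directed KL divergences. A minimal sketch (my own illustration with arbitrary example distributions, not taken from the paper’s code):

    # Minimal sketch: Jeffreys divergence J(P,Q) = KL(P||Q) + KL(Q||P).
    kl_div   <- function(p, q) sum(p * log(p / q))
    jeffreys <- function(p, q) kl_div(p, q) + kl_div(q, p)
    p <- c(0.10, 0.20, 0.40, 0.20, 0.10)   # example 5-bin response distribution
    q <- rep(0.20, 5)                      # uniform reference distribution
    jeffreys(p, q)                         # identical to jeffreys(q, p) by symmetry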

In practice it is not possible, in advance of survey data collection, to accurately balance Likert mappings; to do so you would need information about the outcome of the survey. When survey instruments are unbalanced, either by not centering responses or because they ignore extreme polarization, losses and biases can be significant, as demonstrated in the two examples depicted in Fig 1 and developed below. The parameter settings in these examples were designed to illustrate potentially extreme situations that could occur in Likert-scaled data. While it is not my intention to explore a parameter space, as, say, one might do in optimization or machine learning, these do represent situations that we might expect to occur in survey practice. Further below, I analyze a large-scale survey in the airline industry to provide a real-world benchmark for parameters, exploring the situations most likely to be encountered in practice.

Fig 1. Likert maps: Survey assumptions (blue) Actual beliefs (red) Likert mapping (grey).


Example: Unbalanced mappings with Gaussian beliefs

A Gaussian distribution N(μ, σ) has Fisher information matrix:

$$ I(\mu,\sigma) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 2/\sigma^2 \end{pmatrix} $$

Maximum likelihood estimates of μ and σ are sufficient, thus the cross-correlation terms are zero and only variance (through the Cramér–Rao bound) contributes to Fisher information. Where a Likert mapping is mis-scaled so that it fails to capture extreme preference responses (e.g., see the left-hand graph in Fig 1), it censors data that we know exists outside its range. Censoring (in contrast to truncation) remaps beliefs outside the Likert support into the rightmost or leftmost extreme value of the Likert scale. Table 1 summarizes Fisher information in Gaussian beliefs censored by a Likert mapping.

Table 1. Fisher information censored by remapping N(μ = 0, σ = 1) beliefs.

          Standard Gaussian Distribution   Censored Standard Gaussian Distribution
  μ × μ   9757                             9990
  μ × σ   0                                0
  σ × σ   19515                            19979
  σ × μ   0                                0

Table 1 shows that for balanced Likert mappings, information loss and bias are insignificant. Table 2 summarizes an unbalanced mapping example, in which the Likert scale is built assuming respondents have N(μ_standard, σ_standard) = N(0, 1) beliefs, while the actual responses are significantly biased, with distribution N(μ_actual, σ_actual) = N(3, 1).

Table 2. Fisher information for N(μ = 3, σ = 1) beliefs with an ‘unbalanced’ Likert mapping.

          Actual Beliefs   Censoring   Binning   % Censoring Loss   % Binning Loss
  μ × μ   10187            14762       8444      -45%               43%
  μ × σ   0                0           0         0%                 0%
  σ × σ   20374            29524       7418      -45%               75%
  σ × μ   0                0           0         0%                 0%

Note that censoring adds false information; the ‘gains’ (negative percentages) in information result from the artificial inflation of the extreme right-hand value +3 in the Likert mapping.
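
The censoring mechanism is easy to reproduce in simulation. The sketch below (a minimal illustration under this example’s assumptions: a 7-point scale supported on −3…3 with actual beliefs N(3, 1)) shows the artificial spike of responses at +3:

    # Minimal sketch: censoring and binning N(3,1) beliefs onto a 7-point scale.
    set.seed(1)
    beliefs  <- rnorm(1e5, mean = 3, sd = 1)
    censored <- pmin(pmax(beliefs, -3), 3)   # out-of-range beliefs pile up at +3
    binned   <- round(censored)              # discrete 7-point Likert response
    table(binned) / length(binned)           # note the inflated mass at +3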

Example: Beta beliefs that express polarization

Polarized beliefs can be simulated using a Beta(α, β) distribution with α, β < 1 (e.g., as depicted in the right-hand graph in Fig 1), supported on [−3, 3] and mapped to a 7-point Likert scale. The p.d.f. of a Beta with [lower, upper] bounds of support is

$$ B(x \mid \alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \left(\frac{x + upper}{upper - lower + 1}\right)^{\alpha-1} \left(1 - \frac{x + upper}{upper - lower + 1}\right)^{\beta-1} $$

Fisher information in a Beta-distributed random variable X is:

$$ I(\alpha,\beta) = \begin{bmatrix} \operatorname{var}[\ln X] & \operatorname{cov}[\ln X, \ln(1-X)] \\ \operatorname{cov}[\ln X, \ln(1-X)] & \operatorname{var}[\ln(1-X)] \end{bmatrix} $$

Table 3 summarizes Fisher information where the Likert scaling assumes N(μ_standard, σ_standard) = N(0, 1) while actual responses are distributed B(α_actual, β_actual) = B(0.5, 0.5).

Table 3. Fisher information for Beta(α = 0.5, β = 0.5) beliefs mapped to a Likert-scale.

          Actual Beliefs   Censoring   Binning   % Censoring Loss   % Binning Loss
  α × α   172              144         228       16%                -58%
  α × β   87               73          116       16%                -58%
  β × β   148              124         196       16%                -58%
  β × α   87               73          116       16%                -58%

For X ∼ Beta(α, β), Var[ln X] = ψ₁(α) − ψ₁(α + β), Var[ln(1 − X)] = ψ₁(β) − ψ₁(α + β), and Cov[ln X, ln(1 − X)] = −ψ₁(α + β), where ψ₁ is the trigamma function. With α = β these entries scale together, which explains the symmetric scaling of the percentage errors, and once again only variance contributes to Fisher information. Because the extreme mappings are more or less balanced, the impact of censoring is less important in the Beta example than the change in resolution due to binning, which injects artificial information into the survey statistics.
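
These entries are standard results for the Beta distribution and can be evaluated directly with R’s trigamma function. A minimal sketch (my own illustration, not the paper’s code):

    # Minimal sketch: per-observation Fisher information matrix of Beta(alpha, beta).
    beta_fisher <- function(alpha, beta) {
      matrix(c(trigamma(alpha) - trigamma(alpha + beta), -trigamma(alpha + beta),
               -trigamma(alpha + beta), trigamma(beta) - trigamma(alpha + beta)),
             nrow = 2, byrow = TRUE)
    }
    beta_fisher(0.5, 0.5)   # the polarized-beliefs example above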

Results

The prior analysis postulated example situations, which may or may not be commonplace, suggesting that Likert responses could in practice lead to erroneous conclusions. The current section evaluates empirical results from a professionally conducted airline customer satisfaction survey to provide insight into whether the preceding problems are common in practice. The dataset used in this study was extracted from a professional 2015 survey of customer satisfaction by a major US airline, which released the data on the condition of remaining anonymous. The curated database is available in three locations on Kaggle (https://www.kaggle.com/sjleshrac/airlines-customer-satisfaction, www.kaggle.com/johndddddd/customer-satisfaction and www.kaggle.com/teejmahal20/airline-passenger-satisfaction) and was downloaded on June 6, 2021 from the first source on this list. The dataset has been the subject of a number of machine learning and sentiment analysis studies documented on the Kaggle site.

I chose a consumer sentiment dataset that is extensive and has been well researched over the past six years. One of the most common commercial applications of Likert-scaled surveys is the assessment of consumer sentiment, so it was my impression that such a dataset would be most germane to the particular research problems I was addressing. A large-scale survey such as this one, prepared by a professional consulting firm, will be better controlled and curated than is typical for the average social science Likert-scaled dataset. The acquisition of this dataset was performed under appropriate controls, and similar studies would be replicable. The sample sizes were very large by academic standards, and encompassed a rich set of demographic-behavioral factors and customer satisfaction factors.

This is a US airline, and results are all from US passengers. The survey is huge by academic research database standards, consisting of 129,880 independent responses to 14 Likert-scaled questions (Table 5), each concerning a customer’s satisfaction with a particular factor managed by the airline. The respondents were drawn from 442 different behavioral-demographic groups summarized by the existing permutations of the factor levels in Table 4, and the results are extremely robust, as shown in the reported statistics. The factors and levels in Table 4 were those for which the airline had internal seat-occupancy and upgrade algorithms, largely focused on determining seat prices; the specific factors that were Likert-scale surveyed were those for which the airline had internal programs and staff dedicated to assuring that customers were well served. The airline actively manages these particular factors and factor levels, pays money to manage them, and considers them the key success factors controlling the airline’s profitability. The survey was conducted by a consulting firm versed in survey research, and was controlled to a professional and high technical standard that exceeds that of most academic research surveys. The R code used to create this research paper is uploaded to Kaggle.

Table 4. Demographic groups in this research defined by factors and levels.

Factor Levels Level 1 Level 2 Level 3
satisfaction 2 Dissatisfied Satisfied ~
gender 2 Female Male ~
customer.type 2 Disloyal Loyal ~
travel.type 2 Business Personal ~
travel.class 3 Business Economy Economy+
age 3 <21 21-60 >60
distance 3 <100 100-1000 >1000
departure.delay 3 ontime <1hr >1hr
arrival.delay 3 ontime <1hr >1hr

Table 5. Summary statistics for raw data.

Factor Mean Std.Dev Skewness Kurtosis
Seat.comfort 2.839 1.393 -0.09186 2.057
Departure.Arrival.time.convenient 2.991 1.527 -0.25228 1.911
Food.and.drink 2.852 1.444 -0.11681 2.013
Gate.location 2.990 1.306 -0.05306 1.910
Inflight.wifi.service 3.249 1.319 -0.19112 1.879
Inflight.entertainment 3.383 1.346 -0.60482 2.467
Online.support 3.520 1.307 -0.57536 2.189
Ease.of.Online.booking 3.472 1.306 -0.49171 2.089
On.board.service 3.465 1.271 -0.50526 2.215
Leg.room.service 3.486 1.292 -0.49643 2.159
Baggage.handling 3.696 1.156 -0.74303 2.762
Checkin.service 3.341 1.261 -0.39244 2.206
Cleanliness 3.706 1.152 -0.75599 2.791
Online.boarding 3.353 1.299 -0.36649 2.062

There are 2⁴ × 3⁵ = 3888 unique combinations of factor values (Table 4), implying 3888 potential behavioral-demographic subgroups of passengers, each of which would be expected to respond to the questionnaire in its own unique way. Only 1375 different subgroups existed in the actual dataset, and only 442 of these subgroups had more than 20 observations, which was considered the minimum acceptable for fitting a Normal distribution or for Central Limit Theorem convergence. These 442 subgroups, with different response behaviors and biases, were studied in this research. This reduced the total number of responses in the survey from 129,880 to 125,387 (i.e., by 4493 responses), across 442 distinct behavioral-demographic groups.
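
A minimal sketch of this subgroup filter (my own illustration, assuming a data frame named survey whose columns carry the Table 4 factors; the data frame name and exact column names are hypothetical):

    # Minimal sketch: retain behavioral-demographic subgroups with > 20 responses.
    library(dplyr)
    kept <- survey %>%
      group_by(satisfaction, gender, customer.type, travel.type, travel.class,
               age, distance, departure.delay, arrival.delay) %>%
      filter(n() > 20) %>%
      ungroup()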

An exploratory comparison of the empirical distributions of Likert-scaled data from the Airline Customer Satisfaction dataset used below, each against Normal(3, 1), Poisson(3) and Beta(0.5, 0.5) random variables, showed only insignificant differences between Jeffreys divergence and KLD for this dataset.

Table 5 summarizes the first four moments of Likert responses for the entire 129,880 responses across the dataset. Note that the assumptions made in the normative analysis of Likert-scaled metric information content are generally adhered to: a mean of ~3 (the center of a 5-point Likert scale), standard deviation of ~1 and skewness of ~0 (centered responses), with the responses being slightly platykurtic due to the censoring of beliefs above Likert-5 and below Likert-1.
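
These summary moments are straightforward to reproduce. A minimal sketch (assuming a numeric vector of 1–5 responses; the CRAN moments package is my choice of tooling, not the paper’s stated code):

    # Minimal sketch: first four moments of a Likert response vector.
    library(moments)
    likert <- sample(1:5, 1000, replace = TRUE)   # placeholder responses
    c(mean = mean(likert), sd = sd(likert),
      skewness = skewness(likert), kurtosis = kurtosis(likert))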

Figs 2 through 5 show that the standard assumption holds: 5-point Likert responses accurately reflect a Normal(μ = 3, σ = 1) belief distribution, with one point on the Likert scale equal to one standard deviation of the Normal assumption. In addition, skewness is near zero, and kurtosis is consistently platykurtic.

Fig 2. Means of Likert responses by demographic.

Fig 3. Standard deviation of Likert responses by demographic.

Fig 4. Skewness of Likert responses by demographic.

Fig 5. Kurtosis of Likert responses by demographic.

Empirical results: The information penalty of an incorrect belief assumption

The Fisher information metric theorem proves that KLD (and thus Jeffreys divergence) is directly related to the Fisher information metric, with the Fisher information matrix being the Hessian of the KLD between two distributions evaluated at the MLE. I use Jeffreys divergence in the empirical analysis for this paper to measure the information penalty (distance) between actual Likert responses and an assumed distribution of respondents’ beliefs. Jeffreys divergence provides a measurement of how far the distribution Q is from the distribution P. Jeffreys divergence is used in areas such as clutter homogeneity analysis in radar processing, and KLD is used in ruin theory in insurance and in computing Bayesian information gain in moving from a prior distribution to a posterior distribution [22, 25].

Jeffreys divergence is employed here for succinctness and clarity: whereas Fisher information is a matrix whose dimension depends on the number of parameters of the distribution, Jeffreys divergence is a single distance measure. Jeffreys divergence reports the amount of information lost because of particular modeling assumptions; in the current case, the assumptions about the underlying respondent beliefs that are being mapped into a Likert-scaled metric.
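
A minimal sketch of this information penalty (my own illustration, not the paper’s code: it discretizes an assumed belief distribution onto the five Likert bins, estimates the empirical bin frequencies, and returns their Jeffreys divergence; the epsilon guard against empty bins is my own assumption):

    # Minimal sketch: Jeffreys-divergence information penalty for one subgroup.
    info_penalty <- function(responses, q_assumed, eps = 1e-9) {
      p <- (tabulate(responses, nbins = 5) + eps) / (length(responses) + 5 * eps)
      q <- q_assumed / sum(q_assumed)
      sum(p * log(p / q)) + sum(q * log(q / p))
    }
    # Assumed Normal(3, 1) beliefs discretized to bins 1..5:
    q_norm <- diff(pnorm(c(-Inf, 1.5, 2.5, 3.5, 4.5, Inf), mean = 3, sd = 1))
    info_penalty(sample(1:5, 200, replace = TRUE), q_norm)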

Figs 6 through 8 summarize my analysis, revisiting the questions of information loss analyzed in the normative models in the first part of this paper. They present density graphs of the Jeffreys divergence ‘information penalty’ across all 442 behavioral-demographic groups, for each of the 14 customer satisfaction factors, under three belief assumptions: Normal, Poisson and Beta distributed beliefs.

Fig 6. Information penalty (Jeffreys divergence) assuming Normal beliefs.

Fig 7. Information penalty (Jeffreys divergence) assuming Poisson beliefs.

Fig 8. Information penalty (Jeffreys divergence) assuming Beta beliefs.

Across the 14 factors for which we have Likert-scaled responses, the assumption of Normal beliefs suffers the least information loss, as summarized in Table 6. This is consistent with insights gained from the review of the distributional statistics of the full dataset responses in Figs 2 through 5, which support the validity of prior assumptions of Normal beliefs in well-controlled Likert-scaled survey responses.

Table 6. Statistics of Jeffreys divergence information penalty for all demographic groups.

Distribution Mean Std.Dev Skewness Kurtosis
Normal 0.0835 0.0365 3.749 67.11
Poisson 0.1047 0.0535 2.217 24.97
Beta 0.2536 0.1208 0.764 5.07

Discussion and conclusions

Polarized survey responses are common in research on political and social issues in North America, and thus the problem addressed in this research is an important one. Survey bias towards boundary-inflated responses among polled Americans, and midpoint-inflated responses among Asians, has been repeatedly documented and called out as a challenge to survey-based research (e.g., see [16, 26, 27]), with the Pew Research Center [28] describing such bias as a major challenge to democracy and a consistent problem in their surveys.

Cross-sensory response research in [29], specifically studies of the human taste response to music, has pioneered Bayesian alternatives to frequentist analysis of Likert-scaled data. In [29], a sample of 1611 participants tasted one sample of chocolate while listening to a song that evoked a specific combination of cross-modal and emotional consequences. The researchers addressed difficulties in interpreting frequentist statistical tests of discrete, categorical responses by applying a Bayesian model to quantify the information content of a response. The approach used in [29] is well suited to sentiment analysis problems that have long been analyzed using structural equation models and frequentist Neyman–Pearson hypothesis tests [30, 31].

Data collected for the [29] study showed strong non-symmetric behavior on the bounded scales, with large numbers of respondents selecting extreme values close to the boundaries. This contradicted the assumptions of traditional multivariate regression approaches to analysis, because residuals cannot be Gaussian distributed when responses lie at the boundaries of the response space. These sorts of polarized responses are quite common in research on political and social issues in North America, and thus the problem addressed is an important one. Often they are modeled as zero-inflated Gaussian or Poisson distributions, though too often no accommodation is made at all in the analysis for the zero-inflated data.

In order to overcome this problem, [29] remapped each outcome j for each individual i into a unit (0, 1) range. They then used a Bayesian, multi-response, multivariate logit-normal distribution with outcome-specific intercepts and slopes and a common covariance structure across outcome measures, following the methodology in [32]. The logit-normal distribution can take a variety of shapes, e.g., U-shapes and J-shapes. More importantly, it is designed specifically to address the zero-inflated data distributions that arise in particularly polarized survey responses.
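
A minimal sketch of the remapping step (my own illustration, under the assumption that a 5-point response y is shifted into the open unit interval before the logit transform; the exact rescaling constant used in [29, 32] may differ):

    # Minimal sketch: remap 5-point Likert outcomes into (0,1) and logit-transform.
    y <- c(1, 3, 5, 4, 2)          # hypothetical Likert responses
    u <- (y - 0.5) / 5             # shift/scale into the open interval (0, 1)
    z <- log(u / (1 - u))          # logit; z can then be modeled as (multivariate) normal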

A Bayesian multi-response version of the multivariate logit-normal regression model was used in [29]. Outcome-specific intercepts and slopes were needed since the association of each covariate with each of the responses could differ significantly. The model also takes advantage of the inherent high correlation of responses, due to individual consistency in responses and to social and cultural clustering of beliefs (and responses) in survey data, through joint modeling of all the outcomes, allowing the borrowing of information between responses. The Bayesian multi-response version of the multivariate logit-normal regression model presented in [29] provides a flexible, scalable and adaptive model where reliance on the central limit theorem can be questionable. Additionally, where available, such models provide a natural way to incorporate prior information, either from prior studies or from expert opinion. The transformations specified in [29] result in a model error term (representing features not captured by the data) which is multi-Normal, allowing for calculation with available statistical software.

Suggestions for ‘Best Practice’ in survey use of Likert metrics

This research presented several examples to show that, in theory, Likert mappings could be lossy and biased under a specific set of circumstances, even using a balanced, centered and specific design for each of the individual questions on the survey instrument. Furthermore, I provided examples of situations in which:

  1. the sample standard deviation deviates from a single Likert-scale point, resulting in increasing information loss;

  2. the location of the belief distribution mean fails to coincide with the central Likert bin, and the survey instrument is ‘unbalanced’;

  3. the Likert mapping depends on the mean of the belief distribution and is sensitive to the Likert metric being ‘balanced’;

  4. where the respondents’ opinions are extremely polarized, with respondents choosing to extremely agree or extremely disagree with the assertion, or, in contrast, where respondents demur and choose the center of the Likert scale;

  5. where the respondents’ opinions reflect universally held strong beliefs, either negative or positive; and

  6. where censoring and binning both add information that was not in the original data.

Though the examples suggest many ways in which Likert metrics can lead to incorrect conclusions, my empirical results suggest that these problems do not assert themselves in most typical survey studies, with commonly encountered respondent behavioral-demographic profiles.

The study analyzed here involved airline passengers with a broad range of demographics and behaviors. The differences in responses were small across all of the demographic groups, as reflected in the Jeffreys divergence information penalty for any particular a priori assumption about respondent beliefs. An assumption of Normal prior beliefs, with the mean of the belief distribution corresponding to the midpoint of the Likert scale, produced minimal information losses. The empirical analysis strongly supports current protocols and assumptions in the conduct of Likert-scaled survey research.

This is reassuring, but nonetheless it is difficult to know enough about population beliefs to design, a priori, a perfectly balanced, centered and specific survey instrument without having already completed at least a limited survey and data analysis. In practice, there exist survey protocols in which evidence-based interactive design of surveys has successfully resolved or ameliorated problems in instrument design, though these are typically ignored in most SurveyMonkey-style implementations. Interactive, multi-step methods can involve: (1) pretesting; (2) invoking optimal stopping based on an objective function; or (3) implementing redundancy in sampling, along with general sentiment assessment to scale and center responses during survey execution.

Finally, there is much to learn from reviewing survey strategies and analyses of responses that have been successfully employed in two research areas, polygraph protocols and clinical trials in medicine; these can potentially provide ‘best practice’ guidance for Likert-scaled survey research design. The most rigorous survey protocols appear in polygraph testing, partly because these protocols have access to enormous amounts of emotional response data (unavailable in SurveyMonkey-type surveys) in addition to a subject’s verbal responses. Polygraph protocols are obsessive about balancing questions, centering the response scale, and assuring that interrogation spans the gamut of the belief distribution support [33]. This happens interactively during the interview process, and polygraph interrogators constantly adjust and recheck responses to questions as the interview progresses. The same question will be asked repeatedly to assure that respondents’ answers are consistent and honest. In addition, polygraph interrogators ask a variety of questions besides the primary relevant statements that provide information supporting the research objective; these are used in ‘fine tuning’ the survey instrument. They include: (1) irrelevant statements designed to identify subjects ‘gaming’ the survey; (2) truthfulness statements describing behavior that the majority of subjects have been involved in, to detect habitual liars; and (3) sacrificial statements designed to absorb the initial response to a relevant issue and to set the context so that subsequent statements elicit consistent responses [33].

Polygraph protocols yield smaller type I and type II errors than questionnaires that lack the controls provided by sacrificial, irrelevant and truthfulness statements to benchmark the survey subject’s mood, cooperativeness and seriousness about the survey [34]. In criminal law, ‘Blackstone’s ratio’ suggests that ‘It is better that ten guilty persons escape than that one innocent suffer’ [35], inherently biasing judgments towards minimizing false positives (wrongful convictions). This is why polygraphs are inadmissible in court, even though they outperform other survey protocols.

Clinical trials in medicine and pharmaceuticals have employed stopping rules, particularly in Phase I clinical trials [36]. The simplest stopping rules target a given precision for a minimum cost of testing. Researchers will also review whether successive samples provide evidence that the parameter of interest is changing, by: (1) examining patterns of observed responses, and (2) using missing-data methods to impute missing responses [37]. Optimal stopping models are widely used in machine maintenance, economics and finance. As in clinical trials, Likert-scaled survey instruments could be optimally designed to collect initial responses, which would be tracked and analyzed, then revised to create balanced, centered and specific questions for the next round of sampling; applied again, with responses collected and analyzed, revised again; and so forth, until loss and bias are within a critical range.
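
As an illustration of the simplest precision-targeted rule, the sketch below (my own minimal example; the tolerance and wave size are arbitrary assumptions, not taken from [36]) stops collecting waves once the standard error of the mean response falls below a target:

    # Minimal sketch: precision-targeted stopping rule for survey waves.
    set.seed(1)
    responses <- c()
    repeat {
      wave      <- sample(1:5, 50, replace = TRUE)    # one wave of Likert responses
      responses <- c(responses, wave)
      se <- sd(responses) / sqrt(length(responses))   # precision of the mean estimate
      if (se < 0.05) break                            # stop at target precision
    }
    length(responses)   # total sample required to hit the target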

Data Availability

Data was downloaded on June 6, 2021 from https://www.kaggle.com/sjleshrac/airlines-customer-satisfaction.

Funding Statement

The authors received no specific funding for this work.

References

  • 1. Likert R. A technique for the measurement of attitudes. Archives of Psychology. 1932.
  • 2. Likert R. New patterns of management. 1961.
  • 3. Murphy G, Likert R. Public opinion and the individual. Harper; 1938.
  • 4. Silver N. The polls weren’t great. https://fivethirtyeight.com/features/the-polls-werent-great-but-thats-pretty-normal/. 2020.
  • 5. Cohn N. What the polls got wrong. https://www.newyorker.com/news/q-and-a/nate-cohn-explains-what-the-polls-got-wrong. 2020.
  • 6. Cohn N. Polls what went wrong. https://www.nytimes.com/2020/11/10/upshot/polls-what-went-wrong.html. 2020.
  • 7. Leonhardt D. Polls what went wrong. https://www.nytimes.com/2020/11/12/us/politics/election-polls-trump-biden.html. 2020.
  • 8. Kiesler S, Sproull LS. Response effects in the electronic survey. Public Opinion Quarterly. 1986;50: 402–413. doi: 10.1086/268992
  • 9. Burns A, Bush R. Marketing research: Online research applications. Pearson Prentice Hall; 2005.
  • 10. Reips U-D, Funke F. Interval-level measurement with visual analogue scales in internet-based research: VAS generator. Behavior Research Methods. 2008;40: 699–704. doi: 10.3758/BRM.40.3.699
  • 11. Friedman J, Hastie T, Rosset S, Tibshirani R, Zhu J. Discussion of boosting papers. Ann Statist. 2004;32: 102–107.
  • 12. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software. 2010;33: 1. doi: 10.18637/jss.v033.i01
  • 13. Sterne JA, Smith GD. Sifting the evidence—what’s wrong with significance tests? Physical Therapy. 2001;81: 1464–1469. doi: 10.1093/ptj/81.8.1464
  • 14. Worcester RM, Burns TR. Statistical examination of relative precision of verbal scales. Journal of the Market Research Society. 1975;17: 181–197.
  • 15. Dietz L, Bickel S, Scheffer T. Unsupervised prediction of citation influences. Proceedings of the 24th International Conference on Machine Learning. ACM; 2007. pp. 233–240.
  • 16. Lee JW, Jones PS, Mineyama Y, Zhang XE. Cultural differences in responses to a Likert scale. Research in Nursing & Health. 2002;25: 295–306. doi: 10.1002/nur.10041
  • 17. Board S. Preferences and utility. UCLA; Oct. 2009.
  • 18. Shannon CE. A mathematical theory of communication. The Bell System Technical Journal. 1948;27: 379–423. doi: 10.1002/j.1538-7305.1948.tb00917.x
  • 19. Pawitan Y. In all likelihood: Statistical modelling and inference using likelihood. Oxford University Press; 2001.
  • 20. Rotnitzky A, Cox DR, Bottai M, Robins J. Likelihood-based inference with singular information matrix. Bernoulli. 2000; 243–284. doi: 10.2307/3318576
  • 21. Amari S, Nagaoka H. Methods of information geometry. American Mathematical Society; 2000.
  • 22. Kullback S, Leibler RA. On information and sufficiency. The Annals of Mathematical Statistics. 1951;22: 79–86. doi: 10.1214/aoms/1177729694
  • 23. Jeffreys H, Jeffreys B. Methods of mathematical physics. Cambridge; 1946.
  • 24. Lindley D. Taylor & Francis; 1959.
  • 25. Berger JO, Bernardo JM, Sun D. The formal definition of reference priors. The Annals of Statistics. 2009;37: 905–938. doi: 10.1214/07-AOS587
  • 26. Grandy J. Differences in the survey responses of Asian American and white science and engineering students. ETS Research Report Series. 1996;1996: i–23.
  • 27. Wang R, Hempton B, Dugan JP, Komives SR. Cultural differences: Why do Asians avoid extreme responses? Survey Practice. 2008;1: 2913.
  • 28. Gao G. The challenges of polling Asian Americans. Pew Research Center; 2016.
  • 29. Reinoso-Carvalho F, et al. Blending emotions and cross-modality in sonic seasoning: Towards greater applicability in the design of multisensory food experiences. Foods. 2020;9: 1876. doi: 10.3390/foods9121876
  • 30. Westland JC. Structural equation models. 2nd ed. Springer; 2019.
  • 31. Sarstedt M, Ringle CM. Structural equation models: From paths to networks (Westland 2019). Springer; 2020.
  • 32. Lesaffre E, Rizopoulos D, Tsonaka R. The logistic transform for bounded outcome scores. Biostatistics. 2007;8: 72–85. doi: 10.1093/biostatistics/kxj034
  • 33. Westland JC. Affective data acquisition technologies in survey research. Information Technology and Management. 2011;12: 387–408. doi: 10.1007/s10799-011-0110-9
  • 34. Westland JC. Electrodermal response in gaming. Journal of Computer Networks and Communications. 2011;2011. doi: 10.1155/2011/610645
  • 35. Blackstone W. Commentaries on the laws of England in four books. Vol. 1. Philadelphia: JB Lippincott Co; 1893.
  • 36. Rao RS, Glickman ME, Glynn RJ. Stopping rules for surveys with multiple waves of nonrespondent follow-up. Statistics in Medicine. 2008;27: 2196–2213. doi: 10.1002/sim.3063
  • 37. Wagner T. The global achievement gap: Why even our best schools don’t teach the new survival skills our children need—and what we can do about it. ReadHowYouWant.com; 2010.

Decision Letter 0

Carlos Andres Trujillo

23 Feb 2022

PONE-D-21-19317Information Loss and Bias in Likert Survey ResponsesPLOS ONE

Dear Dr. Westland,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I want to apologize for the unusually long time that this review took to complete. There were delays with the second reviewer, and I wanted to make sure that your paper was read by true experts. I hope you will find their comments useful. Overall, we all agree that this is a very informative paper. In addition to answering the reviewers’ suggestions, I want you to make an extra effort to make the implications of your results and your recommendations very clear to a wider audience. We look forward to reading the revised version.

Please submit your revised manuscript by Apr 09 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Carlos Andres Trujillo, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

3. Thank you for stating the following financial disclosure: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

At this time, please address the following queries:

a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution. 

b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

c) If any authors received a salary from any of your funders, please state which authors and which funders.

d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Please amend your list of authors on the manuscript to ensure that each author is linked to an affiliation. Authors’ affiliations should reflect the institution where the work was done (if authors moved subsequently, you can also list the new affiliation stating “current affiliation:….” as necessary).

5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The author showed an interesting problem together with rigorous mathematical arguments. The manuscript also has an interesting application. I would appreciate it if you could read my comments in the attached pdf.

Reviewer #2: To measure the closeness between the two probability distributions, the Kullback-Leibler divergence is used.

The paper focuses on minimizing the bias and informativeness of Likert metrics distribution using the Kullback-Leibler divergence, KLD, from the actual responses empirical distribution and an assumed distribution of respondent beliefs through the information penalty.

KLD is a divergence measure, not a distance metric. Thus, the order of the two distributions matters: KLD(Q, P) is not equal to KLD(P, Q). The use of divergence and distance is confusing. Moreover, in the setup, it is unclear what the order of the distributions is.

The assumed distributions used in the paper are standard for the beliefs. However, in real applications, the implementation of much more complex distributions is appropriate and natural. For example, a mixture of distributions, zero-inflated distributions, among others.

Under the complexity issue, the Hessian matrix is singular, which leads to the impossibility of measuring the information penalty. Therefore, in this situation, the solution requires the application of different methods.

It is unclear how the strategy to optimally set the hyper-parameters of the ideal belief distribution is designed.

The empirical application is outstanding and informative, from which the minimal information loss found is under Normal beliefs in Likert scaled surveys.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: hector zarate

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: review.pdf

PLoS One. 2022 Jul 28;17(7):e0271949. doi: 10.1371/journal.pone.0271949.r003

Author response to Decision Letter 0


14 May 2022

Author Response to Reviewer Comments on PONE-D-21-19317 Information Loss and Bias in Likert Survey Responses

J. Christopher Westland

2022-04-23

Reviewer’s Assessment of the Suitability of Research

Comment Reviewer #1 Response Reviewer #2 Response

1. Is the manuscript technically sound, and do the data support the conclusions? Yes Partly (I have responded below)

2. Has the statistical analysis been performed appropriately and rigorously? Yes Yes

3. Have the authors made all data underlying the findings in their manuscript fully available? Yes Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English? Yes Yes

Reviewer Comments to the Author

Reviewer #1:

The author showed an interesting problem together with rigorous mathematical arguments.

The manuscript also has an interesting application.

Thank you. I appreciate the compliment and hopefully have addressed remaining questions concerning the methods, results and writing.

Main points

The paper sets forth the task of examining distributional implications of Likert response surveys. The latter are widely used across many social disciplines such as marketing, opinion polls, economic disciplines and health surveys, among many others. Being discrete categorical metrics, they can not only lose significant information from the real-world mappings of beliefs, but also generate significant biases in the statistical analysis. The manuscript sets forth the task of showing through probabilistic arguments when Likert scales do what they are supposed to do and when things can go wrong (strong polarization and beliefs). Such problems practically disappear when beliefs are Normal beliefs in Likert scaled surveys. The Normal fit works pretty well under many circumstances. The manuscript recommends using Likert-scaled surveys to allow minimal bias and information loss.

Author Response

I appreciate that the reviewer has accurately and succinctly synopsized the main results of my research.

Recommendation

I think the topic is very important for many scientific studies that researchers

know about how to work better using Normal Likert scaled surveys. I think it

would be useful to include some recent literature that have takcled to model

customer preferences from a different perspective to avoid the shortcomings of

loss of information and biases using Likert scales. An example of paper is one

on an alternative Bayesian modelling of Likert distributions such as Reinoso-

Carvalho et al. (2020). They use discrete scale that in practice in contious by

the use of a Bayesian Logit-Normal distribution. In this sense, there has been an

interesting discussion from the Bayesian perspective and literature dealing with

such problems posed by the Likert scales and it would be interesting that the

author fills that discussion in the introduction and motivation of the manuscript

to make it more complete. Based on what was said above I recommend a revise

and resubmit.

Author Response

Thank you for these suggestions. I was not aware of (Reinoso-Carvalho et al. 2020), which is a relatively recent publication, but I lean towards Bayesian methods wherever they lend themselves to a specific problem. I have included several paragraphs in the “Discussion and conclusions” section that summarize (Reinoso-Carvalho et al. 2020)’s methodology and applications. I show how these ideas can be effectively applied in the context of my current research paper to address the problems I have raised in interpreting Likert-scaled datasets.

Reviewer #2:

The paper focuses on minimizing the bias and informativeness of Likert metrics distribution using the Kullback-Leibler divergence, KLD, from the actual responses’ empirical distribution and an assumed distribution of respondent beliefs through the information penalty. KLD is a divergence measure, not a distance metric. Thus, the order of the two distributions matters: KLD(Q, P) is not equal to KLD(P, Q). The use of divergence and distance is confusing. Moreover, in the setup, it is unclear what the order of the distributions is.

Author Response

Thank you for pointing this out. I have addressed this by replacing the KLD metric with a comparable Jeffreys distance metric in the current revision of my paper (with recalculations for the tables and graphs). The Jeffreys distance is a symmetrized modification of the KLD metric, J(1,2) = I(1:2) + I(2:1), actually introduced prior to the (Kullback and Leibler 1951) paper, in (Jeffreys and Jeffreys 1946). It is a proper distance metric that satisfies the triangle inequality. I discuss, at greater length below, the revision of my calculations around Jeffreys’ distance.

The use of the term “divergence” for statistical distances has varied significantly over time, and current usage was established in (Amari and Nagaoka 2000). (Kullback and Leibler 1951) actually used “divergence” to refer to the symmetrized divergence already defined and used in (Jeffreys and Jeffreys 1946), where it was referred to as “the mean information for discrimination … per observation,” while (Lindley 1959) referred to the asymmetric function as the “directed divergence.”

Although calling KLD a “distance” is not technically wrong (it is a statistical distance in the normal usage of the term), I do think there is merit in the reviewer’s suggestion to use a symmetric measure for this research, as there may be a question about the ordering of the distributions in the calculations. I have recalculated and restated all of the graphs, tables and results in this new revision in terms of the Jeffreys divergence.

I was curious myself whether there was asymmetry in KLD versus the symmetric Jeffreys divergence for my research dataset. I took my Airline Customer Satisfaction dataset and merged the 15 columns of Likert-scaled responses into a single variable, then compared these to Normal(3, 1), Poisson(3) and Beta(0.5, 0.5) random variables. Summaries of linear regressions of the Jeffreys distances against KLD show little difference between the Jeffreys divergence and KLD for this dataset.
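A minimal R sketch of the divergence calculations underlying these regressions (the placeholder data, binning and epsilon smoothing are illustrative assumptions, not the paper’s exact code):

kl_div <- function(p, q) sum(p * log(p / q))          # KLD(P, Q)
j_div  <- function(p, q) kl_div(p, q) + kl_div(q, p)  # Jeffreys distance

responses <- sample(1:5, 1000, replace = TRUE)        # stand-in for merged Likert columns
p <- as.numeric(table(factor(responses, levels = 1:5))) / length(responses)

q <- diff(pnorm(c(-Inf, 1:4 + 0.5, Inf), mean = 3, sd = 1))  # Normal(3,1) on 5 bins
q <- q / sum(q)

eps <- 1e-12                                          # guard: KLD is undefined on zero cells
p <- (p + eps) / sum(p + eps); q <- (q + eps) / sum(q + eps)

c(KLD = kl_div(p, q), Jeffreys = j_div(p, q))

Repeating this calculation per behavioral-demographic group would produce the paired divergence columns regressed against each other in the summaries below.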

##
## Call:
## lm(formula = KL_div_norm ~ j_div_norm, data = dist)
##
## Residuals:
##        Min         1Q     Median         3Q        Max
## -2.026e-14  4.000e-19  3.100e-18  5.900e-18  2.519e-16
##
## Coefficients:
##              Estimate Std. Error   t value Pr(>|t|)
## (Intercept) 8.129e-16  9.901e-18 8.211e+01   <2e-16 ***
## j_div_norm  1.000e+00  3.533e-17 2.830e+16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.577e-16 on 6186 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 8.012e+32 on 1 and 6186 DF, p-value: < 2.2e-16

[Figure: Comparing Jeffreys divergence to KLD for the Airline Customer Satisfaction dataset]

##
## Call:
## lm(formula = KL_div_pois ~ j_div_pois, data = dist)
##
## Residuals:
##        Min         1Q     Median         3Q        Max
## -1.121e-14 -1.600e-18  1.700e-18  5.000e-18  1.377e-16
##
## Coefficients:
##              Estimate Std. Error   t value Pr(>|t|)
## (Intercept) 3.613e-16  4.482e-18 8.061e+01   <2e-16 ***
## j_div_pois  1.000e+00  1.479e-17 6.760e+16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.428e-16 on 6186 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 4.57e+33 on 1 and 6186 DF, p-value: < 2.2e-16

[Figure: Comparing Jeffreys divergence to KLD for the Airline Customer Satisfaction dataset]

##
## Call:
## lm(formula = KL_div_beta ~ j_div_beta, data = dist)
##
## Residuals:
##        Min         1Q     Median         3Q        Max
## -1.496e-16 -7.100e-18 -1.900e-18  3.900e-18  1.009e-14
##
## Coefficients:
##              Estimate Std. Error   t value Pr(>|t|)
## (Intercept) 2.710e-16  4.211e-18 6.435e+01   <2e-16 ***
## j_div_beta  1.000e+00  8.445e-18 1.184e+17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.29e-16 on 6186 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 1.402e+34 on 1 and 6186 DF, p-value: < 2.2e-16

[Figure: Comparing Jeffreys divergence to KLD for the Airline Customer Satisfaction dataset]

The assumed distributions used in the paper are standard for the beliefs. However, in real applications, much more complex distributions are appropriate and natural; for example, mixtures of distributions or zero-inflated distributions, among others.

Author Response

I agree with the reviewer’s assertion that the “… assumed distributions used in the paper are standard for the beliefs,” and indeed the fact that such assumptions may be inappropriate for a given dataset is one of the main motivations of this research. Mixture and zero-inflated distributions of the observations (versus theoretical assumptions) are indeed quite common, if not the norm, in Likert-scaled datasets. They come about because of, among other things, tribalism (mixture distributions) and belief polarization (zero-inflated distributions). Though such studies may exist, I haven’t seen one that interprets Likert-scaled observations with anything but a Gaussian assumption (not that they don’t exist; I just haven’t come across one). One reason for this may be that a major portion of survey research is conducted by researchers who rely (blindly, perhaps) on accessible statistical software. Indeed, complex causal chains are commonly addressed via structural-equation-model or regression software that relies on an underlying assumption of Gaussian data.

Polarized responses are quite common in research on political and social issues in North America, so the problem the reviewer raises is an important one. Such responses may be modeled as zero-inflated Gaussian or Poisson distributions, though too often no accommodation is made in the analysis for the zero-inflated data. Survey bias towards boundary-inflated responses among polled Americans, and midpoint-inflated responses among Asians, has been repeatedly documented and called out as a challenge to survey-based research (e.g., see (Lee et al. 2002), (Grandy 1996) and (Wang et al. 2008)), with the Pew Research Center (Gao 2016) describing such bias as a major challenge to democracy and a consistent problem in their surveys.
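A purely simulated illustration of what such polarized data can look like (the mixture weights are arbitrary assumptions, not estimates from any survey): boundary-inflated responses can be generated as a mixture of a “moderate” Gaussian component and spikes at the scale endpoints.

# Simulated polarized Likert responses: a mixture of a central component
# and boundary-inflated spikes at 1 and 5 (weights are arbitrary choices)
set.seed(42)
n <- 10000
component <- sample(c("moderate", "low", "high"), n, replace = TRUE,
                    prob = c(0.5, 0.25, 0.25))
y <- ifelse(component == "low", 1,
     ifelse(component == "high", 5,
            pmin(pmax(round(rnorm(n, mean = 3, sd = 1)), 1), 5)))
table(y) / n   # mass piles up at the boundaries, unlike a censored Normal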

How should we interpret Likert-scaled data when we have strong evidence that its distribution is zero-inflated, or otherwise highly non-Gaussian? (Reinoso-Carvalho et al. 2020) have successfully applied a Bayesian multi-response version of the multivariate logit-normal regression model to interpret complex sensory data. I think (Reinoso-Carvalho et al. 2020)’s methodology offers a generalized, though more complex and computationally intensive, method for extracting relevant information from Likert datasets that are highly non-Gaussian.

Cross-sensory response research in (Reinoso-Carvalho et al. 2020), specifically studies of the human taste response to music, has pioneered Bayesian alternatives to frequentist analysis of Likert-scaled data. In (Reinoso-Carvalho et al. 2020) a sample of 1611 participants tasted one sample of chocolate while listening to a song that evoked a specific combination of cross-modal and emotional consequences. The researchers addressed difficulties in interpreting frequentist statistical tests of discrete, categorical responses by applying a Bayesian model to quantify the information content of a response. The approach used in (Reinoso-Carvalho et al. 2020) is well suited to sentiment-analysis problems that have long been analyzed using structural equation models and frequentist Neyman-Pearson hypothesis tests (Westland 2019), (Sarstedt and Ringle 2020).

Data collected for the (Reinoso-Carvalho et al. 2020) study showed strongly non-symmetric behavior among the bounded scales, with large numbers of respondents selecting extreme values close to the boundaries. This contradicted the assumptions of traditional multivariate regression approaches to analysis, because residuals cannot be Gaussian-distributed when responses lie at the boundaries of the response space.

To overcome this problem, (Reinoso-Carvalho et al. 2020) remapped each outcome j for each individual i into a unit (0, 1) range. They then used a Bayesian, multi-response, multivariate logit-normal distribution with outcome-specific intercepts and slopes and a common covariance structure across outcome measures, following the methodology in (Lesaffre, Rizopoulos, and Tsonaka 2007). The logit-normal distribution can take a variety of shapes, e.g., U-shapes and J-shapes. More importantly, it is designed specifically to address the zero-inflated data distributions that arise in particularly polarized survey responses.
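A minimal R sketch of this remapping, assuming a 1–5 Likert outcome; the boundary-shrinkage step is one common way of keeping values off the closed endpoints, and is not necessarily the authors’ exact transform:

likert <- sample(1:5, 200, replace = TRUE)  # placeholder 1-5 responses
n <- length(likert)
y01 <- (likert - 1) / 4                     # rescale 1..5 onto [0, 1]
y01 <- (y01 * (n - 1) + 0.5) / n            # shrink away from exact 0 and 1
z <- qlogis(y01)                            # logit transform onto the real line
# z can now be modeled with Gaussian-error machinery, e.g. lm(z ~ covariates)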

A Bayesian multi-response version of the multivariate logit-normal regression model was used in (Reinoso-Carvalho et al. 2020). Outcome-specific intercepts and slopes were needed because the association of each covariate with each of the responses could differ significantly. The model also takes advantage of the inherently high correlation of responses, due to individual consistency and to the social and cultural clustering of beliefs (and responses) in survey data, through joint modeling of all the outcomes, allowing the borrowing of information between responses.

Their Bayesian multi-response version of the multivariate logit-normal regression model provides a flexible, scalable and adaptive model in settings where reliance on the central limit theorem is questionable. Additionally, it provides a natural way to incorporate any available prior information, whether from prior studies or from expert opinion. The transformations specified in (Reinoso-Carvalho et al. 2020) result in a model error term (representing features not captured by the data) that is multi-Normal, allowing analysis with available statistical software.
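A hypothetical sketch of such a multi-response model using the brms package in R (not the authors’ code; the data frame, variable names z1, z2, x1 and sampler settings are illustrative):

library(brms)

# Illustrative data: two logit-transformed outcomes and one covariate
dat <- data.frame(z1 = rnorm(100), z2 = rnorm(100), x1 = rnorm(100))

# Joint (multi-response) Gaussian model with a shared residual correlation,
# giving outcome-specific intercepts and slopes on x1
fit <- brm(
  bf(mvbind(z1, z2) ~ x1) + set_rescor(TRUE),
  data = dat, family = gaussian(), chains = 2, iter = 1000
)
summary(fit)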

Under the complexity issue, the Hessian matrix is singular, which leads to the impossibility of measuring the information penalty. Therefore, in this situation, the solution requires the application of different methods.

Author Response

It is important not to confuse the theoretical and observed Fisher information matrices. Fisher information measures the amount of information that an observable random variable carries about an unknown parameter of a distribution that models that random variable. Formally, it is the variance of the score, or equivalently the expected value of the observed information; the observed information always exists because it is computed from actual measurements. The theoretical Fisher information can be derived as the Hessian of the relative entropy.

The negative Hessian evaluated at the MLE is the observed Fisher information matrix. It is incorrect, however, to say that the observed Fisher information can be found by inverting the (negative) Hessian: the inverse of the (negative) Hessian is an estimator of the asymptotic theoretical covariance matrix, and the square roots of its diagonal elements are estimators of the standard errors. The theoretical Fisher information matrix rests on the Fisher information metric theorem, which proves that KL-divergence is directly related to the Fisher information metric, the Fisher information matrix being the Hessian of the Kullback–Leibler divergence (KLD) between two (ideal) distributions.

Formally, let $l(\theta)$ be a log-likelihood function and the theoretical Fisher information matrix $I(\theta)$ be the symmetric $p \times p$ matrix with entries $I(\theta)_{ij} = -E\left[\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} l(\theta)\right]$ for $1 \le i, j \le p$.

The Hessian is defined as $H(\theta)_{ij} = \frac{\partial^2}{\partial\theta_i \, \partial\theta_j} l(\theta)$ for $1 \le i, j \le p$, the matrix of second derivatives of the log-likelihood with respect to the parameters. It follows that if you minimize the negative log-likelihood, the returned Hessian is the equivalent of the observed Fisher information matrix, whereas if you maximize the log-likelihood, the negative Hessian is the observed information matrix.

The observed Fisher information matrix is $I(\hat\theta_{ML})$, the information matrix evaluated at the maximum likelihood estimates (MLE). The second derivative of the log-likelihood evaluated at the MLE is the observed Fisher information (Pawitan 2001). This is exactly what the optimization algorithms used in this research, like optim in R, return: the Hessian evaluated at the MLE (when the negative log-likelihood is minimized, the returned Hessian is the observed information matrix). The estimated standard errors of the MLE are the square roots of the diagonal elements of the inverse of the observed Fisher information matrix, that is, of the inverse of the Hessian (or the negative Hessian).

The inverse of the Fisher information matrix is an estimator of the asymptotic covariance matrix, $Var(\hat\theta_{ML}) = [I(\hat\theta_{ML})]^{-1}$, and the standard errors are the square roots of the diagonal elements of this covariance matrix.
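A brief R sketch of this computation with optim (a toy Normal model; the data, log-sigma parameterization and starting values are illustrative assumptions):

# Toy model: Normal(mu, sigma), with sigma parameterized on the log scale
set.seed(1)
x <- rnorm(500, mean = 3, sd = 1)
negloglik <- function(par)
  -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))

fit <- optim(c(0, 0), negloglik, method = "BFGS", hessian = TRUE)

# Minimizing the negative log-likelihood, $hessian is the observed
# Fisher information at the MLE; its inverse estimates Var(theta_hat)
obs_info <- fit$hessian
se <- sqrt(diag(solve(obs_info)))   # SEs of (mu, log sigma)
fit$par
se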

The main reason to be concerned with singularities in computing the Fisher information has to do with the asymptotics: a singularity implies that the usual $\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{D} N\left[0, I(\theta)^{-1}\right]$ is not valid. Alternative formulations are provided in (Rotnitzky et al. 2000), giving generalized asymptotic distributions that depend on a parameter $s$ and its parity (odd/even), where $2s+1$ is the number of derivatives of the likelihood. (Rotnitzky et al. 2000) provides a unified theory for deriving the asymptotic distribution of the MLE and of the likelihood ratio test statistic when the information matrix has rank one less than full and the likelihood is differentiable up to a specific order. This matters because the likelihood ratio test uses the asymptotic distribution.
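A hypothetical sketch of how such a singularity arises in practice (an overparameterized toy model, not the model in the paper):

set.seed(1)
x <- rnorm(500, mean = 3, sd = 1)

# The mean is identified only through the sum a + b, so the observed
# information matrix is rank-deficient (singular)
nll2 <- function(par)
  -sum(dnorm(x, mean = par[1] + par[2], sd = 1, log = TRUE))
fit2 <- optim(c(0, 0), nll2, method = "BFGS", hessian = TRUE)

det(fit2$hessian)      # ~ 0 up to numerical error: singular
# solve(fit2$hessian)  # would fail; standard errors are not available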

It is unclear how the strategy for optimally setting the hyper-parameters of the ideal belief distribution is designed.

Author Response

I was not entirely sure which part of the paper the reviewer was referring to, but I believe it was the illustrative examples in section 2. My intention there is to show what biases and information losses can possibly be introduced into the standard methods of interpreting Likert-scaled data.

Section 2 was intended to be illustrative, and the parameter settings were designed to illustrate potentially extreme situations that could occur in the data. It was not really my intention to explore a parameter space, as one might do, say, in optimization or machine learning. Instead, the Airline database provides a real-world benchmark for parameters that highlights the situations most likely to be encountered in practice.

I have made this clear in the current revision of my paper.

The empirical application is outstanding and informative; it finds minimal information loss under Normal beliefs in Likert-scaled surveys.

Author Response

Thank you for the kind words. I hope that this revision clears up any remaining questions the reviewers may have.

References

Amari, Shun-ichi, and Hiroshi Nagaoka. 2000. Methods of Information Geometry. Vol. 191. American Mathematical Soc.

Gao, George. 2016. “The Challenges of Polling Asian Americans.”

Grandy, Jerilee. 1996. “Differences in the Survey Responses of Asian American and White Science and Engineering Students.” ETS Research Report Series 1996 (2): i–23.

Jeffreys, H., and B. S. Jeffreys. 1946. Methods of Mathematical Physics. Cambridge: Cambridge University Press.

Kullback, Solomon, and Richard A Leibler. 1951. “On Information and Sufficiency.” The Annals of Mathematical Statistics 22 (1): 79–86.

Lee, Jerry W, Patricia S Jones, Yoshimitsu Mineyama, and Xinwei Esther Zhang. 2002. “Cultural Differences in Responses to a Likert Scale.” Research in Nursing & Health 25 (4): 295–306.

Lesaffre, Emmanuel, Dimitris Rizopoulos, and Roula Tsonaka. 2007. “The Logistic Transform for Bounded Outcome Scores.” Biostatistics 8 (1): 72–85.

Lindley, DV. 1959. Taylor & Francis.

Pawitan, Yudi. 2001. In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press.

Reinoso-Carvalho, Felipe, Laura H Gunn, Enrique ter Horst, and Charles Spence. 2020. “Blending Emotions and Cross-Modality in Sonic Seasoning: Towards Greater Applicability in the Design of Multisensory Food Experiences.” Foods 9 (12): 1876.

Rotnitzky, Andrea, David R Cox, Matteo Bottai, and James Robins. 2000. “Likelihood-Based Inference with Singular Information Matrix.” Bernoulli, 243–84.

Sarstedt, Marko, and Christian M Ringle. 2020. “Structural Equation Models: From Paths to Networks (Westland 2019).” Springer.

Wang, Rui, Brian Hempton, John P Dugan, and Susan R Komives. 2008. “Cultural Differences: Why Do Asians Avoid Extreme Responses?” Survey Practice 1 (3): 2913.

Westland, J Christopher. 2019. Structural Equation Models. 2nd Edition. Springer.

Attachment

Submitted filename: PLOS_likert_response.pdf

Decision Letter 1

Carlos Andres Trujillo

12 Jul 2022

Information Loss and Bias in Likert Survey Responses

PONE-D-21-19317R1

Dear Dr. Westland,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Carlos Andres Trujillo, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The current manuscript has addressed major methodological issues and made it more transparent. I have enjoyed reading this final version of the manuscript.

Reviewer #2: The article provides insight into the Likert scaled response survey analysis by implementing rigorously appropriate statistical methods. It found support for the minimal information loss under the Normal beliefs assumption. The article is well written, easy to understand, and useful for decision-making. The link to prior works helps to make the author's arguments clear. Additionally, the paper promotes new questions and procedures.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Enrique ter Horst

Reviewer #2: Yes: Hector Zarate

**********

Attachment

Submitted filename: review.pdf

Acceptance letter

Carlos Andres Trujillo

18 Jul 2022

PONE-D-21-19317R1

Information Loss and Bias in Likert Survey Responses

Dear Dr. Westland:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Carlos Andres Trujillo

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Rebuttal-Letter-for-PONE-D-21-01685.pdf

    Attachment

    Submitted filename: review.pdf

    Attachment

    Submitted filename: PLOS_likert_response.pdf

    Attachment

    Submitted filename: review.pdf

    Data Availability Statement

    Data was downloaded on June 6, 2021 from https://www.kaggle.com/sjleshrac/airlines-customer-satisfaction.

