DISCRETE CHOICE AND RATIONAL INATTENTION: A GENERAL EQUIVALENCE RESULT

Mogens Fosgerau; Emerson Melo; André de Palma; Matthew Shum

doi:10.1111/iere.12469

2020 Jun 16;61(4):1569–1589. doi: 10.1111/iere.12469

PMCID: PMC7687267 PMID: 33288966

Abstract

This article establishes a general equivalence between discrete choice and rational inattention models. Matějka and McKay (2015) showed that when information costs are modeled using the Shannon entropy, the choice probabilities in the rational inattention (RI) model take the multinomial logit form. We show that, for one given prior over states, RI choice probabilities may take the form of any additive random utility discrete choice model (ARUM) when the information cost is a Bregman information, a class defined in this article. The prior information of the rationally inattentive agent is summarized in a constant vector of utilities in the corresponding ARUM.

1. INTRODUCTION

In many situations where agents make decisions under uncertainty, information acquisition is costly (involving pecuniary, time, or psychological costs); therefore, agents may rationally choose to remain imperfectly informed about the available options. This idea underlies the theory of rational inattention (RI), which has become an important paradigm for modeling boundedly rational behavior in many areas of economics (Sims, 2003, 2010). In this article, our main contribution is to establish a general equivalence between additive random utility discrete choice and RI models. Matějka and McKay (2015) showed that when information costs are modeled using the Shannon mutual information between actions and states, the resulting choice probabilities in the RI model take the familiar multinomial logit form, leading to the “RI‐logit” model, as we will refer to it below. This is a very appealing result, providing a microfoundation as well as alternative interpretation for the multinomial logit model.

However, the RI‐logit model has the “independence of irrelevant alternatives” (IIA) property, which states that, in a given state, the ratio of the choice probabilities of two alternatives does not depend on the utility of a third (irrelevant) alternative.² In many empirical contexts, the IIA property implies restrictive and unrealistic substitution patterns among the choice options, as illustrated in the following example:

Example 1

Consider a rationally inattentive consumer facing a choice between pineapple (good 1), mango (good 2), and cheesecake (good 3). A priori, the consumer does not know the value associated with each good but has fixed prior beliefs about the possible realizations of the valuation vector $V$ . Assume that $V$ has four equally likely possible states:

$(v^{1}, v^{2}, v^{3}, v^{4}) = \begin{pmatrix} 0 & 0.1 & 0 & 0 \\ 0 & 0 & 0.1 & 0 \\ 0 & 0 & 0 & 0.1 \end{pmatrix},$

and assume that we observe corresponding choice probabilities $p (v^{1}) = (0.41, 0.41, 0.18)$ and $p (v^{2}) = (0.46, 0.37, 0.17)$ for the first two states.³ These probabilities reflect a situation where an increase (from 0 to 0.1) in the value of good 1, the pineapple, causes consumers to substitute disproportionately from the mango instead of the cheesecake. However, such outcomes violate the IIA property, as the choice probability for mango decreases by 10%, whereas the choice probability for cheesecake decreases only by 4%; hence, they cannot arise from the RI‐logit model.
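The IIA violation in Example 1 can be checked directly from the reported numbers. The following is a minimal sketch that takes the rounded probabilities above at face value; the variable names are illustrative and not from the article. Under IIA, the odds of mango against cheesecake could not change when only the value of pineapple changes.

```python
# Reported (rounded) choice probabilities from Example 1.
# Order of goods: pineapple, mango, cheesecake.
p_v1 = (0.41, 0.41, 0.18)  # state v^1: all values equal 0
p_v2 = (0.46, 0.37, 0.17)  # state v^2: pineapple value raised to 0.1

# Under IIA, the mango/cheesecake odds cannot depend on the value of pineapple.
odds_v1 = p_v1[1] / p_v1[2]
odds_v2 = p_v2[1] / p_v2[2]

print(f"mango/cheesecake odds: {odds_v1:.3f} in v1, {odds_v2:.3f} in v2")
```

With these rounded inputs the odds fall when the pineapple becomes more valuable, which is the disproportionate substitution away from mango described in the example.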

The root of the problem in the previous example is that the Shannon mutual information embodies an important and strong assumption of symmetry: the Shannon entropy is invariant to permutation in its arguments; therefore, reordering the choice options does not affect the information cost. This makes the cost of processing information context independent (Hobson, 1969), and hence, it cannot take into account that some choice options are more similar than other options.⁴

In this article, we introduce a new class of generalized entropies that allows us to define cost functions that embody information related to the identity of alternatives. Formally, our generalized entropy is defined as the negative convex conjugate of the surplus function of an additive random utility model (ARUM), and allows patterns such as those in the example above to be accommodated in an RI model. The new generalized entropies are not required to be symmetric. This contributes to making RI models empirically relevant. In fact, we show that, depending on the choice of information cost, an RI model can yield the same choice probability system as any ARUM; this includes specifications such as nested logit, multinomial probit, and so on, which are often employed in empirical work.

Based on our definition of generalized entropy, we introduce a class of generalized RI models. In particular, we define a general class of information cost functions where the Shannon entropy is replaced by our generalized entropy. This generalizes the Shannon mutual information since this cost function arises when the ARUM is the multinomial logit model. Because of this connection with the ARUM class, we label our general RI model as “RI‐ARUM.” As we will show, an RI‐ARUM model exists corresponding to any ARUM, implying that rationally inattentive behavior can lead to choice probabilities that violate the IIA property, as in the above example.

1.1. Related Literature

Besides the papers already mentioned above, the main equivalence result in this article is related to several strands of the literature. This article contributes to the growing literature on RI with more general cost functions. Hébert and Woodford (2017) provide a foundation for the RI model based on a dynamic information accumulation process. In particular, they introduce a class of “neighborhood‐cost” functions, which allows them to reflect varying similarity of states to one another. Morris and Yang (2019) use ideas from global games to develop an RI framework, in which it is more difficult for players to distinguish between nearby states. Our results complement these: instead of allowing cost functions to reflect that some states are more similar than others, we introduce cost functions that may reflect that some choice options are more similar than others, which, as illustrated in the example above, is relevant for the empirical implications of the RI model.

Caplin et al. (2017) and Frankel and Kamenica (2019) study properties that may be required of information cost functions. We propose to use Bregman information (Banerjee et al., 2005) based on generalized entropy.

Our results also relate to the literature on perturbed utility models. Anderson et al. (1988) derived the representative consumer model underlying the logit model, showing that the direct utility has an entropy form. This observation was generalized by Hofbauer and Sandholm (2002), who showed that the choice probabilities generated by any ARUM can be derived from a deterministic model based on payoff perturbations that depend nonlinearly on the vector of choice probabilities. Fosgerau and McFadden (2012) provide a foundation for applications of consumer theory to perturbed utility problems with nonlinear budget constraints. Fudenberg et al. (2015) provide an axiomatic characterization of a class of perturbed random utility models. Allen and Rehbeck (2019) consider identification. Fosgerau et al. (2019) construct a class of inverse demand models that are useful for estimating demand for differentiated products using Berry's (1994) method. We contribute to that literature in two ways. First, we provide an explicit characterization of the perturbation term corresponding to general ARUM. Second, our equivalence result allows us to interpret the class of perturbed random utility models in terms of RI arguments.

Finally, the RI framework has also inspired some recent empirical work; see Caplin et al. (2016), Joo (2019), Brown and Jeon (2019), and Porcher (2019). These papers primarily utilize the Shannon/multinomial logit framework. The results in this article may enable researchers to apply RI models far more general than the multinomial logit model, as they show that choice behavior emerging from any ARUM model may emerge from rationally inattentive behavior.

1.2. Layout

Section 2 introduces the RI model. Section 3 introduces the ARUM framework, and uses convex analysis to generate some insights into the fundamental structure of these models. Using this structure, we introduce a class of generalized entropies and present a few key results. Section 4 shows how generalized entropy can be used to define the information cost in the RI model, leading to the class of RI‐ARUM models. Then we present the key result from this article, which establishes the equivalence between choice probabilities emerging from the discrete choice model, and those emerging from RI‐ARUM models. Section 5 discusses the specific case of the nested logit model, which has proven useful in many empirical models. We show how rationally inattentive behavior can generate choice probabilities with substitution patterns that violate IIA, as in the above example. Two examples demonstrate some properties of the RI‐nested logit. Section 6 concludes. All proofs are in the Appendix.

1.3. Notation

Throughout this article, for vectors $a$ and $b$, $a \cdot b$ denotes the vector scalar product $\sum_{i} a_{i} b_{i}$, such that, for example, $1 \cdot q = \sum_{i} q_{i}$. A univariate function applied to a vector is understood as coordinate‐wise application of the function, for example, $e^{q} = (e^{q_{1}}, \dots, e^{q_{N}})$. Consequently, $a + q = (a + q_{1}, \dots, a + q_{N})$ for a scalar $a$. The gradient with respect to a vector $v$ is $\nabla_{v}$; for example, for $v = (v_{1}, \dots, v_{N})$, $\nabla_{v} W (v) = (\frac{\partial W (v)}{\partial v_{1}}, \dots, \frac{\partial W (v)}{\partial v_{N}})$. The unit simplex in $R^{N}$ is Δ.

2. RATIONAL INATTENTION

We now introduce the RI model. The decision maker (DM) is presented with a group of $N$ options, from which he must choose one. Each option has an associated payoff, collected in the vector $v = (v_{1}, \dots, v_{N})$, but the vector of payoffs is unobserved by the DM. Instead, the DM considers the payoff vector $V$ to be random, taking values in a set $V \subset R^{N}$; for simplicity, we take $V$ to be finite. The DM possesses some prior knowledge about the available options, given by a probability measure μ, where $μ (v) = P (V = v) > 0$ for all $v \in V$.

The DM's choice is an action $i \in \{1, \dots, N\}$ and we write $p_{i} (v)$ as shorthand for the conditional probability that the action is $i$ in state $V = v$. The payoff resulting from action $i$ is $V_{i}$. The vector of choice probabilities conditional on $V = v$ is then $p (v) = (p_{1} (v), \dots, p_{N} (v))$, and $p (\cdot) = \{p (v)\}_{v \in V}$ is the collection of conditional probabilities. Given the conditional probabilities $p (\cdot)$ and the prior μ, it is convenient to have notation for the unconditional choice probabilities and we let $p_{i}^{0} = E p_{i} (V) = \sum_{v \in V} p_{i} (v) μ (v)$ and $p^{0} = (p_{1}^{0}, \dots, p_{N}^{0})$.

The problem of the rationally inattentive DM is to choose the conditional distribution $p (\cdot)$ , balancing the expected payoff against the cost of information. The DM's strategy is a solution to the following problem:

\max_{p (\cdot)} \{E (V \cdot p (V)) - \text{Information Cost}\} .

(1)

2.1. The RI‐Logit Model: The Matějka and McKay (2015) Result

The key element in program (1) is the choice of the information cost function. In specifying information costs, researchers have used concepts from information theory (Cover and Thomas, 2006). Specifically, much of the existing literature (Sims, 2003; Matějka and McKay, 2015) has utilized the mutual Shannon information between payoffs and actions to measure the information costs;⁵ that is, letting $Ω (q) = - q \cdot \log q$ denote the Shannon entropy, the information cost is specified as

\begin{aligned} κ (p (\cdot), μ) &\equiv Ω (E (p (V))) - E (Ω (p (V))) \\ &= - \sum_{i = 1}^{N} p_{i}^{0} \log p_{i}^{0} + \sum_{v \in V} (\sum_{i = 1}^{N} p_{i} (v) \log p_{i} (v)) μ (v) . \end{aligned}

(2)

Plugging this into (1), the rationally inattentive DM chooses the system of conditional probabilities $p (\cdot)$ to optimize⁶

\begin{matrix} \max_{p (\cdot)} \{E (V \cdot p (V)) - κ (p (\cdot), μ)\} \\ = \max_{p (\cdot)} \{\sum_{v \in V} \{p (v) \cdot [v + \log p^{0}] - p (v) \cdot \log p (v)\} μ (v)\} \end{matrix}

(3)

subject to

p_{i} (v) \geq 0 \text{ for all } i, \quad \sum_{i = 1}^{N} p_{i} (v) = 1 .

(4)

Solving this, the DM finds conditional choice probabilities

p_{i} (v) = \frac{p_{i}^{0} e^{v_{i}}}{\sum_{j = 1}^{N} p_{j}^{0} e^{v_{j}}} \text{ for } i = 1, \dots, N,

(5)

which satisfy $p_{i}^{0} = E p_{i} (V)$ . We may rewrite (5) as

p_{i} (v) = \frac{e^{v_{i} + \log p_{i}^{0}}}{\sum_{j = 1}^{N} e^{v_{j} + \log p_{j}^{0}}} = \frac{e^{{\tilde{v}}_{i}}}{\sum_{j = 1}^{N} e^{{\tilde{v}}_{j}}},

(6)

where ${\tilde{v}}_{i} = v_{i} + \log p_{i}^{0}$ . This may be recognized as a multinomial logit model in which the payoff vector $v$ is shifted by $\log p^{0}$ . Remarkably, the influence of the prior information μ is completely captured by this shift vector $\log p^{0}$ .
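Equations (5) and (6) suggest a simple way to compute the RI‐logit solution numerically: iterate on the requirement $p^{0} = E p (V)$. The sketch below is one such fixed‐point iteration (in the spirit of the Blahut–Arimoto algorithm), offered as an illustration rather than as the procedure used in the article. The function name `ri_logit`, the prior, and the starting point are hypothetical.

```python
import numpy as np

def ri_logit(V, mu, p0_init, max_iter=10_000, tol=1e-13):
    """Iterate p0 <- E[p(V)], with p(v) given by Equation (5)."""
    p0 = np.asarray(p0_init, dtype=float)
    for _ in range(max_iter):
        w = p0 * np.exp(V)                    # p_i^0 * exp(v_i), one row per state
        P = w / w.sum(axis=1, keepdims=True)  # Equation (5)
        p0_new = mu @ P                       # unconditional probabilities E[p(V)]
        converged = np.max(np.abs(p0_new - p0)) < tol
        p0 = p0_new
        if converged:
            break
    w = p0 * np.exp(V)
    P = w / w.sum(axis=1, keepdims=True)
    return p0, P

# Hypothetical prior: four equally likely states, three options.
V = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
mu = np.full(4, 0.25)

p0, P = ri_logit(V, mu, p0_init=[0.5, 0.3, 0.2])

# Equation (6): the same probabilities as a logit with payoffs shifted by log p0.
shifted = V + np.log(p0)
P_logit = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
```

Because this hypothetical prior treats the three options symmetrically, the iteration settles on equal unconditional probabilities, and `P` coincides with the shifted logit probabilities `P_logit`.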

Below, we show that this equivalence between the RI model and the logit discrete choice model can be extended to the entire class of additive random utility discrete choice models, by suitably generalizing the entropy function $Ω (\cdot)$ that defines the information cost. We will also find a location shift vector that completely captures the influence of the prior information. Before turning to these results, we briefly review some properties of the ARUM class and establish some new results that are useful for working with generalized entropies.

3. RANDOM UTILITY MODELS AND GENERALIZED ENTROPY

Consider a DM making a utility maximizing discrete choice among a set of $i = 1, \dots, N$ options. The utility of option i is

u_{i} = v_{i} + ε_{i},

(7)

where $v = (v_{1}, \dots, v_{N})$ is deterministic and $ε = (ε_{1}, \dots, ε_{N})$ is a vector of random utility shocks. This is the classic ARUM framework pioneered by McFadden (1978). Our presentation of the ARUM framework here will emphasize convex‐analytic properties that will be important in drawing connections with the RI model in what follows.

Assumption 1

The random vector ε follows a joint distribution with finite means that is absolutely continuous, independent of $v$ , and fully supported on $R^{N}$ .

Assumption 1 leaves the distribution of the ε's unspecified, thus allowing for a wide range of choice probability systems far beyond the often used logit model. Importantly, it accommodates arbitrary correlation in the $ε_{i}$ 's across options, which is reasonable and realistic in applications.

The DM then has choice probabilities

\begin{matrix} q_{i} (v) \equiv P (v_{i} + ε_{i} = \max_{j} [v_{j} + ε_{j}]), i = 1, \dots, N . \end{matrix}

An important concept in this article is the surplus function of the discrete choice model (so named by McFadden, 1981), defined as

W (v) = E_{ε} (\max_{j} [v_{j} + ε_{j}]) .

(8)

Under Assumption 1, $W (v)$ is convex and differentiable and the choice probabilities coincide with the derivatives of $W (v)$ :⁷

\begin{matrix} \frac{\partial W (v)}{\partial v_{i}} = q_{i} (v) \text{ for } i = 1, \dots, N \end{matrix}

or, using vector notation, $q (v) = \nabla W (v)$ . This is the Williams–Daly–Zachary theorem, famous in the discrete choice literature (McFadden, 1978, 1981).

From the differentiability of W and the Williams–Daly–Zachary theorem, it follows that the choice probabilities emerging from any random utility discrete choice model can be expressed in closed‐form as⁸:

q_{i} (v) = \frac{T_{i} (e^{v})}{\sum_{j = 1}^{N} T_{j} (e^{v})} \text{ for } i = 1, \dots, N,

(9)

where the vector‐valued function $T (\cdot) = (T_{1} (\cdot), \dots, T_{N} (\cdot)) : R_{+}^{N} \mapsto R_{+}^{N}$ is defined as the gradient of the exponentiated surplus, that is,

T (e^{v}) = \nabla_{v} (e^{W (v)}) .

(10)

For the specific case of multinomial logit, the $ε_{i}$ 's are i.i.d. across options i with a type 1 extreme value distribution, the surplus function is $W (v) = \log (\sum_{i = 1}^{N} e^{v_{i}})$ , implying that $T_{i} (e^{v}) = e^{v_{i}}$ . Thus, Equation (9) becomes the familiar multinomial logit choice formula: $q_{i} (v) = e^{v_{i}} / \sum_{j} e^{v_{j}}$ .
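The logit case makes it easy to check Equations (9) and (10) numerically. The following sketch uses central finite differences; the payoff vector and the helper names `surplus_logit` and `num_grad` are illustrative assumptions, not part of the article.

```python
import numpy as np

def surplus_logit(v):
    """Logit surplus W(v) = log(sum_i exp(v_i))."""
    return np.log(np.sum(np.exp(v)))

def num_grad(f, v, h=1e-6):
    """Central finite-difference gradient of a scalar function f at v."""
    v = np.asarray(v, dtype=float)
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f(v + e) - f(v - e)) / (2 * h)
    return g

v = np.array([0.3, -0.2, 1.0])          # arbitrary illustrative payoffs
q_logit = np.exp(v) / np.exp(v).sum()   # multinomial logit probabilities

grad_W = num_grad(surplus_logit, v)                  # Williams-Daly-Zachary: equals q(v)
T = num_grad(lambda x: np.exp(surplus_logit(x)), v)  # Equation (10)
q_from_T = T / T.sum()                               # Equation (9)
```

Up to finite‐difference error, `grad_W` and `q_from_T` both reproduce `q_logit`, and `T` reproduces `np.exp(v)`.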

Based on (9), we may refer to $T$ as the scaled demand mapping. We will use the inverse of the scaled demand mapping to construct an information cost. The inverse must allow the zero demands that arise in the RI model. Existence of such an inverse is established by the following proposition:

Proposition 1

(Invertibility) Let Assumption 1 hold. Then the function $T (\cdot)$ has a continuous extension to $T : R_{+ 0}^{N} \to R_{+ 0}^{N}$ that is surjective, injective, and hence globally invertible. Moreover, the function $S (\cdot)$ defined as $S (\cdot) = T^{- 1} (\cdot)$ satisfies $S_{i} (q) = 0$ iff $q_{i} = 0$ for $i = 1, \dots, N$ .

In what follows we refer to $S$ as the inverse scaled demand mapping. For any discrete choice model, there is a close relationship between the corresponding inverse scaled demand ($S$) and surplus ($W$) functions. They are related in terms of convex conjugate duality. Since the social surplus function $W$ for any ARUM is convex, we know that there exists a convex conjugate function $W^{:}$ satisfying the problem⁹

W (v) = \max_{q \in Δ} \{q \cdot v - W^{:} (q)\},

(11)

where the maximum on the right‐hand side is attained at $q (v) = \nabla W (v)$ .

The next proposition establishes a specific structure of the surplus function W and its convex conjugate $W^{:}$ .¹⁰

Proposition 2

(Generalized entropy functions) Consider an ARUM discrete choice model satisfying Assumption 1, with surplus function $W (\cdot)$ . Then

(i)
The surplus function $W (v)$ is equal to
$W (v) = \log (\sum_{i = 1}^{N} T_{i} (e^{v}))$ (12)
for the vector‐valued function $T$ as defined in Equation (10).

(ii)
The convex conjugate of the surplus function $W (v)$ is
$W^{:} (q) = \begin{cases} q \cdot \log S (q) & q \in Δ \\ + \infty & \text{otherwise,} \end{cases}$ (13)
where $S (\cdot)$ is the inverse mapping for the $T$ function in Equation (10). We call the negative convex conjugate $- W^{:} (\cdot)$ a generalized entropy.

Remark 1

(RI‐logit revisited.) To see how this works in a special case, let us consider the multinomial logit model. In this case, $T$ is the identity, implying that its inverse, $S (q) = q$ , is also just the identity. Then by Proposition 2(ii), the negative convex conjugate of the surplus function is $- W^{:} (q) = - q \cdot \log q = - \sum_{i} q_{i} \log q_{i}$ , which is just the Shannon (1948) entropy.

Moreover, we see that Equation (13) implies that the RI‐logit optimization problem (3), written as

$\begin{matrix} \max_{p (\cdot)} \sum_{v \in V} \{p (v) \cdot [v + \log p^{0}] - W^{:} (p (v))\} μ (v), \end{matrix}$

has the multinomial logit choice probabilities in Equation (6) as solution.
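The duality in Equation (11) can be illustrated numerically for the logit case of Remark 1, where $W^{:} (q) = q \cdot \log q$. The sketch below is only a spot check under assumed inputs: the payoff vector, the random seed, and the function name `objective` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
v = np.array([0.3, -0.2, 1.0])          # arbitrary illustrative payoffs

def objective(q, v):
    """q.v - q.log(q): the maximand in Equation (11) for the logit case."""
    return q @ v - q @ np.log(q)

W = np.log(np.exp(v).sum())             # logit surplus, the "logsum"
q_star = np.exp(v) / np.exp(v).sum()    # candidate maximizer, the gradient of W

# Random interior points of the simplex should never beat the logit probabilities.
trial_q = rng.dirichlet(np.ones(3), size=1000)
trial_values = np.array([objective(q, v) for q in trial_q])
```

The maximand evaluated at `q_star` equals `W`, and none of the random trial points attains a higher value.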

Proposition 2(i) generalizes the “logsum” formula for the multinomial logit model to the entire ARUM class. Similarly, generalizing from the logit case, $- W^{:}$, the negative convex conjugate of the surplus $W$ of any ARUM, may be viewed as a generalized entropy. In particular, Proposition 2(ii) shows how the generalized entropy may be expressed in terms of the inverse scaled demand $S$ as $- W^{:} (q) = - q \cdot \log S (q)$.

To aid further interpretation of the generalized entropy function, note that Equation (8) implies that the surplus function can be written as

\begin{matrix} W (v) = \sum_{i = 1}^{N} q_{i} (v) (v_{i} + E (ε_{i} | u_{i} \geq u_{j}, j \neq i)) . \end{matrix}

Combining this with (11), we obtain an alternative expression for the generalized entropy, as a choice probability‐weighted sum of expectations of the utility shocks $ε$ :¹¹

\begin{matrix} - W^{:} (q) = \sum_{i} q_{i} E [ε_{i} | u_{i} \geq u_{j}, j \neq i] . \end{matrix}

In this way, different distributions for the utility shocks $ε$ in the random utility model will imply different generalized entropies.
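For the logit model this interpretation can be illustrated by simulation: the average shock of the chosen option should approximate the Shannon entropy of the choice probabilities. The sketch below assumes type 1 extreme value shocks recentered to have mean zero (the normalization under which the surplus is the plain logsum); the payoffs, seed, and number of draws are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
v = np.array([0.3, -0.2, 1.0])          # arbitrary illustrative payoffs
n_draws = 400_000

# Standard Gumbel shocks have mean equal to the Euler-Mascheroni constant;
# subtracting it recenters the shocks to mean zero.
eps = rng.gumbel(size=(n_draws, v.size)) - np.euler_gamma
chosen = np.argmax(v + eps, axis=1)     # utility-maximizing option in each draw

# sum_i q_i E[eps_i | i chosen] is the mean shock of the chosen option.
mc_entropy = eps[np.arange(n_draws), chosen].mean()

q = np.exp(v) / np.exp(v).sum()         # logit choice probabilities
shannon = -(q * np.log(q)).sum()        # Shannon entropy of q
```

Up to simulation noise, `mc_entropy` matches `shannon`. A different shock distribution would give a different generalized entropy.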

We may also interpret $- \log S (q)$ as follows: Given $q$ in the interior of the simplex, there exists $v$ such that $(q, v)$ satisfy (9), see Norets and Takahashi (2013). Then, using (12),¹²

\begin{matrix} - \log S_{j} (q) = W (v) - v_{j}, \end{matrix}

which means that $- \log S_{j} (q)$ is the expected utility gain from the discrete choice relative to the deterministic utility component of option j. This coincides with the $ψ_{j} (\cdot)$ mapping introduced in Lemma 1 of Arcidiacono and Miller (2011), which is a key component for the estimation procedures developed in that article.

Proposition A1 in the Appendix contains important mathematical properties of the class of inverse scaled demand functions $S$ , which are used in proving the key propositions in the remainder of the article.

4. GENERALIZING THE RI‐LOGIT MODEL: RI‐ARUM MODELS WITH BREGMAN INFORMATION COST

In this section, we generalize the equivalence result between RI and multinomial logit. We begin by generalizing the RI framework described in Section 2, using generalized entropy in place of the Shannon entropy. Specifically, we let $S$ be the inverse scaled demand corresponding to some ARUM satisfying Assumption 1, and define $Ω_{S} (p) = - p \cdot \log S (p)$ as the corresponding generalized entropy.

For a strictly convex function $f$, the Bregman (1967) divergence associated with $f$ is a function of probability vectors $(p, q)$ to the real line defined by $D_{f} (p \| q) \equiv f (p) - f (q) - \nabla f (q) \cdot (p - q)$. It measures the vertical distance from $f (p)$ to a tangent hyperplane to $f$ at the point $q$. By convexity of $f$, this distance is positive and increasing away from $q$. When $p = p (V)$ is random, we can consider the expected Bregman divergence $E D_{f} (p (V) \| q)$, which measures the expected divergence of $p (V)$ from $q$. Banerjee et al. (2005) show that $E D_{f} (p (V) \| q)$ is minimized at $q = E p (V) = p^{0}$ for any choice of $f$. Banerjee et al. (2005) use this observation to define the Bregman information of $p (V)$ as $E D_{f} (p (V) \| p^{0})$, noting that this is the expected distortion, as measured by the Bregman divergence, when replacing a random $p (V)$ by the optimal constant vector $p^{0}$.

We define accordingly an information cost as the Bregman information associated with the negative of the generalized entropy $Ω_{S}$, that is, $κ_{S} (p (\cdot), μ) = E D_{- Ω_{S}} (p (V) \| p^{0})$.¹³ Proposition 3 below establishes that

\begin{aligned} κ_{S} (p (\cdot), μ) &= Ω_{S} (p^{0}) - E Ω_{S} (p (V)) \\ &= - p^{0} \cdot \log S (p^{0}) + \sum_{v \in V} [p (v) \cdot \log S (p (v))] μ (v) . \end{aligned}

(14)

In particular, the information cost $κ (p (\cdot), μ)$ is equal to the Bregman information associated with the (negative) Shannon entropy, which is well known as the mutual (Shannon) information. The interpretation of our information cost $κ_{S}$ is analogous to the information cost for the RI‐logit model, in Equation (2) above. It measures consumers' action adjustment costs associated with shifting behavior from the state‐independent unconditional choice probabilities $p^{0}$ to the state‐dependent conditional choice probabilities $p (v)$.¹⁴ Taking information cost $κ_{S}$ as Bregman information means that the information cost inherits properties of the Bregman divergence, as stated in the next proposition.

Proposition 3

For any generalized entropy function $Ω_{S}$, the information cost $κ_{S} (p (\cdot), μ)$ in (14) is the expectation of the Bregman divergence associated with $- Ω_{S}$ of $p (\cdot)$ and $p^{0}$. Hence, it is convex in $p (\cdot)$ when holding $p^{0}$ constant and $κ_{S} (p (\cdot), μ) = 0$ if action and state are independent.
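For the Shannon case, where $S$ is the identity, the identity between the Bregman information and the entropy difference in Equation (14) can be verified numerically. The sketch below uses hypothetical conditional probabilities and a hypothetical prior; the function names are illustrative.

```python
import numpy as np

def neg_entropy(p):
    """f(p) = p.log(p), the negative Shannon entropy (strictly convex)."""
    return p @ np.log(p)

def bregman(p, q):
    """Bregman divergence D_f(p || q) for f = negative Shannon entropy."""
    grad_q = np.log(q) + 1.0
    return neg_entropy(p) - neg_entropy(q) - grad_q @ (p - q)

# Hypothetical conditional choice probabilities, one row per state.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
mu = np.array([0.5, 0.3, 0.2])          # hypothetical prior over states
p0 = mu @ P                             # unconditional probabilities

# Left-hand side: expected Bregman divergence from p0 (Bregman information).
bregman_info = sum(m * bregman(p, p0) for m, p in zip(mu, P))
# Right-hand side: entropy at p0 minus expected entropy, as in Equation (14).
entropy_gap = -neg_entropy(p0) + sum(m * neg_entropy(p) for m, p in zip(mu, P))

# If actions are independent of the state, p(v) = p0 in every state.
independent_cost = sum(m * bregman(p0, p0) for m in mu)
```

The two expressions agree, the cost is strictly positive when choices vary with the state, and it is zero when action and state are independent.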

The information cost $κ_{S}$ takes context into account by construction; that is, going back to Example 1, exchanging the labels of good 1 (pineapple) and good 3 (cheesecake) affects information costs and therefore choices by design. Allowing the information cost function to depend on context in this way entails some loss of generality, as it need not satisfy Blackwell's information ordering. In particular, $S$ is not invariant with respect to permutation in its arguments. Example 5.2 below illustrates this for the case of $κ_{S}$ corresponding to a nested logit model.

Using the generalized cost function $κ_{S}$ just introduced, we now define a new RI model describing a DM who chooses the collection of conditional probabilities $p (\cdot) = \{p (v)\}_{v \in V}$ to maximize his expected payoff less the general information cost

\begin{matrix} \max_{p (\cdot)} \{E (V \cdot p (V)) - κ_{S} (p (\cdot), μ)\} \\ = \max_{p (\cdot)} \{\sum_{v \in V} \{p (v) \cdot [v + \log S (p^{0})] - p (v) \cdot \log S (p (v))\} μ (v)\} . \end{matrix}

(15)

We refer to this model as RI‐ARUM to make explicit the fact that the cost function (14) is defined in terms of the generalized entropy $Ω_{S} (q)$ that is derived from an ARUM. The maximization problem in Equation (15) is an extension of the maximization problem in Equation (11) that is representative for the ARUM. In fact, holding $p^{0}$ fixed, the RI‐ARUM objective function (15) above coincides, pointwise in $v$, with the problem (11) evaluated at the shifted payoff vector $v + \log S (p^{0})$, where the generalized entropy function is $- W^{:} (p) = - p \cdot \log S (p)$ as stated in Proposition 2. This connection underlies the finding, elaborated in the following proposition, that the optimal conditional choice probabilities for any RI‐ARUM model have the logit‐like closed form from Equation (9) above.

Proposition 4

Let T be the scaled demand of an ARUM and let $S = T^{- 1}$ be the inverse scaled demand. Let $p (\cdot)$ be the solution to the corresponding RI‐ARUM model and $p^{0} = E p (V)$ . Then

(i)
The unconditional probabilities satisfy the fixed point equation
$p^{0} = E (\frac{T (e^{V + \log S (p^{0})})}{\sum_{j = 1}^{N} T_{j} (e^{V + \log S (p^{0})})}) .$ (16)

(ii)
The conditional probabilities are given in terms of the unconditional probabilities by
$p (v) = \frac{T (e^{v + \log S (p^{0})})}{\sum_{j = 1}^{N} T_{j} (e^{v + \log S (p^{0})})} .$ (17)

(iii)
The optimized value of (15) is
$E \log \sum_{j = 1}^{N} T_{j} (e^{V + \log S (p^{0})}) = E W (V + \log S (p^{0})) .$ (18)

The unconditional and conditional choice probabilities in (16) and (17) generalize the corresponding expressions for the RI‐logit model in a straightforward way. We note in particular that the influence of the prior information is captured completely by the vector $\log S (p^{0})$. The implications of the prior for the behavior of an RI‐ARUM agent can thus be summarized by a vector in $R^{N}$, where $N$ is the number of choice options. This is true regardless of the form of the prior beliefs.

Part (i) of the proposition shows that the solution of the RI‐ARUM model involves a fixed point problem; in what follows, we assume that a solution exists. In general, the uniqueness of a solution to RI‐ARUM is not guaranteed. By Cover and Thomas (2006, Theorem 2.7.4), the mutual (Shannon) information $κ (p (\cdot), μ)$ is convex as a function of the conditional probability $p (\cdot)$ , and strictly convex when the conditional probabilities differ from the unconditional probability $p^{0}$ . In this case, the equations in Proposition 4 uniquely identify the solution to the RI‐ARUM model. By extension, as we show in Proposition A2 in the Appendix, this conclusion also applies to the nested logit model. Whether it applies to all RI‐ARUM remains an open question.

Proposition 4(ii) also implies that an important feature of the RI‐ARUM model is that some $p_{i}^{0}$ may be 0, in which case the corresponding $p_{i} (v)$ are also 0. To see this, consider Equation (17). When $p_{i}^{0} = 0$, Proposition 1 implies that $S_{i} (p^{0}) = 0$, so that $\log S_{i} (p^{0}) = - \infty$ and hence $p_{i} (v) = 0$. Following the literature, we refer to the set of options chosen with positive probability as the consideration set (e.g., Caplin et al., 2019).

Proposition 4(iii) indicates an alternative way for calculating the value attained by optimal rationally inattentive behavior. In particular, Equation (18) shows that the optimal value of program (15) can be computed as the expected surplus function of the appropriately shifted ARUM. This generalizes the corresponding Lemma 2 in Matějka and McKay (2015).

Although Proposition 4 does not explicitly characterize the consideration set emerging from an RI‐ARUM problem, Corollary A1 in the Appendix describes one important feature that it has, namely that it excludes options that offer the lowest utility in all states of the world.

4.1. Equivalence between Discrete Choice (ARUM) and Rational Inattention

We now establish the central result of this article, namely, the equivalence between additive random utility discrete choice models and RI models. In particular, we show that an RI‐ARUM generates the same choice probabilities as a corresponding ARUM and vice versa. Comparing the expressions for the choice probabilities in the RI‐ARUM model in (17) to those in an ARUM in (9), it is clear that such a result is available: the expressions for the choice probabilities are identical except for the location shift of the deterministic utility components $v$ by the vector $\log S (p^{0})$ in the RI‐ARUM model.

Proposition 5

For every RI‐ARUM with prior $(μ, V)$ , inverse scaled demand $S$ and choice probabilities $p (v)$ there is an ARUM defined on the consideration set of the RI‐ARUM that yields the same choice probabilities for all $v \in V$ .

Conversely, for every ARUM with choice probabilities $q (v)$ and inverse scaled demand $S$ and given a prior $(μ, V)$ such that the corresponding information cost $κ_{S} (p (\cdot), μ)$ is strictly convex, there is a location shift vector $c$ such that the RI‐ARUM with prior $(v \to μ (v - c), V + c)$ and inverse scaled demand $S$ has choice probabilities $p$ that satisfy $p (v - c) = q (v)$ for all $v \in V$ .

This proposition implies a new interpretation of ARUM models as describing boundedly rational behavior, which suggests that to apply an ARUM, one need not assume that DMs are completely aware of the valuations of all the available options. This is important, as it is clearly unrealistic to expect DMs to be aware of all options when the number is large.

At the same time, despite the formal equivalence in the choice probabilities in ARUM and RI‐ARUM models under the conditions of Proposition 5, there are several important differences between them. The RI‐ARUM model allows some options to have zero unconditional choice probabilities $p_{i}^{0}$. Since choice probabilities are necessarily positive in the ARUM under Assumption 1, the equivalence is defined only for the options that are in the consideration set of the RI‐ARUM model, as is clear from Proposition 5.

Moreover, the equivalence requires fixing the prior over states $(μ, V)$ ; a prior is part of the RI‐ARUM model but needs to be added to the random utility model. When we consider two choice scenarios with different priors, the subsequent choices in the RI‐ARUM and ARUM models can deviate considerably. In particular, the RI‐ARUM class allows options in the choice set to be complements, in the sense that increasing the payoff of an option in some state may lead to an increase in the choice probability of some other options. Such complementarities are explicitly ruled out by ARUM discrete choice models.¹⁵

Additionally, we have necessary and sufficient conditions for a system of choice probabilities to be consistent with an ARUM (Fosgerau et al., 2013). By Proposition 5, the same conditions are then necessary for a system of choice probabilities that derives from an RI‐ARUM and some fixed prior.

Starting from a given ARUM model, proceeding to the corresponding RI‐ARUM model requires deriving the convex conjugate function corresponding to the social surplus function of the given ARUM. Explicit closed forms for the convex conjugate functions are available, as far as we are aware, only for the multinomial and nested logit models. However, in general, the mass transport approach in Chiong et al. (2016) can be used to simulate the convex conjugate function for any ARUM, via computationally straightforward linear programming algorithms.

In what follows, we illustrate these features for a specific example; namely, we study a RI‐ARUM model in which the choice probabilities are equivalent to those from a nested logit discrete choice model, a frequently used model in empirical applications.

5. THE RI‐NESTED LOGIT MODEL

From an applied point of view, an important implication of Proposition 5 is that it allows us to formulate RI models that have complex substitution patterns, going beyond the multinomial logit case. In this section, we consider an RI‐ARUM model with information cost derived from a nested logit model. The nested logit choice probabilities are consistent with a discrete choice model in which the utility shocks $ε$ have a certain generalized extreme value joint distribution. Among applied researchers, the nested logit model is often preferred over the multinomial logit model because it allows some products to be closer substitutes than others, thus avoiding the restrictions implied by the IIA property.¹⁶

We partition the set of options $i \in \{1, \dots, N\}$ into mutually exclusive nests, and let $g_{i}$ denote the nest containing option i. Let $ζ_{g_{i}} \in (0, 1]$ be nest‐specific parameters. For a valuation vector $v$ , the nested logit choice probabilities are given by

q_{i} (v) = \frac{e^{v_{i} / ζ_{g_{i}}}}{\sum_{j \in g_{i}} e^{v_{j} / ζ_{g_{i}}}} \cdot \frac{e^{ζ_{g_{i}} \log (\sum_{j \in g_{i}} e^{v_{j} / ζ_{g_{i}}})}}{\sum_{\text{all nests } g} e^{ζ_{g} \log (\sum_{j \in g} e^{v_{j} / ζ_{g}})}} .

(19)
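Equation (19) can be evaluated directly. The following is a minimal sketch, assuming NumPy; the function name `nested_logit_probs` and the encoding of nests as integer labels are illustrative choices, not part of the article.

```python
import numpy as np

def nested_logit_probs(v, nest_of, zeta):
    """Nested logit choice probabilities, Equation (19).

    v       : valuations, one per option
    nest_of : integer nest label of each option
    zeta    : nesting parameter of each nest, in (0, 1]
    """
    v = np.asarray(v, dtype=float)
    nest_of = np.asarray(nest_of)
    zeta = np.asarray(zeta, dtype=float)
    z = zeta[nest_of]                            # zeta_{g_i} for every option i
    expv = np.exp(v / z)
    # Within-nest sums: sum_{j in g} exp(v_j / zeta_g)
    nest_sum = np.bincount(nest_of, weights=expv, minlength=len(zeta))
    within = expv / nest_sum[nest_of]            # first factor of (19)
    nest_weight = nest_sum ** zeta               # exp(zeta_g * log(nest sum))
    across = nest_weight / nest_weight.sum()     # second factor of (19)
    return within * across[nest_of]

# Two options share nest 0 (zeta = 0.5); a third option is alone in nest 1.
q = nested_logit_probs([0.0, 0.0, 0.0], nest_of=[0, 0, 1], zeta=[0.5, 1.0])
print(np.round(q, 2))  # [0.29 0.29 0.41]
```

With all valuations equal, the singleton nest receives more than one third of the probability mass because the two nested options are treated as close substitutes.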

The inverse scaled demand $S$ corresponding to a nested logit model is

S_{i} (q) = q_{i}^{ζ_{g_{i}}} {(\sum_{j \in g_{i}} q_{j})}^{1 - ζ_{g_{i}}} .

(20)
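A quick numerical check that (20) inverts the nested logit scaled demand, in the sense that $\log S (q (v)) = v - W (v)$. This is a sketch under stated assumptions: the expression used for $W$ is the standard nested logit log-sum normalized so that $\sum_{i} T_{i} (e^{v}) = e^{W (v)}$, and the five-option nesting and parameter values are hypothetical.

```python
import numpy as np

def nested_logit_probs(v, nest_of, zeta):
    """Equation (19): nested logit choice probabilities q(v)."""
    z = zeta[nest_of]
    expv = np.exp(v / z)
    nest_sum = np.bincount(nest_of, weights=expv, minlength=len(zeta))
    weight = nest_sum ** zeta
    return expv / nest_sum[nest_of] * weight[nest_of] / weight.sum()

def inverse_scaled_demand(q, nest_of, zeta):
    """Equation (20): S_i(q) = q_i^zeta * (mass of i's nest)^(1 - zeta)."""
    z = zeta[nest_of]
    nest_mass = np.bincount(nest_of, weights=q, minlength=len(zeta))
    return q ** z * nest_mass[nest_of] ** (1.0 - z)

def social_surplus(v, nest_of, zeta):
    """Assumed form: W(v) = log sum_g (sum_{j in g} exp(v_j / zeta_g))^zeta_g."""
    expv = np.exp(v / zeta[nest_of])
    nest_sum = np.bincount(nest_of, weights=expv, minlength=len(zeta))
    return np.log(np.sum(nest_sum ** zeta))

nest_of = np.array([0, 0, 1, 1, 1])   # hypothetical nesting of five options
zeta = np.array([0.5, 0.8])           # hypothetical nesting parameters
v = np.random.default_rng(0).normal(size=5)

q = nested_logit_probs(v, nest_of, zeta)
gap = np.log(inverse_scaled_demand(q, nest_of, zeta)) - (v - social_surplus(v, nest_of, zeta))
print(np.max(np.abs(gap)))  # numerically zero if S inverts the scaled demand
```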

Applying Proposition 5, the nested logit choice probabilities (19) are the same as those from an RI‐ARUM model with valuations

v_{i} - ζ_{g_{i}} \log p_{i}^{0} - (1 - ζ_{g_{i}}) \log (\sum_{j \in g_{i}} p_{j}^{0}), i \in \{1, \dots, N\} .

(21)

The inverse scaled demand $S$ for the nested logit model in Equation (20) has several interesting features, relative to the Shannon entropy. First, Equation (20) allows us to write the generalized entropy $Ω_{S} (p)$ as

Ω_{S} (p) = - \sum_{i = 1}^{N} ζ_{g_{i}} p_{i} \log p_{i} - \sum_{i = 1}^{N} (1 - ζ_{g_{i}}) p_{i} \log (\sum_{j \in g_{i}} p_{j}) .

(22)

The first term in Equation (22) captures the Shannon entropy within nests, whereas the second term captures the information between nests. According to this, we may interpret Equation (22) as an augmented version of the Shannon entropy. It is also apparent from (22) that $Ω_{S} (p)$ is not invariant to reordering of the choice probabilities, due to the second term.

Second, when the nesting parameters $ζ_{g_{j}} = 1$ , then $S$ is the identity ( $S_{j} (p) = p_{j}$ for all j), corresponding to the Shannon entropy. When $ζ_{g_{j}} < 1$ , then $S_{j} (p) \geq p_{j}$ ; here, $S (p)$ behaves as a probability weighting function that tends to overweight options j belonging to larger nests. At the extreme $ζ_{g_{j}} \to 0$ , all options within the same nest effectively collapse into one aggregate option and become perfect substitutes.
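The inequality $S_{j} (p) \geq p_{j}$ follows in one line from Equation (20). Writing $P_{g_{j}} = \sum_{k \in g_{j}} p_{k}$ for the probability mass of the nest containing j (a shorthand introduced only for this step), and taking $p_{j} > 0$,

$\begin{matrix} S_{j} (p) = p_{j}^{ζ_{g_{j}}} P_{g_{j}}^{1 - ζ_{g_{j}}} = p_{j} {(\frac{P_{g_{j}}}{p_{j}})}^{1 - ζ_{g_{j}}} \geq p_{j}, \end{matrix}$

since $P_{g_{j}} \geq p_{j}$ and $1 - ζ_{g_{j}} \geq 0$ . Equality holds when $ζ_{g_{j}} = 1$ or when option j carries all of the mass in its nest, and the overweighting factor ${(P_{g_{j}} / p_{j})}^{1 - ζ_{g_{j}}}$ grows with the mass of the other options in the nest.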

We denote this model as RI‐nested logit (hereafter RI‐NL). Using this model, we consider two examples, emphasizing both differences and similarities of the RI‐NL vis‐a‐vis the RI‐logit model.

5.1. Example 1: Mango–Pineapple–Cheesecake Continued

We return to the earlier pineapple–mango–cheesecake example from Section 1. For these three products, we consider a model with two nests, in which the tropical fruits pineapple (good 1) and mango (good 2) are placed in one nest g ₁, whereas cheesecake (good 3) is placed by itself in a second nest g ₂. For the nesting parameters, we choose $ζ_{g_{1}} = 0.5$ . The value of $ζ_{g_{2}}$ is irrelevant since nest g ₂ comprises just one alternative. Recall that there are four equally likely possible states:

(v^{1}, v^{2}, v^{3}, v^{4}) = \begin{pmatrix} 0 & 0.1 & 0 & 0 \\ 0 & 0 & 0.1 & 0 \\ 0 & 0 & 0 & 0.1 \end{pmatrix} .

(23)

Solving this RI‐NL model leads to $\log S (p^{0})$ that is a constant vector plus ${(0, 0, - 1.18)}^{⊤}$ . Hence, the nested logit model with payoffs shifted by this vector produces the same choice probabilities as the RI‐NL. The RI‐NL conditional choice probabilities,

\begin{pmatrix} 0.41 & 0.46 & 0.37 & 0.40 \\ 0.41 & 0.37 & 0.46 & 0.40 \\ 0.18 & 0.17 & 0.17 & 0.19 \end{pmatrix},

do not satisfy IIA and are hence not compatible with the RI‐logit model.¹⁷
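The reported table can be reproduced from the stated shift alone: evaluating the nested logit formula (19) at the payoffs in (23) plus the vector $(0, 0, - 1.18)$ should give the RI‐NL conditional choice probabilities up to rounding. The sketch below assumes NumPy; the helper name and the placeholder value 1.0 for the irrelevant parameter of the singleton nest are illustrative assumptions.

```python
import numpy as np

def nested_logit_probs(v, nest_of, zeta):
    """Equation (19): nested logit choice probabilities q(v)."""
    z = zeta[nest_of]
    expv = np.exp(v / z)
    nest_sum = np.bincount(nest_of, weights=expv, minlength=len(zeta))
    weight = nest_sum ** zeta
    return expv / nest_sum[nest_of] * weight[nest_of] / weight.sum()

nest_of = np.array([0, 0, 1])            # pineapple, mango | cheesecake
zeta = np.array([0.5, 1.0])              # second entry is irrelevant (singleton nest)
states = np.array([[0.0, 0.1, 0.0, 0.0],
                   [0.0, 0.0, 0.1, 0.0],
                   [0.0, 0.0, 0.0, 0.1]])  # Equation (23), one column per state
shift = np.array([0.0, 0.0, -1.18])      # log S(p^0) up to a constant, as reported

cond = np.column_stack(
    [nested_logit_probs(states[:, s] + shift, nest_of, zeta) for s in range(4)]
)
print(np.round(cond, 2))

# IIA would require the odds of good 1 against good 3 to be the same in
# states 1 and 3, which differ only in the payoff of good 2.
odds_13 = cond[0] / cond[2]
print(np.round(odds_13, 2))
```

The odds of pineapple against cheesecake change between the first and third states even though neither of their payoffs changes, which is the IIA violation referred to in the text.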

Conversely, we can start with the nested logit model with the payoffs given in (23) above and the same nest parameters as before. The conditional choice probability vectors for this model are

(p^{1}, p^{2}, p^{3}, p^{4}) = \begin{pmatrix} 0.29 & 0.33 & 0.27 & 0.28 \\ 0.29 & 0.27 & 0.33 & 0.28 \\ 0.41 & 0.40 & 0.40 & 0.44 \end{pmatrix},

and the unconditional choice probability vector is $p^{0} = {(0.29, 0.29, 0.41)}^{⊤}$ under the uniform prior. The corresponding location shift vector $\log S (p^{0})$ is ${(0, 0, - 0.001)}^{⊤}$ up to a constant and shifting the payoffs by this amount produces RI‐NL conditional choice probabilities that are equal to the nested logit choice probabilities.

For comparison, we also compute the case where the nesting parameter has been set to $ζ_{g_{1}} = 0.4$ , which makes the alternatives pineapple and mango closer substitutes than before. The RI‐NL unconditional choice probability vector becomes $p^{0} = {(0.45, 0.45, 0.10)}^{⊤}$ and the RI‐NL conditional choice probabilities become

\begin{pmatrix} 0.45 & 0.51 & 0.39 & 0.44 \\ 0.45 & 0.39 & 0.51 & 0.44 \\ 0.10 & 0.10 & 0.10 & 0.11 \end{pmatrix} .

The changes across states deviate more from IIA than when $ζ_{g_{1}} = 0.5$ . It appears that, as goods 1 and 2 become more substitutable, the DM is able to make better choices in states where goods 1 or 2 are optimal. This comparative statics suggests that increasing the substitutability between a set of goods corresponds to shifts in the information structure toward signals that allow the DM to better distinguish between states in which these goods are optimal.

5.2. Example 2: Swapping Alternatives Can Lead to Increased Information Cost

As we have stated throughout the article, our cost functions embody information related to the identity of alternatives. To see how this feature works in practice, consider a nested logit with four choice options, nests formed by options 1–2 and 3–4, and with two states $v^{1} = (\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4})$ and $v^{2} = (\frac{1}{8}, \frac{3}{8}, \frac{1}{8}, \frac{3}{8})$ . Define $\log S_{i} (q) = ζ \log q_{i} + (1 - ζ) \log (\sum_{j \in g_{i}} q_{j})$ , where $g_{i}$ is the nest that contains option i. A garbling that swaps alternatives 2 and 3 will move probability mass across nests in state 2 but not in state 1. Then the probability distribution across nests is independent of the state without garbling but not with garbling, and therefore this garbling increases the information cost $κ_{S}$ .
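The effect of the swap can be checked numerically. The sketch below reads the two vectors as the choice distributions in the two states and computes $κ_{S} = Ω_{S} (p^{0}) - E Ω_{S} (p (V))$ before and after swapping alternatives 2 and 3. The equal prior over the two states and the value $ζ = 0.5$ are assumptions made for illustration; the example in the text fixes neither.

```python
import numpy as np

def generalized_entropy(p, nest_of, zeta):
    """Omega_S(p) = -p . log S(p), with log S_i as defined in the example."""
    nest_mass = np.bincount(nest_of, weights=p)
    log_s = zeta * np.log(p) + (1.0 - zeta) * np.log(nest_mass[nest_of])
    return -p @ log_s

def bregman_information(cond, prior, nest_of, zeta):
    """kappa_S = Omega_S(p^0) - E[Omega_S(p(V))]; each row of cond is one state."""
    p0 = prior @ cond                                   # unconditional probabilities
    expected = sum(m * generalized_entropy(p, nest_of, zeta)
                   for m, p in zip(prior, cond))
    return generalized_entropy(p0, nest_of, zeta) - expected

nest_of = np.array([0, 0, 1, 1])          # nests {1, 2} and {3, 4}
zeta = 0.5                                # assumed nesting parameter
prior = np.array([0.5, 0.5])              # assumed equal prior over the two states
cond = np.array([[1/4, 1/4, 1/4, 1/4],
                 [1/8, 3/8, 1/8, 3/8]])
swapped = cond[:, [0, 2, 1, 3]]           # garbling: swap alternatives 2 and 3

kappa_original = bregman_information(cond, prior, nest_of, zeta)
kappa_swapped = bregman_information(swapped, prior, nest_of, zeta)
print(kappa_original, kappa_swapped)
```

Under these assumptions the cost is strictly larger after the swap. The Shannon part of the cost is unaffected by relabeling, so the whole increase comes from the nest-level term, which is zero before the swap because each nest has mass one half in both states.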

6. CONCLUSIONS

The central result in this article is the equivalence between an additive random utility discrete choice model and a corresponding RI‐ARUM. Thus, any additive random utility discrete choice model can be cast as a model of rationally inattentive behavior, and vice versa for any RI‐ARUM; we can go back and forth between the two paradigms.¹⁸ Then, to apply an ARUM, it is no longer necessary to assume that DMs are completely aware of the valuations of all the available options. This is important, as it is clearly unrealistic to expect DMs to be aware of all options when the number is large.

Our equivalence result is at the individual level; hence, it also holds for ARUM with random parameters, including the mixed logit or random coefficient logit models that have been popular in applied work.¹⁹

Our equivalence result generalizes to perturbed random utility models (e.g., Hofbauer and Sandholm, 2002; Fudenberg et al., 2015) where the information cost is the Bregman information associated with a (negative) generalized entropy (Fosgerau et al., 2019). We are also exploring connections between our results and those in the decision theory literature. Gul et al. (2014), for instance, show an equivalence between random utility and an “attribute rule” model of stochastic choice, and we conjecture that our results may be useful in showing similar results for other decision‐theoretic models.

Finally, there are RI models outside the RI‐ARUM framework; that is, RI models with information costs outside the class of generalized entropies introduced in this article.²⁰ Obviously, choice probabilities from these non‐RI‐ARUM models would not be equivalent to those that can be generated from ARUM models; it will be interesting to examine the empirical distinctions that non‐RI‐ARUM choice probabilities would have.

The properties in Proposition A1 are satisfied by any inverse scaled demand corresponding to an ARUM but do not characterize ARUM. In fact, the properties may be used to define a class of generalized entropies that is strictly larger than the class consisting of those generalized entropies corresponding to ARUM (see Fosgerau et al., 2019). We have not found direct conditions that characterize those generalized entropies that correspond to ARUM. We have chosen in this article to work with the generalized entropies that derive from ARUM to emphasize the main point of the article: the connection between RI with our information cost and ARUM. However, the conclusions of Proposition 6 extend without change to the case when $S$ is an inverse scaled demand that satisfies the conclusions of Proposition 8 but not necessarily corresponds to an ARUM. For applications, it is then possible to work with such generalized entropies without needing to check that they correspond to an ARUM.

A.1. Proofs of Results in Main Text

Proof of Proposition 1

Note first that we may write

$\begin{matrix} T (e^{v}) = e^{W (v)} q (v) . \end{matrix}$

The probabilities in $q$ are never 0 since the random utility shocks have full support. Define for convenience $X = \{v \in R^{N} \mid v_{1} = 0\}$ . The results in Norets and Takahashi (2013) apply to the mapping $q$ : Hence, $q$ is a bijection between X and the interior of the unit simplex Δ.

To obtain injectivity of $T$ on $R_{+}^{N}$ , suppose that $T (e^{v}) = T (e^{v^{'}})$ and aim to show that $v = v^{'}$ . Since $T_{i} (e^{v}) = e^{W (v)} q_{i} (v)$ and $\sum_{i = 1}^{N} q_{i} = 1$ , we may sum $\sum_{i = 1}^{N} T_{i} (e^{v}) = \sum_{i = 1}^{N} T_{i} (e^{v^{'}})$ to find that $W (v) = W (v^{'})$ and hence that $q (v) = q (v^{'})$ . Then by the Norets and Takahashi (2013) result, $v = v^{'} + (c, \dots, c)$ that leads to $W (v) = W (v^{'}) + c = W (v) + c$ , and hence, $c = 0$ .

Consider next surjectivity and let $x \in R_{+}^{N}$ be an arbitrary point. We aim to solve the equation $T (y) = x$ . By Norets and Takahashi, there exists $v \in X$ such that $q (v) = x / \sum_{i = 1}^{N} x_{i}$ . Let $c = - W (v) + \log \sum_{i = 1}^{N} x_{i}$ . Then

$\begin{matrix} T (e^{v + c}) = e^{W (v + c)} q (v) = q (v) \sum_{i = 1}^{N} x_{i} = x, \end{matrix}$

which establishes that $T$ is a surjection from $R_{+}^{N}$ to $R_{+}^{N}$ .

The next point is to extend $T$ to $R_{+ 0}^{N}$ . For $y$ on the boundary of $R_{+ 0}^{N}$ , let $z = \{i \in \{1, \dots, N\} \mid y_{i} > 0\}$ index the nonzero components of $y$ . If $z = \emptyset$ , then we let $T (y) = (0, \dots, 0)$ . For $z \neq \emptyset$ , consider the discrete choice model (7) with choice restricted to z. Let ${\tilde{p}}_{i}, i \in z$ be the choice probabilities from this restricted model and let ${\tilde{p}}_{i} = 0$ for $i \notin z$ . Similarly, let $\tilde{W}$ be the expected maximum utility for the restricted model. Define then $T (y) = e^{\tilde{W}} ({\tilde{p}}_{1}, \dots, {\tilde{p}}_{N})$ .

The argument that $T$ is a bijection from $R_{+}^{N}$ to $R_{+}^{N}$ may be repeated for each combination of zeros reflected in the set z. Hence, the extended function is a bijection from $R_{+ 0}^{N}$ to $R_{+ 0}^{N}$ .

It remains to show that $T$ is continuous. We will do this by establishing that the values of $T$ on the boundary of $R_{+ 0}^{N}$ are limits of values from sequences in the interior. A limit point of a continuous function is unique; hence, for each boundary point, we need just consider one sequence converging to that point.

Consider first a sequence $\{y^{n}\}_{n = 1}^{\infty}$ with $\lim_{n \to \infty} y^{n} = (0, \dots, 0)$ . As the limit is unique if it exists, consider $y^{n} = y / n$ for some $y \in R_{+}^{N}$ . Note that $W (\log y^{n}) = W (\log y) - \log n \to - \infty$ . Then since $q_{i} (\log y^{n})$ are bounded between 0 and 1, $T (y^{n}) \to (0, \dots, 0)$ as required.

Consider then $y \in R_{+}^{N}$ , let $z \subset \{1, \dots, N\}$ be nonempty and define $y_{i}^{n} = y_{i}$ for $i \in z$ and $y_{i}^{n} = y_{i} / n$ for $i \notin z$ . Let F be the cumulative distribution function of the vector of random utility shocks and let $F_{i}$ be its partial derivatives. Then choice probabilities may be written as

$q_{i} (v) = \int_{- \infty}^{\infty} F_{i} (u + v_{i} - v_{1}, \dots, u + v_{i} - v_{N}) d u .$ (A.1)

As above, let $\tilde{q}$ be the choice probabilities when choice is restricted to z. At no loss of generality, let $z = \{1, \dots, \tilde{N}\}$ , where $0 < \tilde{N} < N$ . For $i \in z$ , use the dominated convergence theorem together with (A.1) to see that

$\begin{matrix} \lim_{n \to \infty} q_{i} (\log y^{n}) & = & \int_{- \infty}^{\infty} \lim_{n \to \infty} F_{i} (u + \log y_{i}^{n} - \log y_{1}^{n}, \dots, u + \log y_{i}^{n} - \log y_{N}^{n}) d u \\ & = & \int_{- \infty}^{\infty} F_{i} (u + \log y_{i} - \log y_{1}, \dots, u + \log y_{i} - \log y_{\tilde{N}}, \infty, \dots, \infty) d u \\ & = & {\tilde{q}}_{i} . \end{matrix}$

These probabilities sum to 1. Hence, $\lim_{n \to \infty} q_{i} (\log y^{n}) = 0$ for $i \notin z$ .

By dominated convergence,

$\begin{matrix} \lim_{n \to \infty} W (\log y^{n}) & = & \lim_{n \to \infty} (\int_{0}^{\infty} (1 - F (u - \log y^{n})) d u - \int_{- \infty}^{0} F (u - \log y^{n}) d u) \\ & = & \int_{0}^{\infty} (1 - \lim_{n \to \infty} F (u - \log y^{n})) d u - \int_{- \infty}^{0} \lim_{n \to \infty} F (u - \log y^{n}) d u \\ & = & \int_{0}^{\infty} (1 - F (u - \log y_{1}, \dots, u - \log y_{\tilde{N}}, \infty, \dots, \infty)) d u \\ & & - \int_{- \infty}^{0} F (u - \log y_{1}, \dots, u - \log y_{\tilde{N}}, \infty, \dots, \infty) d u \\ & = & \tilde{W} . \end{matrix}$

Combining these results, find that $T (\lim_{n \to \infty} y^{n}) = \lim_{n \to \infty} T (y^{n})$ as required.

Finally, defining $S (\cdot) = T^{- 1} (\cdot)$ the conclusion follows at once. $□$
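For the nested logit, the bijection can be illustrated numerically on the interior of the orthant. In the sketch below, the closed form used for $T$ is obtained by multiplying (19) by $e^{W (v)}$ with the standard nested logit log-sum for $W$ ; it is an assumption made for illustration, as are the five-option nesting and the parameter values.

```python
import numpy as np

def scaled_demand(y, nest_of, zeta):
    """Assumed nested logit form: T_i(y) = y_i^(1/zeta) * (sum_{j in g_i} y_j^(1/zeta))^(zeta - 1)."""
    z = zeta[nest_of]
    powered = y ** (1.0 / z)
    nest_sum = np.bincount(nest_of, weights=powered, minlength=len(zeta))
    return powered * nest_sum[nest_of] ** (z - 1.0)

def inverse_scaled_demand(x, nest_of, zeta):
    """Equation (20), applied to any strictly positive vector x."""
    z = zeta[nest_of]
    nest_mass = np.bincount(nest_of, weights=x, minlength=len(zeta))
    return x ** z * nest_mass[nest_of] ** (1.0 - z)

nest_of = np.array([0, 0, 1, 1, 1])   # hypothetical nesting of five options
zeta = np.array([0.5, 0.8])           # hypothetical nesting parameters
x = np.random.default_rng(1).uniform(0.1, 3.0, size=5)

# T(S(x)) = x and S(T(x)) = x: the two maps are inverse to each other.
err_ts = np.max(np.abs(scaled_demand(inverse_scaled_demand(x, nest_of, zeta), nest_of, zeta) - x))
err_st = np.max(np.abs(inverse_scaled_demand(scaled_demand(x, nest_of, zeta), nest_of, zeta) - x))

# Homogeneity of degree 1: T(lam * x) = lam * T(x).
lam = 2.5
err_hom = np.max(np.abs(scaled_demand(lam * x, nest_of, zeta) - lam * scaled_demand(x, nest_of, zeta)))
print(err_ts, err_st, err_hom)
```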

Proof of Proposition 2

We first evaluate $W^{*} (q)$ . If $1 \cdot q \neq 1$ , then

$\begin{matrix} q \cdot (v + γ) - W (v + γ) = q \cdot v - W (v) + (1 \cdot q - 1) γ, \end{matrix}$

which can be made arbitrarily large by changing γ and hence $W^{*} (q) = \infty$ . Next, consider $q$ with some $q_{j} < 0$ . $W (v)$ decreases toward a lower bound as $v_{j} \to - \infty$ . Then $q \cdot v - W (v)$ increases toward $+ \infty$ , and hence, $W^{*}$ is $+ \infty$ outside the unit simplex Δ.

For $q \in Δ$ , we solve the maximization problem

$W^{*} (q) = \sup_{v} \{q \cdot v - W (v)\} .$ (A.2)

Note that for any constant k, we have $W (v + k \cdot 1) = k + W (v)$ , so that we may normalize $1 \cdot v = 0$ . Maximize then the Lagrangian $q \cdot v - W (v) - λ (1 \cdot v)$ with first‐order conditions $0 = q_{j} - \frac{\partial W (v)}{\partial v_{j}} - λ$ , which lead to $λ = 0$ . Then

$\begin{matrix} q & = & \nabla_{v} W (v) \Leftrightarrow \\ q e^{W (v)} & = & \nabla_{v} (e^{W (v)}) = T (e^{v}) \Leftrightarrow \\ S (q) e^{W (v)} & = & e^{v} \Leftrightarrow \\ \log S (q) + W (v) & = & v \Rightarrow \\ q \cdot \log S (q) + W (v) & = & q \cdot v . \end{matrix}$

Inserting this into (A.2) leads to the desired result.
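The conjugacy result can be checked numerically for the nested logit: the objective in (A.2) evaluated at $v = \log S (q)$ should equal $q \cdot \log S (q)$ , and no other $v$ should do better. This is a sketch only; the log-sum form of $W$ , the nesting structure, and the parameter values are assumptions, and random search is used merely as a sanity check on the supremum.

```python
import numpy as np

def social_surplus(v, nest_of, zeta):
    """Assumed form: W(v) = log sum_g (sum_{j in g} exp(v_j / zeta_g))^zeta_g."""
    expv = np.exp(v / zeta[nest_of])
    nest_sum = np.bincount(nest_of, weights=expv, minlength=len(zeta))
    return np.log(np.sum(nest_sum ** zeta))

def log_inverse_scaled_demand(q, nest_of, zeta):
    """Logarithm of Equation (20)."""
    z = zeta[nest_of]
    nest_mass = np.bincount(nest_of, weights=q, minlength=len(zeta))
    return z * np.log(q) + (1.0 - z) * np.log(nest_mass[nest_of])

nest_of = np.array([0, 0, 1, 1, 1])   # hypothetical nesting of five options
zeta = np.array([0.5, 0.8])           # hypothetical nesting parameters
rng = np.random.default_rng(2)
q = rng.dirichlet(np.ones(5))         # an interior point of the simplex

log_s = log_inverse_scaled_demand(q, nest_of, zeta)
conjugate_value = q @ log_s                                   # claimed W*(q)
at_optimum = q @ log_s - social_surplus(log_s, nest_of, zeta)  # objective at v = log S(q)

# Random valuations should never beat the claimed supremum.
trials = rng.normal(scale=3.0, size=(2000, 5))
best_random = max(q @ v - social_surplus(v, nest_of, zeta) for v in trials)
print(conjugate_value, at_optimum, best_random)
```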

For part (i), let $q$ be a solution to problem (11). Then, by the homogeneity of $T$ , we have $q = \frac{1}{α} T (e^{v})$ , where $α = \sum_{j = 1}^{N} T_{j} (e^{v})$ . Then, by the definition of $S$ , it follows that $S (q) = \frac{e^{v}}{α}$ . Replacing the latter expression in Equation (11), we get

$\begin{matrix} W (v) & = & q \cdot v - q \cdot \log (e^{v} / α) \\ & = & q \cdot v - q \cdot (\log e^{v} - \log α) \\ & = & \log α = \log (\sum_{j = 1}^{N} T_{j} (e^{v})) . \end{matrix}$

$□$

Proof of Proposition 3

Let $Ω (p) = - p \cdot \log S (p)$ be a generalized entropy. Then, using Proposition A1, the associated Bregman divergence becomes

$\begin{matrix} D (p | | q) & = & - q \cdot \log S (q) + p \cdot \log S (p) - (\log S (q) + 1) \cdot (p - q) \\ & = & p \cdot (\log S (p) - \log S (q)), \end{matrix}$

where we have used that $1 \cdot (p - q) = 0$ .

Convexity of $D (p | | q)$ in $p$ follows from Proposition A1 or from the fact that it is a Bregman divergence. Clearly, $D (q | | q) = 0$ .

Our information cost is by definition an expected Bregman divergence. We therefore immediately obtain that it is convex in $p (\cdot)$ holding $p^{0}$ constant and that $κ_{S} (p (\cdot), μ) = 0$ if action and state are independent, since in that case, $p (V) = p^{0}$ . $□$
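A small numerical illustration of these properties for the nested logit inverse scaled demand (20): the divergence is nonnegative, vanishes at $p = q$ , and reduces to the Kullback–Leibler divergence when all nesting parameters equal 1. The nesting structure and parameter values below are hypothetical.

```python
import numpy as np

def log_inverse_scaled_demand(p, nest_of, zeta):
    """Logarithm of Equation (20)."""
    z = zeta[nest_of]
    nest_mass = np.bincount(nest_of, weights=p, minlength=len(zeta))
    return z * np.log(p) + (1.0 - z) * np.log(nest_mass[nest_of])

def bregman_divergence(p, q, nest_of, zeta):
    """D(p || q) = p . (log S(p) - log S(q)) for probability vectors p and q."""
    return p @ (log_inverse_scaled_demand(p, nest_of, zeta)
                - log_inverse_scaled_demand(q, nest_of, zeta))

nest_of = np.array([0, 0, 1, 1, 1])   # hypothetical nesting of five options
zeta = np.array([0.5, 0.8])           # hypothetical nesting parameters
rng = np.random.default_rng(3)
p = rng.dirichlet(np.ones(5))
q = rng.dirichlet(np.ones(5))

d_nested = bregman_divergence(p, q, nest_of, zeta)
d_self = bregman_divergence(q, q, nest_of, zeta)
d_logit = bregman_divergence(p, q, nest_of, np.ones(2))   # zeta = 1: Shannon case
kl = np.sum(p * np.log(p / q))                            # Kullback-Leibler divergence
print(d_nested, d_self, d_logit - kl)
```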

Proof of Proposition 4

The Lagrangian for the DM's problem is

$\begin{matrix} Λ = E (V \cdot A) - κ_{S} (p, μ) + E (γ (V) (1 - \sum_{j} p_{j} (V))) + E (\sum_{j} ξ_{j} (V) p_{j} (V)), \end{matrix}$

where $γ (V)$ and $ξ_{j} (V)$ are Lagrange multipliers corresponding to condition (4).

Before we derive the first‐order conditions for $p_{j} (v)$ , it is useful to note that we may regard the terms $\log S (p^{0})$ and $\log S (p (v))$ in the information cost $κ_{S} (p, μ)$ as a constant, since their derivatives cancel out by Proposition A1(iii). Define ${\tilde{v}}_{j} = v_{j} + ξ_{j} (v) + \log S_{j} (p^{0})$ and $\tilde{v} = ({\tilde{v}}_{1}, \dots, {\tilde{v}}_{N})$ . Then the first‐order condition for $p_{j} (v)$ is easily found to be

$\log S_{j} (p (v)) = {\tilde{v}}_{j} - γ (v) .$ (A.3)

This fixes $p (v)$ as a function of $p^{0}$ since then

$p (v) = T (e^{\tilde{v}}) \exp (- γ (v)) .$ (A.4)

If some $p_{j} (v) = 0$ , then we must have ${\tilde{v}}_{j} = - \infty$ , which implies that $S_{j} (p^{0}) = 0$ and the value of $ξ_{j} (v)$ is irrelevant. If $p_{j} (v) > 0$ , then $ξ_{j} (v) = 0$ . We may then simplify by setting $ξ_{j} (v) = 0$ for all $j, v$ at no loss of generality, which means that ${\tilde{v}}_{j} = v_{j} + \log S_{j} (p^{0})$ .

Using that probabilities sum to 1 leads to

$\begin{matrix} \exp (γ (v)) = \sum_{j} T_{j} (e^{\tilde{v}}), \end{matrix}$

and hence, (i) follows. Item (ii) then follows immediately.

Now substitute (17) back into the objective, using $p_{j} (v) ξ_{j} (v) = 0$ , to find that it reduces to

$Λ = E γ (V) = E \log \sum_{j} T_{j} (e^{V + \log S (p^{0})})$ (A.5)

We may then use (A.5) to determine $p^{0}$ . Now apply Equation (12) to establish part (iii) of the proposition. $□$

Proof of Proposition 5

Consider an RI‐ARUM model with prior $(μ, V)$ , scaled demand $T$ and choice probabilities $p (v)$ and let $C$ be its consideration set. For $i \notin C$ , we have $p_{i} (v) = 0$ and $\log S_{i} (p^{0}) = - \infty$ by Proposition 1. Let $c = \log S (p^{0})$ . Then for $i \in C$ , we have

$\begin{matrix} p_{i} (v) & = & \frac{T_{i} (e^{v + \log S (p^{0})})}{\sum_{j = 1}^{N} T_{j} (e^{v + \log S (p^{0})})} \\ & = & P (v_{i} + c_{i} + ε_{i} = \max_{j} \{v_{j} + c_{j} + ε_{j}\}) \\ & = & P (v_{i} + c_{i} + ε_{i} = \max_{j \in C} \{v_{j} + c_{j} + ε_{j}\}), \end{matrix}$

which is an ARUM on $C$ .

To prove the converse, let $q^{0} = E q (v)$ , $c = - \log S (q^{0})$ and consider the RI‐ARUM with prior $(v \to μ (v + \log S (q^{0})), V - \log S (q^{0}))$ and scaled demand $T$ . The RI‐ARUM conditional choice probabilities satisfy the first‐order condition

$\begin{matrix} p_{i} (v - \log S (q^{0})) = \frac{T_{i} (e^{v - \log S (q^{0}) + \log S (p^{0})})}{\sum_{j = 1}^{N} T_{j} (e^{v - \log S (q^{0}) + \log S (p^{0})})} \end{matrix}$

with $p^{0} (v - \log S (q^{0})) = E p (v - \log S (q^{0}))$ . By strict convexity of the information cost, the first‐order condition uniquely identifies the optimal RI‐ARUM conditional choice probabilities. Then $p (v - \log S (q^{0})) = q (v)$ solves the RI‐ARUM maximization problem. $□$

A.2. Additional Results

Proposition A1

(Properties of the inverse scaled demand function) For any ARUM discrete choice model satisfying Assumption 1, the corresponding inverse scaled demand $S (\cdot)$ satisfies:

(i)
$S$ is continuous and homogeneous of degree 1.

(ii)
$q \cdot \log S (q)$ is convex and strictly convex on the interior of Δ.

(iii)
$S$ is differentiable with
$\begin{matrix} \sum_{i = 1}^{N} q_{i} \frac{\partial \log S_{i} (q)}{\partial q_{k}} = 1, k \in \{1, \dots, N\}, \end{matrix}$
where $q$ is a probability vector with $0 < q_{i} < 1$ for all i.
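Property (iii) is easy to check numerically for the nested logit inverse scaled demand (20). The sketch below approximates the Jacobian of $\log S$ by central finite differences; the nesting structure, parameter values, test point, and step size are all illustrative assumptions.

```python
import numpy as np

def log_inverse_scaled_demand(q, nest_of, zeta):
    """Logarithm of Equation (20)."""
    z = zeta[nest_of]
    nest_mass = np.bincount(nest_of, weights=q, minlength=len(zeta))
    return z * np.log(q) + (1.0 - z) * np.log(nest_mass[nest_of])

def jacobian_log_s(q, nest_of, zeta, h=1e-6):
    """Central-difference Jacobian with entry (i, k) = d log S_i / d q_k."""
    n = len(q)
    jac = np.empty((n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        jac[:, k] = (log_inverse_scaled_demand(q + step, nest_of, zeta)
                     - log_inverse_scaled_demand(q - step, nest_of, zeta)) / (2.0 * h)
    return jac

nest_of = np.array([0, 0, 1, 1, 1])            # hypothetical nesting of five options
zeta = np.array([0.5, 0.8])                    # hypothetical nesting parameters
q = np.array([0.10, 0.30, 0.20, 0.25, 0.15])   # an interior probability vector

weighted = q @ jacobian_log_s(q, nest_of, zeta)   # sum_i q_i * d log S_i / d q_k, for each k
print(weighted)  # every entry should be numerically close to 1
```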

Proof of Proposition A1

Continuity of $S$ follows from continuity of the partial derivatives of W, which is immediate from the definition. Homogeneity of $S$ is equivalent to homogeneity of $T$ . Using the homogeneity property of W

$\begin{matrix} S^{- 1} (λ e^{v}) = \nabla_{v} (e^{W (v + \log λ)}) = λ \nabla_{v} (e^{W (v)}) = λ S^{- 1} (e^{v}), \end{matrix}$

which shows that $T$ , and hence, $S$ are homogeneous of degree 1.

The requirement that $\sum_{i = 1}^{N} q_{i} \frac{\partial \log S_{i} (q)}{\partial q_{k}} = 1$ in the relative interior of the unit simplex Δ may be expressed in matrix notation as

$\begin{matrix} (q_{1}, \dots, q_{N}) \cdot J_{\log S} (q) = (1, \dots, 1), \end{matrix}$

where

$\begin{matrix} J_{\log S} (q) = {\{\frac{\partial \log S_{i} (q)}{\partial q_{j}}\}}_{i, j = 1}^{N} \end{matrix}$

is the Jacobian of $\log S (q)$ .

Defining $\hat{t} \equiv \log S (q)$ , we have $q = T (e^{\hat{t}})$ and hence $W (\hat{t}) = \log (1 \cdot T (e^{\hat{t}})) = \log 1 = 0$ by Proposition 2. Noting that ${(\log (S))}^{- 1} (\hat{t}) = T (e^{\hat{t}})$ the requirement in part (iii) is equivalent to

$\begin{matrix} (q_{1}, \dots, q_{N}) = (q_{1}, \dots, q_{N}) \cdot J_{\log S} (q) \cdot J_{{(\log S)}^{- 1}} (\hat{t}) = (1, \dots, 1) \cdot J_{T (e^{\hat{t}})} (\hat{t}) . \end{matrix}$

Now, use the Williams–Daly–Zachary theorem to find that

$\begin{matrix} (1, \dots, 1) \cdot J_{T (e^{\hat{t}})} (\hat{t}) = \nabla_{\hat{t}} (e^{W (\hat{t})}) = e^{W (\hat{t})} (q_{1}, \dots, q_{N}) = (q_{1}, \dots, q_{N}), \end{matrix}$

as required.

Convexity of $q \cdot \log S (q)$ follows from Proposition 2(ii). To show that $q \cdot \log S (q)$ is strictly convex for $q \in \operatorname{int} Δ$ , note first that $e^{W (v)}$ has positive definite Hessian on the set $\{v \in R^{N} : 1 \cdot v = 1\}$ (Hofbauer and Sandholm, 2002). This Hessian is equal to the Jacobian of $T (e^{v}) = e^{W (v)} q (v)$ , which is then positive definite. The inverse of $T (e^{v})$ is $\log S (q)$ , which then also has a positive definite Jacobian. But the gradient of $q \cdot \log S (q)$ is $\log S (q) + 1$ by part (iii), so its Hessian is the Jacobian of $\log S (q)$ . $□$

Corollary A1

For some option j, and for all $v \in V$ , let $v_{j} \leq v_{i}$ for all $i \neq j$ , and assume that the inequality is strict with positive probability. Then $p_{j}^{0} = 0$ (i.e., option j is not in the consideration set).

Proof of Corollary A1

Let ○ denote the Hadamard product, that is, $(a_{1}, \dots, a_{N}) \circ (b_{1}, \dots, b_{N}) = (a_{1} b_{1}, \dots, a_{N} b_{N})$ . Assume, toward a contradiction, that $p_{j}^{0} > 0$ . It follows from cyclic monotonicity (Rockafellar, 1970, Thm. 23.5) that $p_{j} (v)$ increases as the utility of other options $i \neq j$ decreases. Then,

$\begin{matrix} p_{j}^{0} & = & E (\frac{T_{j} (e^{V} \circ S (p^{0}))}{\sum_{k} T_{k} (e^{V} \circ S (p^{0}))}) \end{matrix}$ (A.6)

$\begin{matrix} < & E (\frac{T_{j} (e^{V_{j}} S (p^{0}))}{\sum_{k} T_{k} (e^{V_{j}} S (p^{0}))}) \end{matrix}$ (A.7)

$\begin{matrix} = & E (\frac{e^{V_{j}} T_{j} (S (p^{0}))}{e^{V_{j}} \sum_{k} T_{k} (S (p^{0}))}) = E (\frac{p_{j}^{0}}{\sum_{k} p_{k}^{0}}) = p_{j}^{0} . \end{matrix}$ (A.8)

This is a contradiction as desired. $□$

Proposition A2

(Convexity of the Bregman divergence) The Bregman information $κ_{S} (p (\cdot), μ) = Ω_{S} (p^{0}) - E Ω_{S} (p (V))$ associated with a nested logit model is convex. It is strictly convex when the conditional probabilities differ from the unconditional probability p ⁰.

Proof of Proposition A2

When Ω is the Shannon entropy, that is, when the associated ARUM is a multinomial logit model, then the Bregman information is the mutual (Shannon) information. By Cover and Thomas (2006, Thm 2.7.4), the mutual (Shannon) information is convex as a function of the conditional probability $p (\cdot)$ , and strictly convex when the conditional probabilities differ from the unconditional probability p ⁰.

Consider now a nested logit model and note that the corresponding generalized entropy may be written $Ω_{S} (p) = Ω (Γ p) + c^{⊤} p$ , where Γ is a matrix whose columns are linearly independent probability vectors. For example, the inverse scaled demand (20) of the two‐level nested logit model may be written as

$\begin{matrix} \log S_{i} (q) & = & ζ_{g_{i}} \log (ζ_{g_{i}} q_{i}) + (1 - ζ_{g_{i}}) \log (\sum_{j \in g_{i}} (1 - ζ_{g_{j}}) q_{j}) \\ & & - (1 - ζ_{g_{i}}) \log (1 - ζ_{g_{i}}) - ζ_{g_{i}} \log ζ_{g_{i}} . \end{matrix}$

Then, the NL generalized entropy is the composition of a linear function and a concave function, plus a linear function. The Bregman information may then be written

$\begin{matrix} κ_{S} (p (\cdot), μ) = Ω (Γ p^{0}) - E Ω (Γ p (V)), \end{matrix}$

which is a composition of a linear function and the convex mutual Shannon information, whereas the linear terms cancel out. Hence, it is convex and strictly convex when the conditional probabilities differ from the unconditional probability p ⁰. $□$

Footnotes

In the context of the RI model, we interpret IIA as a comparison across states, holding the decision maker's (DM) prior fixed. For further details about IIA, we refer the reader to Maddala (1986, 3.2) and Anderson et al. (1992).

The choice probabilities here are generated, not by an RI‐logit model, but by an RI‐nested logit model, introduced in this article.

⁴

This example considers how choices vary across different states of the world. This is distinct from the exercises in Matějka and McKay (2015), who consider how choices vary as priors or choice sets change. For empirical analysis, the change in choices across states is often most relevant: for instance, in demand analysis, researchers wish to uncover how consumer choices depend on changes in product prices, which can be considered as a change from one state to another.

⁵

More formally, Matějka and McKay (2015) study the problem where agents first choose an information structure (mapping from state of the world to information signals), and then, based on signals, choose optimal actions.

⁶

Our presentation of the RI paradigm here follows Sims (2003, 2010), in which agents are modeled as choosing directly their conditional choice probabilities ${p (v)}_{v \in V}$ , taking the prior distribution $μ (v)$ as given.

⁷

The convexity of $W (\cdot)$ follows from the convexity of the max function. Differentiability follows from the absolute continuity of $ε$ . See Shi et al. (2018), Chiong and Shum (2019), and Melo et al. (2019) for semiparametric econometric approaches based on these convex‐analytic properties of discrete choice models.

⁸

By direct differentiation of $e^{W (v)}$ , and applying the Williams–Daly–Zachary theorem, we have $q_{i} (v) = T_{i} (e^{v}) / e^{W (v)}$ for all i. Imposing $\sum_{i} q_{i} (v) = 1$ , we have $\sum_{i} T_{i} (e^{v}) = e^{W (v)}$ .

⁹

For details, see Rockafellar (1970, ch. 12). Briefly, for a convex function $g (x)$ , its convex conjugate function is defined as $g^{*} (y) = \max_{x} \{x \cdot y - g (x)\}$ , which is also convex. Fenchel's theorem then establishes that $g (x) = \max_{y} \{x \cdot y - g^{*} (y)\}$ . When $x$ and $y$ are scalar and $g (x)$ is differentiable, then the derivatives $g' (x)$ and ${g^{*}}' (y)$ are inverse mappings to each other. Vohra (2011) applies these ideas to the mechanism design setting.

¹⁰

To the best of our knowledge, this result is new in the literature on random utility models, and may be of independent interest. In particular, this result is related to the literature on perturbed random utility models, which has been focused on characterizing choice probabilities as the solution of a deterministic optimization problem (Hofbauer and Sandholm, 2002; Fosgerau and McFadden, 2012; Fudenberg et al., 2015).

¹¹

See Chiong et al. (2016).

¹²

We have $\log S (q) = \log S (T (e^{v}) / (\sum_{j} T_{j} (e^{v}))) = \log (e^{v} / (\sum_{j} T_{j} (e^{v}))) = v - W (v)$ , where we have used the fact that $S$ is homogeneous of degree 1.

¹³

Strict concavity of $Ω_{S}$ is established in Proposition A1 in the Appendix.

¹⁴

Using Bayes' rule, such a shift in choice probabilities corresponds to a change in beliefs from the prior μ to a posterior $μ (v | i) \propto p_{i} (v) μ (v)$ , which Caplin and Dean (2015) and Chambers et al. (2018) refer to as “revealed posterior” distributions.

¹⁵

Indeed, in empirical papers utilizing discrete‐choice demand models, complementarities between choices can typically be accommodated only by modeling consumers as choosing “bundles” of options in the choice set; see, for example, Gentzkow (2007) and Fox and Lazzati (2017). Such an approach may become intractable as the dimensionality of the choice set increases. In contrast, complementarities can arise in the RI model both from correlation in the priors (as pointed out by Matějka and McKay, 2015), and also from the form of the generalized entropy information cost functions considered in this article.

¹⁶

To be consistent with the definition of IIA, when applied to RI models, we assume that the DM's prior is fixed. This assumption allows us to keep the choice set constant so that we can focus on changes in the utilities associated to alternatives. For further details about IIA, see Maddala (1986, Chap. 2) and Anderson et al. (1992).

¹⁷

But even with the logit specification, the IIA property can break down if the DM is able to consume more than one product, that is, bundles of goods (cf. Gentzkow, 2007).

¹⁸

In a similar vein, Webb (2019) demonstrates an equivalence between random utility models and bounded‐accumulation or drift‐diffusion models of choice and reaction times used in the neuroeconomics and psychology literature.

¹⁹

See, for instance, Berry et al. (1995), McFadden and Train (2000), and Fox et al. (2012).

²⁰

As an example, the function $g (p) = - \sum_{i = 1}^{N} \log (p_{i})$ is not a generalized entropy function; thus, an RI model using this as an information cost function would lie outside the RI‐ARUM framework.

REFERENCES

Allen, R., and Rehbeck J., “Identification with Additively Separable Heterogeneity,” Econometrica 87 (2019), 1021–54.
Anderson, S. P., De Palma A., and Thisse J.‐F., “A Representative Consumer Theory of the Logit Model,” International Economic Review 29 (1988), 461–66.
Anderson, S. P., De Palma A., and Thisse J.‐F., Discrete Choice Theory of Product Differentiation (Cambridge, MA: MIT Press, 1992).
Arcidiacono, P., and Miller R. A., “Conditional Choice Probability Estimation of Dynamic Discrete Choice Models with Unobserved Heterogeneity,” Econometrica 79 (2011), 1823–67.
Banerjee, A., Merugu S., Dhillon I. S., and Ghosh J., “Clustering with Bregman Divergences,” Journal of Machine Learning Research 6 (2005), 1705–49.
Berry, S., “Estimating Discrete‐Choice Models of Market Equilibrium,” RAND Journal of Economics 25 (1994), 242–62.
Berry, S. T., Levinsohn J., and Pakes A., “Automobile Prices in Market Equilibrium,” Econometrica 63 (1995), 841–90.
Bregman, L., “The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming,” USSR Computational Mathematics and Mathematical Physics 7 (1967), 200–17.
Brown, Z. Y., and Jeon J., “Endogenous Information Acquisition and Insurance Choice,” Working paper, Michigan, University of Michigan, 2019.
Caplin, A., and Dean M., “Revealed Preference, Rational Inattention, and Costly Information Acquisition,” American Economic Review 105 (2015), 2183–203.
Caplin, A., Dean M., and Leahy J., “Rationally Inattentive Behavior: Characterizing and Generalizing Shannon Entropy,” Technical Report, National Bureau of Economic Research, Cambridge, MA, 2017.
Caplin, A., Dean M., and Leahy J., “Rational Inattention, Optimal Consideration Sets, and Stochastic Choice,” Review of Economic Studies 86 (2019), 1061–94.
Caplin, A., Leahy J., and Matějka F., “Rational Inattention and Inference from Market Share Data,” CERGE‐EI Working Paper, 2016.
Chambers, C. P., Liu C., and Rehbeck J., “Costly Information Acquisition,” UCSD, 2018.
Chiong, K. X., Galichon A., and Shum M., “Duality in Dynamic Discrete‐Choice Models,” Quantitative Economics 7 (2016), 83–115.
Chiong, K. X., and Shum M., “Random Projection Estimation of Discrete‐Choice Models with Large Choice Sets,” Management Science 65 (2019), 256–71.
Cover, T. M., and Thomas J. A., Elements of Information Theory, 2nd edition (Somerset: Wiley‐Interscience, 2006).
Fosgerau, M., Monardo J., and de Palma A., “The Inverse Product Differentiation Logit Model,” July 13, 2019. Available at SSRN: https://ssrn.com/abstract=3141041 or doi:10.2139/ssrn.3141041.
Fosgerau, M., and McFadden D. L., “A Theory of the Perturbed Consumer with General Budgets,” NBER Working Paper, 2012.
Fosgerau, M., McFadden D., and Bierlaire M., “Choice Probability Generating Functions,” Journal of Choice Modelling 8 (2013), 1–18. doi:10.1016/j.jocm.2013.05.002.
Fox, J. T., Kim K. I., Ryan S. P., and Bajari P., “The Random Coefficients Logit Model Is Identified,” Journal of Econometrics 166 (2012), 204–12.
Fox, J. T., and Lazzati N., “A Note on Identification of Discrete Choice Models for Bundles and Binary Games,” Quantitative Economics 8 (2017), 1021–36.
Frankel, A., and Kamenica E., “Quantifying Information and Uncertainty,” American Economic Review 109 (2019), 3650–80.
Fudenberg, D., Iijima R., and Strzalecki T., “Stochastic Choice and Revealed Perturbed Utility,” Econometrica 83 (2015), 2371–2409.
Gentzkow, M., “Valuing New Goods in a Model with Complementarity: Online Newspapers,” American Economic Review 97 (2007), 713–44.
Gul, F., Natenzon P., and Pesendorfer W., “Random Choice as Behavioral Optimization,” Econometrica 82 (2014), 1873–1912.
Hébert, B., and Woodford M., “Rational Inattention with Sequential Information Sampling,” NBER Working Paper, 2017.
Hobson, A., “A New Theorem of Information Theory,” Journal of Statistical Physics 1 (1969), 383–91.
Hofbauer, J., and Sandholm W. H., “On the Global Convergence of Stochastic Fictitious Play,” Econometrica 70 (2002), 2265–94.
Joo, J., “Rational Inattention as an Empirical Framework – With an Application to the Welfare Effects of New Product Introduction and Endogenous Promotion,” Working Paper, UT Dallas, 2019.
Maddala, G. S., Limited‐Dependent and Qualitative Variables in Econometrics (Cambridge: Cambridge University Press, 1986).
Matějka, F., and McKay A., “Rational Inattention to Discrete Choices: A New Foundation for the Multinomial Logit Model,” American Economic Review 105 (2015), 272–98. doi:10.1257/aer.20130047.
McFadden, D., “Modelling the Choice of Residential Location,” in Karlquist A., Snickars F., and Weibull J. W., eds., Spatial Interaction Theory and Planning Models, Volume 673 (Amsterdam: North Holland, 1978), 75–96.
McFadden, D., “Econometric Models of Probabilistic Choice,” in Manski C. and McFadden D., eds., Structural Analysis of Discrete Data with Econometric Applications (Cambridge, MA: MIT Press, 1981), 198–72.
McFadden, D., and Train K., “Mixed MNL Models for Discrete Response,” Journal of Applied Econometrics 15 (2000), 447–70.
Melo, E., Pogorelskiy K., and Shum M., “Testing the Quantal Response Hypothesis,” International Economic Review 60 (2019), 53–74.
Morris, S., and Yang M., “Coordination and Continuous Choice,” Working Paper, 2019.
Norets, A., and Takahashi S., “On the Surjectivity of the Mapping between Utilities and Choice Probabilities,” Quantitative Economics 4 (2013), 149–55.
Porcher, C., “Migration with Costly Information,” Working Paper, Princeton, 2019.
Rockafellar, R. T., Convex Analysis (Princeton, NJ: Princeton University Press, 1970).
Shannon, C. E., “A Mathematical Theory of Communication,” Bell System Technical Journal 27 (1948), 379–423.
Shi, X., Shum M., and Song W., “Estimating Semi‐Parametric Panel Multinomial Choice Models Using Cyclic Monotonicity,” Econometrica 86 (2018), 737–61.
Sims, C. A., “Implications of Rational Inattention,” Journal of Monetary Economics 50 (2003), 665–90.
Sims, C. A., “Rational Inattention and Monetary Economics,” in Woodford M. and Friedman B. M., eds., Handbook of Monetary Economics, Volume 3, Chapter 4 (Amsterdam: Elsevier, 2010), 155–81.
Vohra, R., Mechanism Design: A Linear Programming Approach (Cambridge: Cambridge University Press, 2011).
Webb, R., “The (Neural) Dynamics of Stochastic Choice,” Management Science 65 (2019), 230–55.

