Entropy. 2024 Mar 6;26(3):233. doi: 10.3390/e26030233

Mechanisms for Robust Local Differential Privacy

Milan Lopuhaä-Zwakenberg 1,*, Jasper Goseling 1,*
Editor: Songze Li
PMCID: PMC10969564  PMID: 38539745

Abstract

We consider privacy mechanisms for releasing data $X=(S,U)$, where $S$ is sensitive and $U$ is non-sensitive. We introduce the robust local differential privacy (RLDP) framework, which provides strong privacy guarantees while preserving utility. This is achieved by providing robust privacy: our mechanisms not only provide privacy with respect to a publicly available estimate of the unknown true distribution, but also with respect to similar distributions. Such robustness mitigates the potential privacy leaks that might arise from the difference between the true distribution and the estimated one. At the same time, we mitigate the utility penalties that come with ordinary differential privacy, which involves making worst-case assumptions and dealing with extreme cases. We achieve robustness in privacy by constructing an uncertainty set based on a Rényi divergence. By analyzing the structure of this set and approximating it with a polytope, we can use robust optimization to find mechanisms with high utility. However, this relies on vertex enumeration and becomes computationally infeasible for large input spaces. Therefore, we also introduce two low-complexity algorithms that build on existing LDP mechanisms. We evaluate the utility and robustness of the mechanisms using numerical experiments and demonstrate that our mechanisms provide robust privacy, while achieving a utility that is close to optimal.

Keywords: local differential privacy, Rényi divergence, robust optimization

1. Introduction

We consider the setting in which an aggregator collects data from many users with the purpose of, for instance, computing statistics or training a machine learning model. In particular, the data contain sensitive information and users do not trust the aggregator. Therefore, they employ a privacy mechanism that transforms the data before sending it to the aggregator. Users have data $X=(S,U)$ from a finite alphabet $\mathcal{X}=\mathcal{S}\times\mathcal{U}$, where $s\in\mathcal{S}$ is sensitive information and $u\in\mathcal{U}$ is non-sensitive. Data are distributed i.i.d. across users according to the distribution $P^*$. In order to preserve their privacy, users disclose a sanitized version $Y$ of $X$ by using a privacy mechanism $Q:\mathcal{X}\to\mathcal{Y}$. The aim is that $Y$ contains as much information about $X$ as possible without leaking too much information about $S$. The challenge that is addressed in this paper is to develop good privacy mechanisms. This scenario and closely related ones were studied in, for instance, [1,2,3,4,5,6,7,8,9,10,11]. In this paper, we use the following version of local differential privacy (LDP), as introduced in [3]:

$$\mathbb{P}(Y=y\,|\,S=s)\;\le\; e^{\varepsilon}\,\mathbb{P}(Y=y\,|\,S=s'), \qquad (1)$$

for all $s,s'\in\mathcal{S}$ and privacy parameter $\varepsilon>0$. In addition, we measure the utility of $Y$ through the mutual information $I(X;Y)$. We discuss differences with related work in Section 2.

Note that if all information is sensitive, i.e., if $\mathcal{X}=\mathcal{S}$, (1) reduces to

$$\mathbb{P}(Y=y\,|\,X=x)\;\le\; e^{\varepsilon}\,\mathbb{P}(Y=y\,|\,X=x'), \qquad (2)$$

which is the traditional LDP constraint [1,2,5]. An important property of (2) is that it does not depend on $P^*$, but only on $Q$. The independence of $P^*$ is a key factor in the success of differential privacy, since it obviates the need to make assumptions about the distribution of the data or on the background/side-knowledge available to the aggregator. As is clear from (1), however, independence from $P^*$ no longer holds if not all data are sensitive.

Assuming that $P^*$ is known, one can develop good privacy mechanisms for various settings with partially sensitive information [3,6,12]. In practice, however, $P^*$ has to be modeled using domain knowledge or estimated from data, leading to errors. The prevalent approach in the literature has been to develop privacy mechanisms based on a (point) estimate $\hat{P}$ and analyze sensitivity with respect to errors in this estimate. In this work, we follow the approach that was proposed in [13,14], which is to construct a set $\mathcal{F}$ of probability distributions that we are confident contains $P^*$. Subsequently, we construct privacy mechanisms that aim to maximize utility, while satisfying (1) for all probability distributions in $\mathcal{F}$. We call the resulting privacy framework robust local differential privacy (RLDP).

In a sense, RLDP is a relaxed form of privacy. Although enforcing (1) for all possible distributions may seem appealing, it is, as we illustrate next, often infeasible. To this end, we consider two extreme cases. First, consider a joint distribution of $S$ and $U$ under which $S=U$. Intuitively, we cannot disclose much information about $U$, since this directly leaks information about $S$. As such, the utility of $Y$ is low. Next, consider a joint distribution under which $S$ and $U$ are independent. Intuitively, we can disclose $U$ without additional precautions, yielding a high utility of $Y$. The point is that we would need to design a single privacy mechanism $Q$ that satisfies (1) for all distributions, including the 'worst case' in which $S=U$, leading to low utility $Y$. In this work, we take the middle ground between, on the one hand, only using a point estimate $\hat{P}$ and, on the other hand, using all possible distributions. We do so by defining a set of 'reasonable' distributions $\mathcal{F}$. In particular, we construct $\mathcal{F}$ based on public side-information. This public side-information consists of $n$ pairs of data $(s_1,u_1),\ldots,(s_n,u_n)$, which, like the data of the users, are i.i.d. according to the unknown distribution $P^*$. Our set $\mathcal{F}$ is constructed as a closed ball under a Rényi divergence around the maximum likelihood point estimate $\hat{P}$ of $P^*$. By doing so, we are (statistically) confident that $\mathcal{F}$ contains $P^*$, with the radius of the ball controlling the confidence level.

The RLDP framework is an instance of the more general Pufferfish framework [15]. In Section 2, we make this connection explicit and use it to describe the semantic privacy guarantees that are offered by RLDP.

The main contributions of this paper are as follows:

  1. We use a Rényi divergence to construct F and analyze the resulting structure and statistics of F. In particular, we demonstrate that projections of F are again balls under the same divergence. Moreover, we bound the projected sets in terms of an l1 norm.

  2. Using these results we approximate F by an enveloping polytope. We then use techniques from robust optimization [16,17,18] to characterize PolyOpt, the mechanism that is optimal over this polytope.

  3. A drawback of this method is that it relies on vertex enumeration and is, therefore, computationally infeasible for large alphabets. We therefore introduce two low-complexity privacy mechanisms. The first is independent reporting (IR), in which S and U are reported through separate LDP mechanisms.

  4. We characterize the conditions that underlying LDP mechanisms have to satisfy in order for IR to ensure RLDP. Furthermore, while IR can incorporate any LDP mechanism, we show that it is optimal to use randomized response [19]. This drastically reduces the search space and allows us to find the optimal IR mechanism using low-dimensional optimization.

  5. The second low-complexity mechanism that we develop is called secret-randomized response (SRR) and is based on randomized response.

  6. We show that SRR maximizes mutual information in the low-privacy regime for the case that F is the entire probability simplex.

  7. We demonstrate the improved utility of RLDP over LDP with numerical experiments. In particular, we compare the performance of our mechanisms with generalized randomized response [5]. We provide results for both synthetic data sets and real-world census data.

The structure of this paper is as follows: After discussing related work in Section 2, we describe the model in detail in Section 3. In Section 4, we present results on the structure and statistics of projections of F. These results are used in Section 5 to develop the PolyOpt privacy mechanism. Low-complexity privacy mechanisms are presented in Section 6 and Section 7. In Section 8, we evaluate the discussed methods experimentally. Finally, in Section 9, we provide a discussion of our results and provide an outlook on future work. Most proofs are deferred to Appendix A.

Part of this paper was presented at the IEEE International Symposium on Information Theory 2021 [14]. In this paper, we generalize from a χ2-divergence to an arbitrary Rényi divergence. Moreover, Section 4 and Section 6, most of Section 8, and all proofs are new in the current paper.

2. Related Work

2.1. The Pufferfish Framework

Our RLDP framework is an instance of the more general Pufferfish framework [15]. In this subsection, we make this connection explicit and elaborate on the semantic guarantees offered by RLDP.

A privacy definition following the Pufferfish framework specifies (i) a set of potential secrets, (ii) a set of discriminative pairs of secrets, and (iii) a set of assumptions about how data are generated. In RLDP the potential secrets are the possible values of $S$, i.e., $\mathcal{S}$. We want to prevent the aggregator from learning anything about $S$. This means that it should not be able to distinguish the case $S=s$ from $S=s'$ for any $s\ne s'$, so all non-identical pairs are discriminative. Note that this relies on $\mathcal{S}$ being finite, with extensions to continuous $\mathcal{S}$ discussed in detail in [15].

The set of assumptions on how data are generated consists, in our setting, of probability distributions over $\mathcal{X}$. A key idea in Pufferfish is that this set explicitly models the information that is available to an attacker, i.e., an entity that is trying to infer information about $S$ by observing $Y$. In our setting, the aggregator is the only attacker and a probability distribution $P$ over $\mathcal{X}$ captures the beliefs that the attacker has about $S$ prior to seeing $Y$. We can rewrite (1) as

$$\frac{\mathbb{P}_{X\sim P}(S=s\,|\,Y=y)}{\mathbb{P}_{X\sim P}(S=s'\,|\,Y=y)}\;\le\; e^{\varepsilon}\,\frac{\mathbb{P}_{X\sim P}(S=s)}{\mathbb{P}_{X\sim P}(S=s')} \qquad (3)$$

and see that our local differential privacy constraint (1) can be interpreted as the condition that the posterior distribution of S after seeing Y must be very close to the prior distribution. The relevance of P is that it captures a specific set of beliefs of the attacker. As such, we want (3) to hold for various values of P, where each P captures specific background/side-knowledge available to the attacker/aggregator. Note that by doing so we are not making any claims about the actual knowledge available to the aggregator, but instead describing the possible scenarios for which we want to protect the privacy of users. In Pufferfish, these possible scenarios are called the set of assumptions on how data are generated, and in RLDP this is F.

Often, side-information in the form of domain knowledge or existing data is publicly available; i.e., to both the users and the aggregator. This public side-information may suggest, for instance, that there is, at most, limited dependence between S and U. In that case, protecting against attackers who have the belief that S=U incurs an enormous penalty in achieved utility. It is true that those attackers gain a lot of information on S by observing Y. However, they could have also obtained this information from the public side-information directly. Therefore, the approach taken in the Pufferfish framework and in this paper is that we only protect against attackers that have beliefs, i.e., distributions P, that are in line with publicly available side information.

A difficulty in working with the Pufferfish framework is that it is often hard to find good mechanisms. A general mechanism is proposed in [20], but it relies on enumerating over all distributions in $\mathcal{F}$, which is an uncountable set in our setting, and so it cannot be used here. A constrained version of Pufferfish that facilitates analysis and a methodology for finding good mechanisms is proposed in [21]. Another interesting line of work is to model correlations between users in the non-local differential privacy setting [22]. Finally, ref. [23] proposed a modeling framework for capturing domain knowledge about the data. In contrast, in the current work, we impose constraints that are learned from data. Our setting does not fit any of the frameworks for which good mechanisms are known in the literature. One of the main contributions of this paper is to develop such mechanisms.

2.2. Other Privacy Frameworks

Disclosing X through a privacy mechanism that protects sensitive information S has been studied extensively. One line of work starts from differential privacy [24] and imposes the additional challenge that the aggregator cannot be trusted, leading to the concept of local differential privacy [1,2,5]. For this setting, several privacy mechanisms exist, including randomized response [19] and unary encoding [25]. Optimal LDP mechanisms under a variety of utility metrics, including mutual information, are found in [5]. In [1,2,5], all data are sensitive, i.e., X=S. The variation of LDP for the case of disclosing X=(S,U), where only S is sensitive, was proposed in [3] and is the setting that we study in this paper. Another line of work connects this setting to the information bottleneck [26], leading to a privacy constraint in terms of mutual information [6,8,9,10]. In these works, it is shown that approaches to optimizing the information bottleneck also work for finding good privacy mechanisms.

Next to differential privacy and mutual information as privacy measures, a multitude of other privacy frameworks and leakage measures exist [27]. Some of these have been studied in the context of privacy mechanisms. In [7,11], privacy leakage is measured through the improved potential of statistical inference by an attacker after seeing the disclosed information. This measure is formulated through a general cost function, with mutual information resulting as a special case. Perfect privacy, which demands the output to be independent of the sensitive data, was studied in [28], and methods were given to find optimal mechanisms in this setting. An estimation-theoretic framework was studied in [29,30]. Our use of a Rényi divergence in the construction of F may suggest considering a generalization of our privacy definition. This could be achieved by considering, for instance, a Rényi divergence in the privacy constraint, as done in [31]. Along a different line, in [32], the maximal leakage measure with a clear operational interpretation is defined. In [33], this measure is generalized to a parametrized measure, enabling interpolating between maximal leakage and mutual information. A stronger, pointwise, version of the maximal leakage measure is proposed in [34]. These are interesting research directions but not pursued in this paper.

Our setting $X=(S,U)$ is a special case of a Markov chain $S - X - Y$, where only $X$ is observed. This Markov chain is typically studied in the information bottleneck and privacy funnel settings [6,26]. We do not generalize to this setting, because we need observations of $S$ for the estimate of $P_{U|S}$. Without direct observations of $S$, we can only make worst-case assumptions on $P_{U|S}$, leading to very poor utility. A different type of model, in which only part of the information in $X$ is sensitive, is proposed in [12]. This is a block-structured model in which $\mathcal{X}$ is partitioned and information about the partition of an element is sensitive but its index within the partition is not. Our setting of $\mathcal{X}=\mathcal{S}\times\mathcal{U}$ does not fit this model. One can partition $\mathcal{X}$ according to $\mathcal{U}$, but our privacy constraints are different from [12]. We will elaborate on this in Section 6.

2.3. Robustness

The distribution $P^*_{S,U}$ is not available in practice. The approach taken in most works is to estimate $P^*_{S,U}$ from data and analyze sensitivity with respect to this estimate $\hat{P}_{S,U}$. One of the contributions in [7] is to quantify the impact of mismatched priors, i.e., the impact of not knowing $P^*_{S,U}$ exactly. A bound on the resulting level of privacy is derived in terms of the total variation distance between the actual distribution and the estimate $\hat{P}_{S,U}$. The setting in [35] is similar to ours: a ball of probability distributions, centered around a point estimate, was defined that contains $P^*_{S,U}$ with high probability. It was then shown that a privacy mechanism that was designed based on the empirical distribution was valid for the entire set under a looser privacy constraint. The privacy slack was quantified and shown to approach zero as the size of the data set increased. An important difference with the current work is that we explicitly optimize the privacy mechanism over the uncertainty set. Another difference is that we base our ball on a Rényi divergence, whereas [35] used an $\ell_1$ norm. The main technical tool used in [35] was large deviations theory, whereas we rely on convex analysis and robust optimization. We also mention [36,37]. In [36] it is assumed that nothing is known about $P^*_S$ and $P^*_{U|S}$. It is shown that good privacy mechanisms can be found through a connection to maximal correlation, see also [38]. In [37], sets of probability distributions are not derived from data but carefully modeled such that optimal mechanisms can be derived analytically.

Using robust optimization [16] to find a good mechanism that satisfies privacy constraints for all $P_{S,U}$ in an uncertainty set $\mathcal{F}$ was proposed in [13,14]. In this work, we generalize and extend results from [14]. The idea of robust optimization is that constraints in an optimization problem contain uncertain parameters that are known to come from an (a priori defined) uncertainty set. The constraints must hold for all possible values of the uncertain parameters. A key result is that, using Fenchel duality, the problem can be expressed in terms of the support function of the uncertainty set and the convex conjugate of the constraint [16,17]. The case where the uncertain parameters are probabilities is known as distributionally robust optimization. Using results from [39], it was shown in [40] how an uncertainty set can be constructed from data using an $f$-divergence, providing an approximate confidence set. Confidence sets for parameters that are not necessarily probabilities were constructed in [18] under a $\chi^2$-divergence. Convergence of robust optimization based on $f$-divergences was studied in [41] and for the case of a KL-divergence in [42]. In [43], it is shown how distributionally robust optimization problems over Wasserstein balls can be reformulated as convex problems. For the regular differential privacy setting, distributionally robust optimization was used in [44] to find optimal additive privacy mechanisms for a general perturbation cost function. In this paper, we show how robust optimization can be applied to the setting of partially sensitive information with local differential privacy.

2.4. Miscellaneous

Another line of work on privacy mechanisms builds on recent advances in generative adversarial networks [45]. In [46,47], a generative adversarial framework is used to provide privacy mechanisms that do not use explicit expressions for PX. Even though this is not explicitly addressed in [46,47], it is expected that the generalization properties of networks will provide a form of robustness. Closely related approaches are used in the field of face recognition [48,49], with the aim of preventing biometric profiling [50]. The leakage measures that are used in [48,49], however, do not seem to have an operational interpretation.

Disclosing information in a privacy-preserving way is one of the main challenges in official statistics [51,52]. The setting considered in the current paper is closely connected to disclosing a table with microdata, where each record in the table is released independently of the other records. This approach to disclosing microdata was studied in [4] by considering expected error as the utility measure and mutual information as the privacy measure. The resulting optimization problem corresponds to the traditional rate-distortion problem.

3. Model and Preliminaries

In this section, we give an overview of the setting and objectives of this paper. The notation used in this section, as well as the rest of the paper, is summarized in Table 1.

Table 1.

Notation used in this paper. 'Page' denotes the page on which the notation is first defined.

Notation   Meaning   Page
$\mathcal{S}$   sensitive data space   6
$\mathcal{U}$   non-sensitive data space   6
$\mathcal{X}$   $\mathcal{S}\times\mathcal{U}$   6
$a_1, a_2, a, b$   $|\mathcal{S}|, |\mathcal{U}|, |\mathcal{X}|, |\mathcal{Y}|$   6
$X=(S,U)$   user data   6
$Q$   privacy mechanism   6
$\mathbf{Q}$   matrix of $Q$   6
$Y$   $Q(X)$   6
$\mathcal{Y}$   output space   6
$\mathcal{P}_{\mathcal{X}}$   space of prob. dist. on $\mathcal{X}$   6
$P^*$   true distribution   6
$\hat{P}$   estimated distribution   7
$I$   mutual information   8
$\mathcal{F}$   uncertainty set for $P$   6
$P_{U|s}$   conditional probability vector   9
$\mathcal{F}_{U|s}$   conditional projection of $\mathcal{F}$   9
$L_{u|s}(\mathcal{F}), \mathrm{rad}_s(\mathcal{F})$   statistics of $\mathcal{F}$   9
$D_\alpha$   Rényi divergence   10
PolyOpt   the PolyOpt mechanism   13
$\mathrm{SRR}^{\varepsilon}$   secret randomized response   17
$\mathrm{IR}_{R_1,R_2}$   independent reporting   18

The data space is $\mathcal{X}=\mathcal{S}\times\mathcal{U}$, where $\mathcal{S}$ and $\mathcal{U}$ are finite sets. We write $|\mathcal{S}|=:a_1$, $|\mathcal{U}|=:a_2$, and $|\mathcal{X}|=a_1a_2=:a$. Data items $X=(S,U)$ are drawn from a probability distribution $P^*$ in $\mathcal{P}_{\mathcal{X}}$, the space of probability distributions on $\mathcal{X}$; here, $S$ represents sensitive data, while $U$ represents non-sensitive data. The aggregator's aim is to create a privacy mechanism $Q:\mathcal{X}\to\mathcal{Y}$ such that $Y=Q(X)$ contains as much information about $X$ as possible, while not leaking too much information about $S$.

The mechanism $Q$ is a probabilistic map, which we represent by a left stochastic matrix $\mathbf{Q}=(Q_{y|x})_{y\in\mathcal{Y},x\in\mathcal{X}}$, and we write $|\mathcal{Y}|=b$. Often, we identify $\mathcal{Y}=\{1,\ldots,b\}$, and likewise for other sets.

The distribution $P^*$ is not known exactly. Instead, there is a set of possible distributions $\mathcal{F}\subseteq\mathcal{P}_{\mathcal{X}}$, where $\mathcal{P}_{\mathcal{X}}$ denotes the probability simplex over $\mathcal{X}$. We choose $\mathcal{F}$ in such a way that it is likely that $P^*\in\mathcal{F}$. The uncertainty set $\mathcal{F}$ captures our uncertainty about $P^*$, and we guarantee privacy for all $P\in\mathcal{F}$. We refer to this as robust local differential privacy (RLDP).

Definition 1 

(Robust Local Differential Privacy). Let $\varepsilon\ge0$ and $\mathcal{F}\subseteq\mathcal{P}_{\mathcal{X}}$. We say that $Q$ satisfies $(\varepsilon,\mathcal{F})$-RLDP if for all $s,s'\in\mathcal{S}$, all $y\in\mathcal{Y}$, and all $P\in\mathcal{F}$ we have

$$\mathbb{P}_{X\sim P}(Y=y\,|\,S=s)\;\le\; e^{\varepsilon}\,\mathbb{P}_{X\sim P}(Y=y\,|\,S=s'). \qquad (4)$$

Note that we use the notation $\mathbb{P}_{X\sim P}(\cdot)$ to emphasize that $X$ is distributed according to $P$. If no confusion can arise, we often leave out the subscript $X\sim P$ to improve readability. Note that we can also write

$$\mathbb{P}_{X\sim P}(Y=y\,|\,S=s)=\sum_{u\in\mathcal{U}} Q_{y|s,u}\,\mathbb{P}_{X\sim P}(U=u\,|\,S=s), \qquad (5)$$

so Definition 1 depends on the conditional probabilities of $U$ given $S=s$ and $S=s'$. It does not, however, depend on the realization of $U$.

For clarity and use in future sections, we give the definition of regular LDP [1], which is used when the goal is to obfuscate all of X, rather than just S.

Definition 2 

(Local Differential Privacy). Let $\varepsilon\ge0$. We say that $Q:\mathcal{X}\to\mathcal{Y}$ satisfies $\varepsilon$-LDP if for all $x,x'\in\mathcal{X}$ and all $y\in\mathcal{Y}$ we have

$$\mathbb{P}(Y=y\,|\,X=x)\;\le\; e^{\varepsilon}\,\mathbb{P}(Y=y\,|\,X=x'). \qquad (6)$$

To capture the uncertainty about $P^*$ through $\mathcal{F}$, we suppose there is a database $\boldsymbol{x}=(x_1,\ldots,x_n)$ accessible to the users, where each $x_i=(s_i,u_i)$ is drawn independently from $P^*$. Based on this, the user produces an estimate $\hat{P}$ of $P^*$. In the experiments, we consider the maximum likelihood estimator, i.e., $\hat{P}_x=\frac{|\{i\le n: x_i=x\}|}{n}$. We construct the uncertainty set $\mathcal{F}$ as a closed ball around $\hat{P}$. In particular, let $D_\alpha$ be the Rényi divergence of order $\alpha$ on $\mathcal{P}_{\mathcal{X}}$, i.e., for $\alpha\in(0,\infty)$,

$$D_\alpha(\hat{P}\,\|\,P)=\begin{cases}\dfrac{1}{\alpha-1}\log\displaystyle\sum_{x\in\mathcal{X}}\frac{\hat{P}_x^{\alpha}}{P_x^{\alpha-1}}, & \text{if }\alpha\ne1,\\[2ex] \displaystyle\sum_{x\in\mathcal{X}}\hat{P}_x\log\frac{\hat{P}_x}{P_x}, & \text{if }\alpha=1.\end{cases} \qquad (7)$$

The case $\alpha=1$ follows, in fact, as the limit of the $\alpha\ne1$ case as $\alpha\to1$. Similarly, the definition can be extended to $\alpha\in\{0,\infty\}$ by taking the corresponding limits, but in this paper we restrict our attention to $\alpha\in(0,\infty)$ to keep the presentation clear. Note that $D_1=D_{\mathrm{KL}}$, the Kullback–Leibler divergence, and $D_2=\log(1+\chi^2)$, where the $\chi^2$-divergence is $\chi^2(P_1\|P_2)=\sum_x\frac{(P_{1,x}-P_{2,x})^2}{P_{2,x}}$. In general, a Rényi divergence is a continuous increasing function of a power divergence (a.k.a. Hellinger divergence) [39,53,54], an example of an $f$-divergence. We omit $\alpha$ from the notation when it is clear from the context.
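As a concrete reference point, the following is a minimal sketch of (7) in Python (the function name and the use of NumPy are our own choices, not from the paper); it also reproduces the divergence value used in Example 1 below.

```python
import numpy as np

def renyi_divergence(p_hat, p, alpha):
    """Renyi divergence D_alpha(p_hat || p) of order alpha, as in (7).
    Both arguments are probability vectors over the same finite alphabet;
    alpha = 1 is handled as the Kullback-Leibler limit."""
    p_hat, p = np.asarray(p_hat, float), np.asarray(p, float)
    if np.isclose(alpha, 1.0):
        mask = p_hat > 0                      # convention: 0 * log(0 / q) = 0
        return float(np.sum(p_hat[mask] * np.log(p_hat[mask] / p[mask])))
    return float(np.log(np.sum(p_hat**alpha / p**(alpha - 1))) / (alpha - 1))

# Values from Example 1 below: D_2(P_hat || P*) is approximately 0.0281.
print(renyi_divergence([0.07, 0.10, 0.26, 0.57], [0.1, 0.1, 0.2, 0.6], alpha=2))
```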

We define $\mathcal{F}$ by fixing a bound $B\in[0,\infty]$ and letting

$$\mathcal{F}=\left\{P\in\mathcal{P}_{\mathcal{X}}: D_\alpha(\hat{P}\,\|\,P)\le B\right\}. \qquad (8)$$

Since a Rényi divergence is a continuous increasing function of an $f$-divergence, it follows from [39,40] that $\mathcal{F}$ is a confidence set for $P^*$. In particular, for the case of $\alpha=2$, which will be used in our numerical experiments in Section 8, for suitable $B$, we have

$$\mathcal{F}=\left\{P\in\mathcal{P}_{\mathcal{X}}:\sum_x\frac{(\hat{P}_x-P_x)^2}{P_x}\le\frac{F^{-1}_{\chi^2,a-1}(1-\beta)}{n}\right\}, \qquad (9)$$

with $\beta\in(0,1)$, where $F_{\chi^2,a-1}$ is the cumulative distribution function of the $\chi^2$-distribution with $a-1$ degrees of freedom, resulting in a set $\mathcal{F}$ with significance level $\beta$. This means that the probability of $P^*\in\mathcal{F}$ is at least $1-\beta$.

Hence, by designing Q based on F, we are confident in satisfying (1) for all attackers that have beliefs that are based on the public side-information, as well as for attackers that have beliefs that are closer to P*.

As a special case of the above, we will study the case that nothing is known about $P^*$. In this case, $B=\infty$ and $\mathcal{F}=\mathcal{P}_{\mathcal{X}}$. Regarding privacy, this is the 'safest' choice, as we do not make assumptions about $P^*$. Another special case is where $\mathcal{F}$ is a singleton, which reflects a situation where $B=0$ and $P^*$ is assumed to be known. This setting was studied in [3].

Given $\mathcal{F}$ and $\varepsilon$, the goal is now to create a $Q:\mathcal{X}\to\mathcal{Y}$ to be used on new/future data; our setting is depicted in Figure 1. The aim of this paper is to find a satisfactory answer to the following problem:

Figure 1. An overview of the setting of this paper when $\mathcal{F}$ is a confidence set based on a data set $\boldsymbol{x}$. Note that it is typically, but not necessarily, true that $P^*\in\mathcal{F}$.

Problem 1. 

Given F and ε, find a Q satisfying (ε,F)-RLDP, while maximizing a given utility function.

Throughout this paper, we follow the original privacy funnel [6] and its LDP counterpart [3] in taking mutual information $I(X;Y)$ as a utility measure. As is argued in [6], mutual information arises naturally when minimizing log-loss distortion in the privacy funnel scenario. As a utility measure of $Q$, we take $I_{X\sim P}(X;Y)$ (abbreviated to $I_P(X;Y)$), since the aim is to create $Y$ that reflects $X$ as faithfully as possible. This utility measure depends on the distribution $P$ of $X$ under which we choose to evaluate it. Ideally, one would like to use $P=P^*$, but in practice this is not possible, as $P^*$ is unknown. In the theoretical part of this paper, we circumvent this issue by proving our results for general $P$. In the experiments of Section 8, we take $P=\hat{P}$ as the best available alternative to $P=P^*$. We investigate the effect of this choice by comparing $I_{P^*}(X;Y)$ to $I_{\hat{P}}(X;Y)$.

Another option is to use the robust utility measure $\min_{P\in\mathcal{F}}I_P(X;Y)$ to ensure good utility for every 'reasonable' $P$, see [13]. We do not explicitly study this measure in this paper, but since our results hold for general $P$, they can also be applied to robust utility.

Example 1. 

We set up an example to illustrate the concepts of this paper. Take $\mathcal{S}=\{s_1,s_2\}$ and $\mathcal{U}=\{u_1,u_2\}$, and suppose

$$P^*=\begin{pmatrix}P^*_{s_1,u_1} & P^*_{s_1,u_2}\\ P^*_{s_2,u_1} & P^*_{s_2,u_2}\end{pmatrix}=\begin{pmatrix}0.1 & 0.1\\ 0.2 & 0.6\end{pmatrix}. \qquad (10)$$

Moreover, suppose we have a publicly known database of $n=100$ entries, from which we estimate

$$\hat{P}=\begin{pmatrix}\hat{P}_{s_1,u_1} & \hat{P}_{s_1,u_2}\\ \hat{P}_{s_2,u_1} & \hat{P}_{s_2,u_2}\end{pmatrix}=\begin{pmatrix}0.07 & 0.10\\ 0.26 & 0.57\end{pmatrix}. \qquad (11)$$

To obtain a 95%-confidence set for $\mathcal{F}$ according to a $\chi^2$-distribution, we take $\alpha=2$ and $B=\log\left(1+\frac{F^{-1}_{\chi^2,3}(0.95)}{100}\right)=0.0752$. In this way, we obtain

$$\mathcal{F}=\left\{P\in\mathcal{P}_{\mathcal{X}}:D_\alpha(\hat{P}\,\|\,P)\le B\right\} \qquad (12)$$
$$=\left\{P\in\mathcal{P}_{\mathcal{X}}:\log\sum_x\frac{\hat{P}_x^2}{P_x}\le\log\left(1+\frac{F^{-1}_{\chi^2,3}(0.95)}{100}\right)\right\} \qquad (13)$$
$$=\left\{P\in\mathcal{P}_{\mathcal{X}}:\sum_x\frac{(\hat{P}_x-P_x)^2}{P_x}\le\frac{F^{-1}_{\chi^2,3}(0.95)}{100}\right\}, \qquad (14)$$

which is the desired confidence set (note that the $\chi^2$-distribution has $|\mathcal{X}|-1=3$ degrees of freedom). In this case, we have $D_2(\hat{P}\,\|\,P^*)=0.0281<B$, so $P^*\in\mathcal{F}$.
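The following minimal sketch reproduces these numbers (assuming NumPy and SciPy; the helper name is ours):

```python
import numpy as np
from scipy.stats import chi2

def confidence_radius(n, a, beta=0.05):
    """Radius B of the alpha = 2 Renyi ball so that F is a (1 - beta)
    confidence set, following (9): B = log(1 + chi2_{a-1}^{-1}(1 - beta) / n)."""
    return np.log(1.0 + chi2.ppf(1.0 - beta, df=a - 1) / n)

p_star = np.array([0.1, 0.1, 0.2, 0.6])
p_hat  = np.array([0.07, 0.10, 0.26, 0.57])
B  = confidence_radius(n=100, a=4, beta=0.05)      # approx. 0.0752
d2 = np.log(np.sum(p_hat**2 / p_star))             # D_2(P_hat || P*), approx. 0.0281
print(B, d2, d2 <= B)                              # membership check: True
```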

4. Conditional Projection of F

In Section 5 and Section 7 below, we will introduce privacy mechanisms that provide $(\varepsilon,\mathcal{F})$-RLDP. These mechanisms depend on the conditional projections of $\mathcal{F}$ on $\mathcal{P}_{\mathcal{U}}$ given $S=s$, denoted as $\mathcal{F}_{U|s}$. In this section, we analyze the structure and statistics of these sets. To do so, we introduce, for $s\in\mathcal{S}$, $u\in\mathcal{U}$, and $P\in\mathcal{P}_{\mathcal{X}}$,

$$P_s=\sum_{u\in\mathcal{U}}P_{u,s}, \qquad (15)$$
$$P_{u|s}=\frac{P_{u,s}}{P_s}, \qquad (16)$$
$$P_{U|s}=(P_{u|s})_{u\in\mathcal{U}}\in\mathcal{P}_{\mathcal{U}}, \qquad (17)$$
$$\mathcal{F}_{U|s}=\{P_{U|s}:P\in\mathcal{F}\}\subseteq\mathcal{P}_{\mathcal{U}}. \qquad (18)$$

We are interested in the following statistics:

$$L_{u|s}(\mathcal{F})=\min_{R\in\mathcal{F}_{U|s}}R_u\quad\text{for a given }u\in\mathcal{U}, \qquad (19)$$
$$\mathrm{rad}_s(\mathcal{F})=\max_{R\in\mathcal{F}_{U|s}}\|R-\hat{P}_{U|s}\|_1. \qquad (20)$$

In (19), $R_u$ is the $u$-coefficient of $R\in\mathcal{P}_{\mathcal{U}}$. It turns out that these statistics give us the information required to construct $(\varepsilon,\mathcal{F})$-RLDP protocols efficiently: in Section 5, we use $L_{u|s}(\mathcal{F})$ to approximate $\mathcal{F}_{U|s}$ by a polytope, to make computation easier, while in Section 7, we use $\mathrm{rad}_s(\mathcal{F})$ as a measure for the size of $\mathcal{F}_{U|s}$. While these statistics (or bounds for them) are relatively easy to find for $\mathcal{F}$ itself, the hard part lies in the fact that we have to give bounds for the projection $\mathcal{F}_{U|s}$. The extent to which these bounds can be found explicitly heavily depends on the divergence measure that is used to construct $\mathcal{F}$. In this section, we show how these bounds can be obtained for our case where we construct $\mathcal{F}$ using a Rényi divergence. The reason for this, as we will see below, is that we can give an explicit description of $\mathcal{F}_{U|s}$.

4.1. Structure of FU|s

Recall that, for a given $\alpha\in(0,\infty)$, the Rényi divergence $D_\alpha$ on $\mathcal{P}_{\mathcal{X}}$ takes values in $[0,\infty]$ and is defined by

$$D_\alpha(\hat{P}\,\|\,P)=\begin{cases}\dfrac{1}{\alpha-1}\log\displaystyle\sum_{x\in\mathcal{X}}\frac{\hat{P}_x^{\alpha}}{P_x^{\alpha-1}}, & \text{if }\alpha\ne1,\\[2ex] \displaystyle\sum_{x\in\mathcal{X}}\hat{P}_x\log\frac{\hat{P}_x}{P_x}, & \text{if }\alpha=1.\end{cases} \qquad (21)$$

The following theorem states that the conditional projections of balls defined by Rényi divergence are themselves Rényi divergence balls:

Theorem 1. 

Let $s\in\mathcal{S}$ be such that $\hat{P}_s>0$. Let $\mathcal{F}$ be defined by a Rényi divergence, i.e.,

$$\mathcal{F}=\left\{P\in\mathcal{P}_{\mathcal{X}}:D_\alpha(\hat{P}\,\|\,P)\le B\right\} \qquad (22)$$

for a given $\alpha\in(0,\infty)$ and $B\in\mathbb{R}_{\ge0}$. Define the constant $B_s$ by

$$B_s=\begin{cases}\dfrac{\alpha}{\alpha-1}\log\dfrac{e^{(\alpha-1)B/\alpha}-(1-\hat{P}_s)}{\hat{P}_s}, & \text{if }\alpha\ne1,\\[2ex] \dfrac{B}{\hat{P}_s}, & \text{if }\alpha=1.\end{cases} \qquad (23)$$

Then,

$$\mathcal{F}_{U|s}=\left\{R\in\mathcal{P}_{\mathcal{U}}:D_\alpha(\hat{P}_{U|s}\,\|\,R)\le B_s\right\}. \qquad (24)$$

This theorem gives us a direct description of the $\mathcal{F}_{U|s}$, which is useful because the $L_{u|s}(\mathcal{F})$ of (19) and $\mathrm{rad}_s(\mathcal{F})$ of (20) are defined in terms of these projection sets. A similar bound could also be found for the limit cases $\alpha\in\{0,\infty\}$, but this is not pursued in this paper, because it does not provide additional insights.

A key property of the Rényi divergence that allows us to prove Theorem 1 is that, for $x=(s,u)$, we can write

$$\frac{\hat{P}_x^{\alpha}}{P_x^{\alpha-1}}=\frac{\hat{P}_{u|s}^{\alpha}}{P_{u|s}^{\alpha-1}}\cdot\frac{\hat{P}_s^{\alpha}}{P_s^{\alpha-1}}. \qquad (25)$$

This allows us to express the divergence $D_\alpha(\hat{P}_{U|s}\,\|\,P_{U|s})$ in terms of $D_\alpha(\hat{P}\,\|\,P)$. For other divergences, which may depend on $\hat{P}$ and $P$ in a more complicated way, this is typically not possible. Therefore, we cannot generalize our results to uncertainty sets constructed from, for instance, arbitrary $f$-divergences.

In light of this theorem and the fact that in the following sections we care more about the statistics of $\mathcal{F}_{U|s}$ than about those of $\mathcal{F}$ itself, one might be inclined to think that it is more straightforward to estimate the $\hat{P}_{U|s}$ from the data and define uncertainty sets $\mathcal{F}_{U|s}$ around them directly, without going through the intermediate stage $\mathcal{F}$. However, projecting these sets back to $\mathcal{P}_{\mathcal{X}}$ results in a larger set. In other words, there are distributions $P$ such that each $P_{U|s}$ is an element of $\mathcal{F}_{U|s}$, while $P\notin\mathcal{F}$. That is, we have $\mathcal{F}\subseteq\mathcal{F}':=\{P\in\mathcal{P}_{\mathcal{X}}:\forall s,\ P_{U|s}\in\mathcal{F}_{U|s}\}$. The reason for this is that, as becomes clear in the proof of Theorem 1, the $P\in\mathcal{F}$ that project to the boundary points of $\mathcal{F}_{U|s}$ satisfy $P_{U|s'}=\hat{P}_{U|s'}$ for $s'\ne s$. In other words, elements of $\mathcal{F}$ can be extremal in, at most, one $\mathcal{F}_{U|s}$. By contrast, $\mathcal{F}'$ also includes $P$ that are extremal in multiple $\mathcal{F}_{U|s}$. We conclude that constructing the $\mathcal{F}_{U|s}$ directly results in the larger set $\mathcal{F}'$, which results in a lower utility. We will give an example of this phenomenon in Example 2.

4.2. Statistics of FU|s

In this section, we analyze the statistics of $\mathcal{F}_{U|s}$. More concretely, to find $L_{u|s}(\mathcal{F})$ and $\mathrm{rad}_s(\mathcal{F})$, fix $s$, $\alpha$ and $B$, and define, for $\rho\in[0,1]$ and $\xi\in\mathbb{R}_{\ge0}$ such that $\xi\le\rho^{-1}$,

$$\varphi_{B_s}(\rho,\xi)=\begin{cases}\dfrac{1}{\alpha-1}\log\left(\rho\,\xi^{1-\alpha}+(1-\rho)\left(\dfrac{1-\rho\xi}{1-\rho}\right)^{1-\alpha}\right)-B_s, & \text{if }\alpha\ne1\text{ and }\rho\ne1,\\[2ex] \rho\log\dfrac{1}{\xi}+(1-\rho)\log\dfrac{1-\rho}{1-\rho\xi}-B_s, & \text{if }\alpha=1\text{ and }\rho\ne1,\\[1ex] \log\dfrac{1}{\xi}-B_s, & \text{if }\rho=1,\end{cases} \qquad (26)$$
$$\xi_-(\rho)=\inf\left\{\xi\in(0,1]:\varphi_{B_s}(\rho,\xi)\le0\right\}, \qquad (27)$$
$$\xi_+(\rho)=\sup\left\{\xi\in[1,\rho^{-1}):\varphi_{B_s}(\rho,\xi)\le0\right\}. \qquad (28)$$

Note that the case $\rho=1$ can be obtained by taking the limit. The expressions for $\xi_-$ and $\xi_+$ are somewhat involved, but note that, for given $\rho<1$, the function $\varphi_{B_s}(\rho,\xi)$ is convex in $\xi$. Thus, $\varphi_{B_s}(\rho,\xi)=0$ has at most two solutions. Furthermore, $\varphi_{B_s}(\rho,1)=-B_s$ and $\varphi_{B_s}(\rho,\xi)\to\infty$ as $\xi$ approaches $0$ or $\rho^{-1}$, so for $\rho<1$ the values $\xi_-(\rho)$ and $\xi_+(\rho)$ are the two solutions to $\varphi_{B_s}(\rho,\xi)=0$.
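For general $\alpha$, the two roots can be found numerically from the bracketing just described. The following is a rough sketch (assuming $\alpha>1$ and $B_s>0$; the names and the use of SciPy are ours):

```python
import numpy as np
from scipy.optimize import brentq

def phi(rho, xi, alpha, B_s):
    """phi_{B_s}(rho, xi) from (26), for alpha != 1 and 0 < rho < 1."""
    inner = rho * xi**(1 - alpha) + (1 - rho) * ((1 - rho * xi) / (1 - rho))**(1 - alpha)
    return np.log(inner) / (alpha - 1) - B_s

def xi_minus_plus(rho, alpha, B_s, tol=1e-12):
    """The two roots xi_- <= 1 <= xi_+ of phi(rho, .) = 0, cf. (27)-(28).
    For alpha > 1, phi is convex in xi, equals -B_s < 0 at xi = 1 and diverges
    at the ends of (0, 1/rho), so each bracket contains exactly one sign change."""
    f = lambda x: phi(rho, x, alpha, B_s)
    return brentq(f, tol, 1.0), brentq(f, 1.0, 1.0 / rho - tol)

# By (29) below, L_{u|s}(F) = rho * xi_-(rho) with rho = P_hat_{u|s}.
```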

The following proposition expresses our desired statistics in terms of ξ and ξ+.

Proposition 1. 

Let $u\in\mathcal{U}$. Then,

$$L_{u|s}(\mathcal{F})=\hat{P}_{u|s}\,\xi_-(\hat{P}_{u|s}), \qquad (29)$$
$$\mathrm{rad}_s(\mathcal{F})=2\max_{\emptyset\ne U_1\subseteq\mathcal{U}}\hat{P}_{U_1|s}\left(\xi_+(\hat{P}_{U_1|s})-1\right), \qquad (30)$$

where $\hat{P}_{U_1|s}=\sum_{u\in U_1}\hat{P}_{u|s}$.

As discussed above, ξ±(ρ) can be found quickly numerically; however, the calculation of rads(F) still involves taking the maximum over an exponentially large set.

4.3. Special Case α=2

In this section, we show that when $\alpha=2$, we can find explicit expressions for $\xi_\pm$ and consequently for $L_{u|s}$ and $\mathrm{rad}_s$. As discussed in (9), for this $\alpha$, the set $\mathcal{F}$ is a confidence set for a $\chi^2$-test. To find $\xi_-(\rho)$ and $\xi_+(\rho)$, we need to solve $\varphi_{B_s}(\rho,\xi)=0$. For $\alpha=2$, we can write this as a quadratic equation in $\xi$, and solving it leads to the following expression:

Lemma 1. 

Suppose $\alpha=2$. Then,

$$\xi_-(\rho)=\frac{e^{B_s}+2\rho-1-\sqrt{(e^{B_s}-1)\left(e^{B_s}-(2\rho-1)^2\right)}}{2e^{B_s}\rho}, \qquad (31)$$
$$\xi_+(\rho)=\frac{e^{B_s}+2\rho-1+\sqrt{(e^{B_s}-1)\left(e^{B_s}-(2\rho-1)^2\right)}}{2e^{B_s}\rho}. \qquad (32)$$

Now, we can determine Lu|s(F) and rads(F) using Lemma 1 and Proposition 1. For Lu|s(F), we immediately obtain an expression; for rads(F), a careful analysis of ξ+ shows that the optimal U1 of (30) can be found. For large enough Bs, the optimum is at U1={umin}, where umin is the u that minimizes P^u|s. Thus, we obtain a concrete expression for rads(F) without the need for optimization. For smaller Bs, we do not find an exact expression, but we can still derive a lower bound. The results are summarized in the following proposition.

Proposition 2. 

Let α=2. Then, the following hold:

  1. One has
     $$L_{u|s}(\mathcal{F})=\frac{e^{B_s}+2\hat{P}_{u|s}-1-\sqrt{(e^{B_s}-1)\left(e^{B_s}-(2\hat{P}_{u|s}-1)^2\right)}}{2e^{B_s}}. \qquad (33)$$
  2. Let $u_{\min}=\arg\min_{u\in\mathcal{U}}\hat{P}_{u|s}$. If $B_s\ge\log\left(1+(1-\hat{P}_{u_{\min}|s})^2\right)$, then
     $$\mathrm{rad}_s(\mathcal{F})=2\hat{P}_{u_{\min}|s}\left(\xi_+(\hat{P}_{u_{\min}|s})-1\right)=\frac{e^{B_s}+2\hat{P}_{u_{\min}|s}-1+\sqrt{(e^{B_s}-1)\left(e^{B_s}-(2\hat{P}_{u_{\min}|s}-1)^2\right)}}{e^{B_s}}-2\hat{P}_{u_{\min}|s}. \qquad (34)$$
  3. If $B_s<\log\left(1+(1-\hat{P}_{u_{\min}|s})^2\right)$, one has $\mathrm{rad}_s(\mathcal{F})\ge e^{B_s}-1$.

We note that $\alpha=2$ is not the only value of $\alpha$ for which one can bound $L_{u|s}$ and $\mathrm{rad}_s$. For instance, for $\alpha\ge1$, one can use Pinsker's inequality [55,56] and its generalizations [57] to bound $\|\hat{P}_{U|s}-P_{U|s}\|_1$, and hence $\mathrm{rad}_s(\mathcal{F})$, which in turn can be used to bound $L_{u|s}(\mathcal{F})$. However, unlike for $\alpha=2$, these do not result in exact bounds.

Example 2. 

We continue Example 1. We have

$$\hat{P}_{s_1}=0.17,\quad \hat{P}_{u_1|s_1}=0.4118,\quad \hat{P}_{u_2|s_1}=0.5882,\qquad \hat{P}_{s_2}=0.83,\quad \hat{P}_{u_1|s_2}=0.3133,\quad \hat{P}_{u_2|s_2}=0.6867.$$

Inserting our values of $B$ and $\hat{P}_s$ into Theorem 1, we find $B_{s_1}=0.3782$ and $B_{s_2}=0.0900$. In other words,

$$\mathcal{F}_{U|s_1}=\left\{R=\begin{pmatrix}R_{u_1}\\R_{u_2}\end{pmatrix}\in\mathcal{P}_{\mathcal{U}}:D_2\!\left(\begin{pmatrix}0.4118\\0.5882\end{pmatrix}\middle\|\begin{pmatrix}R_{u_1}\\R_{u_2}\end{pmatrix}\right)\le0.3782\right\}, \qquad (35)$$
$$\mathcal{F}_{U|s_2}=\left\{R=\begin{pmatrix}R_{u_1}\\R_{u_2}\end{pmatrix}\in\mathcal{P}_{\mathcal{U}}:D_2\!\left(\begin{pmatrix}0.3133\\0.6867\end{pmatrix}\middle\|\begin{pmatrix}R_{u_1}\\R_{u_2}\end{pmatrix}\right)\le0.0900\right\}. \qquad (36)$$

To determine the lower bounds on each $R_{u_i}$, we use Proposition 2 to obtain

$$L_{u_1|s_1}(\mathcal{F})=0.1620,\quad L_{u_2|s_1}(\mathcal{F})=0.2829,\quad L_{u_1|s_2}(\mathcal{F})=0.1923,\quad L_{u_2|s_2}(\mathcal{F})=0.5337.$$

In principle, we can also use Proposition 2 to determine the $\mathrm{rad}_s(\mathcal{F})$. However, in this case, there is a more straightforward approach. Since $|\mathcal{U}|=2$, every element of $\mathcal{F}_{U|s}$ is a vector of length two whose coefficients sum to 1; thus $P_{U|s}$ is determined by $P_{u_1|s}$. Since $L_{u_1|s}(\mathcal{F})\le P_{u_1|s}\le1-L_{u_2|s}(\mathcal{F})$, and both endpoints are attained, it follows that

$$\mathcal{F}_{U|s_1}=\left[L_{u_1|s_1}(\mathcal{F}),\,1-L_{u_2|s_1}(\mathcal{F})\right]=[0.1620,\,0.7171],\qquad \mathcal{F}_{U|s_2}=\left[L_{u_1|s_2}(\mathcal{F}),\,1-L_{u_2|s_2}(\mathcal{F})\right]=[0.1923,\,0.4663].$$

Under this identification, $\mathrm{rad}_s(\mathcal{F})$ is simply twice the maximal distance from $\hat{P}_{u_1|s}$ to an endpoint of this interval (the factor two comes from the fact that $\|P_{U|s}-\hat{P}_{U|s}\|_1=|P_{u_1|s}-\hat{P}_{u_1|s}|+|P_{u_2|s}-\hat{P}_{u_2|s}|=2|P_{u_1|s}-\hat{P}_{u_1|s}|$). Hence,

$$\mathrm{rad}_{s_1}(\mathcal{F})=2\max\{0.4118-0.1620,\,0.7171-0.4118\}=0.6107,\qquad \mathrm{rad}_{s_2}(\mathcal{F})=2\max\{0.3133-0.1923,\,0.4663-0.3133\}=0.3061.$$

We can also construct the set $\mathcal{F}'=\{P\in\mathcal{P}_{\mathcal{X}}:\forall s,\ P_{U|s}\in\mathcal{F}_{U|s}\}$ of Section 4.1. We can write this as

$$\mathcal{F}'=\left\{\begin{pmatrix}P_{s_1,u_1} & P_{s_1,u_2}\\ P_{s_2,u_1} & P_{s_2,u_2}\end{pmatrix}\in\mathcal{P}_{\mathcal{X}}:\ 0.1620\le P_{u_1|s_1}\le0.7171,\ 0.1923\le P_{u_1|s_2}\le0.4663\right\}. \qquad (37)$$

The inequality $0.1620\le P_{u_1|s_1}$ can be written as $0.1620\le\frac{P_{s_1,u_1}}{P_{s_1,u_1}+P_{s_1,u_2}}$, or $0.1620\,P_{s_1,u_2}\le0.8380\,P_{s_1,u_1}$; in other words, this becomes a linear constraint. We can do the same for the other constraints and these, together with inequality constraints of the form $P_{s,u}\ge0$ and the equality constraint $\sum_{s,u}P_{s,u}=1$, define the polytope $\mathcal{F}'\subseteq\mathbb{R}^4$. One can calculate that this polytope is a simplex, spanned by the vertices

$$\begin{pmatrix}0.7171\\0.2829\\0\\0\end{pmatrix},\quad\begin{pmatrix}0.1620\\0.8380\\0\\0\end{pmatrix},\quad\begin{pmatrix}0\\0\\0.4663\\0.5337\end{pmatrix},\quad\begin{pmatrix}0\\0\\0.1923\\0.8077\end{pmatrix}. \qquad (38)$$

The resulting $\mathcal{F}'$ is considerably larger than $\mathcal{F}$: one way to see this is that, for any of these vertices $P$, one has $D_2(\hat{P}\,\|\,P)=\infty$. This example shows the importance of working with the set $\mathcal{F}$, rather than with just its projections $\mathcal{F}_{U|s}$.
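As a quick numerical check of Lemma 1 and (29), the following sketch recomputes the interval endpoints of this example from the printed values of $B_s$ and $\hat{P}_{u|s}$ (helper names and rounding are ours):

```python
import numpy as np

def xi_pm_alpha2(rho, B_s):
    """Closed-form roots of phi_{B_s}(rho, .) = 0 for alpha = 2, cf. Lemma 1."""
    e = np.exp(B_s)
    disc = np.sqrt((e - 1.0) * (e - (2.0 * rho - 1.0) ** 2))
    return ((e + 2.0 * rho - 1.0 - disc) / (2.0 * e * rho),
            (e + 2.0 * rho - 1.0 + disc) / (2.0 * e * rho))

# Example 2, s_1 and s_2: lower bounds L_{u|s} = rho * xi_-(rho), cf. (29).
for B_s, rhos in [(0.3782, (0.4118, 0.5882)), (0.0900, (0.3133, 0.6867))]:
    L = [rho * xi_pm_alpha2(rho, B_s)[0] for rho in rhos]
    print(np.round(L, 4))   # close to 0.1620, 0.2829 and 0.1923, 0.5337 (up to rounding)
```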

5. Polyhedral Approximation: PolyOpt

In this section, we introduce PolyOpt, a family of mechanisms Q with good utility obtained by enclosing F by a polyhedron, and then using robust optimization for polyhedra [16] to describe the space of possible Q as a polyhedron; we then maximize the mutual information over this polyhedron. This approach is related to the polyhedral approach of [3], which finds the optimum for this problem in a non-robust setting.

For a mechanism $Q$ and $y\in\mathcal{Y}$, we define $Q_y=(Q_{y|x})_{x\in\mathcal{X}}\in\mathbb{R}^{\mathcal{X}}$ to be the $y$-th row of the stochastic matrix $\mathbf{Q}$ corresponding to $Q$, but transposed (i.e., viewed as a column vector). Likewise, we define the column vector $Q_{y|s}=(Q_{y|s,u})_{u\in\mathcal{U}}\in\mathbb{R}^{\mathcal{U}}$. In this notation, the condition for $(\varepsilon,\mathcal{F})$-RLDP can be formulated as

$$\forall y\in\mathcal{Y}\ \forall s_1,s_2\in\mathcal{S}:\qquad \max_{P\in\mathcal{F}}\ P_{U|s_1}^{T}Q_{y|s_1}-e^{\varepsilon}P_{U|s_2}^{T}Q_{y|s_2}\;\le\;0. \qquad (39)$$

Equation (39) boils down to a set of linear constraints in $Q_y$. What makes these difficult to satisfy is that every value $P\in\mathcal{F}$ provides a linear constraint, and each $Q_y$ has to satisfy all infinitely many of these. In this section, we address this difficulty by making the set $\mathcal{F}$ slightly larger, so that robust optimization [16] becomes a convenient tool for optimizing over the allowed $Q$. More precisely, for every $s\in\mathcal{S}$, let $D_s\subseteq\mathcal{P}_{\mathcal{U}}$ be such that $\mathcal{F}_{U|s}\subseteq D_s$. Then, certainly

$$\max_{P\in\mathcal{F}}\ P_{U|s_1}^{T}Q_{y|s_1}-e^{\varepsilon}P_{U|s_2}^{T}Q_{y|s_2}\;\le\;\max_{R_1\in D_{s_1},R_2\in D_{s_2}}\ R_1^{T}Q_{y|s_1}-e^{\varepsilon}R_2^{T}Q_{y|s_2}. \qquad (40)$$

Thus, we can conclude that $Q$ is $(\varepsilon,\mathcal{F})$-RLDP whenever

$$\forall y\in\mathcal{Y}\ \forall s_1,s_2\in\mathcal{S}:\qquad \max_{R_1\in D_{s_1},R_2\in D_{s_2}}\ R_1^{T}Q_{y|s_1}-e^{\varepsilon}R_2^{T}Q_{y|s_2}\;\le\;0. \qquad (41)$$

The trick is now to choose the Ds in such a way that the set of Q satisfying (41) has a closed-form description. To this end, we let each Ds be a polyhedron; that way, we can use robust optimization for polyhedra [16] to give such a description.

There are multiple ways to create a polyhedron $D_s$ that envelops $\mathcal{F}_{U|s}$. Writing $L_{u|s}=L_{u|s}(\mathcal{F})$ for convenience, we take

$$D_s=\left\{R\in\mathcal{P}_{\mathcal{U}}:\forall u,\ R_u\ge L_{u|s}\right\}. \qquad (42)$$

Since $D_s$ is described by linear inequalities, it is a polyhedron, and certainly $\mathcal{F}_{U|s}\subseteq D_s$ for all $s$. Robust optimization for polytopes [16] then allows us to describe the set of mechanisms satisfying (41). To formulate this, we first need the following definition:

Definition 3. 

Let $\varepsilon>0$. Then, define $\Gamma_\varepsilon$ to be the convex cone consisting of all $v\in\mathbb{R}_{\ge0}^{\mathcal{X}}$ that satisfy, for all $s_1,s_2\in\mathcal{S}$ and all $u_1,u_2\in\mathcal{U}$:

$$v_{s_1,u_1}-e^{\varepsilon}v_{s_2,u_2}+\sum_{u}L_{u|s_1}\left(v_{s_1,u}-v_{s_1,u_1}\right)-e^{\varepsilon}\sum_{u}L_{u|s_2}\left(v_{s_2,u}-v_{s_2,u_2}\right)\;\le\;0. \qquad (43)$$

Note that, for every choice of $s_1,s_2,u_1,u_2$, (43) is a linear inequality in $v$ and thus defines a half-space in $\mathbb{R}^{\mathcal{X}}$. The intersection of these half-spaces, intersected with $\mathbb{R}_{\ge0}^{\mathcal{X}}$, defines the convex cone $\Gamma_\varepsilon$. This definition allows us to formulate the following result:

Theorem 2. 

Let $Q$ be a privacy mechanism, and for $y\in\mathcal{Y}$, let $Q_y$ be the $y$-th row of the associated matrix $\mathbf{Q}=(Q_{y|x})_{y\in\mathcal{Y},x\in\mathcal{X}}$. Suppose that for all $y$ we have $Q_y\in\Gamma_\varepsilon$. Then, $Q$ satisfies $(\varepsilon,\mathcal{F})$-RLDP.
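In practice, the condition of Theorem 2 is easy to verify for a candidate row, since (43) is a finite set of linear inequalities. A rough sketch (our own code), with $v$ and the bounds $L_{u|s}$ stored as $|\mathcal{S}|\times|\mathcal{U}|$ arrays:

```python
import numpy as np

def in_gamma_eps(v, L, eps, tol=1e-9):
    """Check whether a nonnegative vector v (shape |S| x |U|) lies in the cone
    Gamma_eps of Definition 3, i.e. satisfies (43) for all s1, s2, u1, u2.
    L has the same shape and holds the lower bounds L_{u|s}."""
    v, L = np.asarray(v, float), np.asarray(L, float)
    if np.any(v < -tol):
        return False
    a1, a2 = v.shape
    for s1 in range(a1):
        for s2 in range(a1):
            for u1 in range(a2):
                for u2 in range(a2):
                    lhs = (v[s1, u1] - np.exp(eps) * v[s2, u2]
                           + np.sum(L[s1] * (v[s1] - v[s1, u1]))
                           - np.exp(eps) * np.sum(L[s2] * (v[s2] - v[s2, u2])))
                    if lhs > tol:
                        return False
    return True
```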

The upshot of this theorem is that we have translated the infinitely many constraints of (39) and (41) into the finitely many linear constraints of (43). This makes optimizing utility considerably easier. We perform this optimization by translating it into a linear programming problem. The key inspiration for this optimization is Theorem 4 of [5], where optimal LDP mechanisms are found by translating the problem of optimizing mutual information into linear programming; we use an analogous approach adapted to RLDP. This approach can be sketched as follows: let $\hat\Gamma=\{v\in\Gamma_\varepsilon:\sum_x v_x=1\}$, i.e., the intersection of $\Gamma_\varepsilon$ with the hyperplane corresponding to $\sum_x v_x=1$. This is a polyhedron, and every $Q$ satisfying the conditions of Theorem 2 has $Q_y=\theta_y v_y$ for some $\theta_y\in\mathbb{R}_{\ge0}$ and $v_y\in\hat\Gamma$. The authors of [5] made a number of key observations that also apply to our situation. The first is that, in this case, we can write

$$I_{\hat{P}}(X;Y)=\sum_y\theta_y\,\mu(v_y), \qquad (44)$$

where

$$\mu(v)=\sum_{x\in\mathcal{X}}v_x\hat{P}_x\log\frac{v_x}{\sum_{x'}v_{x'}\hat{P}_{x'}}. \qquad (45)$$

The second observation is that, in order to maximize (44), one can prove from the convexity of $\mu$ that it is optimal to have each $v_y$ be a vertex of $\hat\Gamma$. Thus, once we know the set of vertices $V$ of $\hat\Gamma$, we find the optimal $Q$ by assigning a weight $\theta_v$ to each $v\in V$, in such a way that the resulting $Q_y$ form a stochastic matrix and such that (44) is maximized. Since (44) is linear in $\theta$, this is a linear programming problem. This discussion is summarized in the following theorem:

Theorem 3. 

Let $\hat\Gamma$ be the polyhedron given by $\{v\in\Gamma_\varepsilon:\sum_x v_x=1\}$. Let $V$ be the set of vertices of $\hat\Gamma$. Define $\mu$ as in (45). Let $\mathbf{1}_{\mathcal{X}}\in\mathbb{R}^{\mathcal{X}}$ be the constant vector of ones. Let $\hat\theta\in\mathbb{R}_{\ge0}^{V}$ be the solution to the optimization problem

$$\begin{aligned}\underset{\theta}{\text{maximize}}\quad&\sum_{v\in V}\theta_v\,\mu(v)\\ \text{subject to}\quad&\theta\in\mathbb{R}_{\ge0}^{V},\quad\sum_{v\in V}\theta_v v=\mathbf{1}_{\mathcal{X}}.\end{aligned} \qquad (46)$$

Let the privacy mechanism $Q$ be given by $\mathcal{Y}=\{v\in V:\hat\theta_v>0\}$ and $Q_{v|x}=\hat\theta_v v_x$. Then, the mechanism $Q$ maximizes $I_{\hat{P}}(X;Y)$ among all mechanisms satisfying the condition of Theorem 2. One has $|\mathcal{Y}|\le a$.

Together, Theorems 2 and 3 show that if we can solve a vertex enumeration problem, we can find a mechanism $Q$ that maximizes $I_{\hat{P}}(X;Y)$ among a subset of all $(\varepsilon,\mathcal{F})$-RLDP mechanisms; furthermore, we ensure that the output space $\mathcal{Y}$ is, at most, the size of the input space $\mathcal{X}$. The proof of Theorem 3 is analogous to the proof of Theorem 4 of [5] and is given in Appendix A.5. Note that the results of [5] do not run into the vertex enumeration problem, because the relevant polyhedron there is $[1,e^{\varepsilon}]^{\mathcal{X}}$, for which the vertices are known.

We remark that a simplex is not the only possible choice for $D_s$. In general, we can make $D_s$ closer to $\mathcal{F}_{U|s}$ by adding more defining hyperplanes. Doing this allows more $Q$ to satisfy Theorem 2 and in turn increases the utility of the $Q$ we find via Theorem 3. However, since $\Gamma_\varepsilon$ is related to the $D_s$ via duality, adding extra constraints to the $D_s$ will increase the dimension of $\Gamma_\varepsilon$ through the addition of auxiliary variables. This makes the vertex enumeration problem of Theorem 3 more computationally involved. Thus, we have a trade-off between utility and computational complexity. Even with the given, 'simple' choice of $D_s$, the computational complexity is quite high: recall that we defined $a=|\mathcal{X}|$. The polytope $\hat\Gamma$ is $(a-1)$-dimensional and is defined by $a^2+a$ inequalities, and thus it has $O\big((a^2+a)^{\lfloor(a-1)/2\rfloor}\big)=O(a^a)$ vertices [58]. Since this is the dimension of the linear programming problem, we find that the total complexity of finding $Q$ is $O\big(a^{\omega a}\log(a^{a+1}/\delta)\big)$, where $\omega\approx2.38$ is the exponent of matrix multiplication and $\delta$ the relative accuracy [58]. Clearly, this becomes infeasible rather quickly for large $a$.
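Once the vertex set $V$ has been enumerated (the expensive step), the linear program (46) itself is routine. A minimal sketch, assuming the vertices are supplied as the rows of a matrix and using SciPy's LP solver (all names are ours):

```python
import numpy as np
from scipy.optimize import linprog

def polyopt_weights(V, p_hat):
    """Solve (46): maximise sum_v theta_v mu(v) subject to theta >= 0 and
    sum_v theta_v v = 1_X.  V is (num_vertices, a); p_hat is the length-a
    estimate P-hat, flattened over X.  Returns theta-hat and the mechanism Q."""
    V, p_hat = np.asarray(V, float), np.asarray(p_hat, float)
    denom = V @ p_hat                                  # sum_x v_x P-hat_x per vertex
    with np.errstate(divide="ignore", invalid="ignore"):
        mu = np.nansum(V * p_hat * np.log(V / denom[:, None]), axis=1)   # cf. (45)
    res = linprog(c=-mu, A_eq=V.T, b_eq=np.ones(V.shape[1]),
                  bounds=[(0, None)] * V.shape[0], method="highs")
    theta = res.x
    keep = theta > 1e-9
    Q = theta[keep][:, None] * V[keep]                 # rows Q_{v|x} = theta_v v_x
    return theta, Q
```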

It should be noted that, in general, the increase in utility obtained by shrinking the $D_s$ does not approach the optimal utility over all $(\varepsilon,\mathcal{F})$-RLDP mechanisms. This is because, as we take increasingly finer $D_s$, we approach the set of $Q$ that satisfy (4) for all $P$ in $\mathcal{F}':=\{P:\forall s,\ P_{U|s}\in\mathcal{F}_{U|s}\}$. As discussed in Section 4.1, one has $\mathcal{F}\subseteq\mathcal{F}'$. As a result, the set of $(\varepsilon,\mathcal{F}')$-RLDP mechanisms is strictly smaller than the set of $(\varepsilon,\mathcal{F})$-RLDP mechanisms.

Example 3. 

We continue Example 2 by taking $\varepsilon=\log2$. To obtain $\hat\Gamma$ in Theorem 3, we need to combine the defining inequalities of $\Gamma_\varepsilon$ in Definition 3 with the defining equality $\sum_x v_x=1$. Regarding the inequalities, we have $2^4=16$ inequalities of the form (43), as well as 4 inequalities of the form $v_x\ge0$. Together with the equality constraint, we obtain a 3-dimensional polytope in $\mathbb{R}^{\mathcal{X}}=\mathbb{R}^4$. Using a vertex enumeration algorithm, one finds that $V$ consists of the rows of the matrix $V$ below, where the order of the columns is the order of the rows of Example 1. For each row $v$, we can calculate $\mu(v)$, resulting in the vector $\mu$ below. Solving (46), we obtain the vector $\hat\theta$ below:

$$V=\begin{pmatrix}
0.0744 & 0.3227 & 0.5603 & 0.0426\\
0.2426 & 0.2426 & 0.4783 & 0.0364\\
0.3333 & 0.3333 & 0.1667 & 0.1667\\
0.1091 & 0.4737 & 0.2086 & 0.2086\\
0.0993 & 0.4310 & 0 & 0.4697\\
0.1121 & 0.4864 & 0 & 0.4015\\
0.3404 & 0.3404 & 0 & 0.3191\\
0.0770 & 0.3343 & 0.2944 & 0.2944\\
0.2234 & 0.2234 & 0 & 0.5531\\
0.4875 & 0.1434 & 0 & 0.3690\\
0.4360 & 0.1283 & 0 & 0.4358\\
0.4758 & 0.1400 & 0.1921 & 0.1921\\
0.3437 & 0.1011 & 0.2776 & 0.2776\\
0.1602 & 0.1602 & 0.6316 & 0.0481\\
0.1667 & 0.1667 & 0.3333 & 0.3333\\
0.3325 & 0.0978 & 0.5294 & 0.0403
\end{pmatrix},\quad
\mu=\begin{pmatrix}0.1152\\0.0942\\0.0087\\0.0135\\0.1097\\0.0968\\0.0723\\0.0080\\0.1240\\0.0878\\0.1014\\0.0106\\0.0076\\0.1240\\0.0075\\0.1083\end{pmatrix},\quad
\hat\theta=\begin{pmatrix}1.1899\\0\\0\\0\\0\\0.7670\\0\\0\\0\\0\\1.4134\\0\\0\\0\\0\\0.6297\end{pmatrix}. \qquad (47)$$

We now obtain the privacy mechanism $Q^{\mathrm{PolyOpt}}$ as follows: each row of $\mathbf{Q}^{\mathrm{PolyOpt}}$ corresponds to a non-zero coefficient of $\hat\theta$, multiplied by its corresponding row of $V$. Thus, we obtain

$$\mathbf{Q}^{\mathrm{PolyOpt}}=\begin{pmatrix}
Q_{y_1|s_1,u_1} & Q_{y_1|s_1,u_2} & Q_{y_1|s_2,u_1} & Q_{y_1|s_2,u_2}\\
Q_{y_2|s_1,u_1} & Q_{y_2|s_1,u_2} & Q_{y_2|s_2,u_1} & Q_{y_2|s_2,u_2}\\
Q_{y_3|s_1,u_1} & Q_{y_3|s_1,u_2} & Q_{y_3|s_2,u_1} & Q_{y_3|s_2,u_2}\\
Q_{y_4|s_1,u_1} & Q_{y_4|s_1,u_2} & Q_{y_4|s_2,u_1} & Q_{y_4|s_2,u_2}
\end{pmatrix} \qquad (48)$$
$$=\begin{pmatrix}
0.0885 & 0.3840 & 0.6667 & 0.0507\\
0.0860 & 0.3731 & 0 & 0.3080\\
0.6162 & 0.1813 & 0 & 0.6159\\
0.2094 & 0.0616 & 0.3333 & 0.0254
\end{pmatrix}. \qquad (49)$$

Note that indeed we have $4=b\le a=4$. As for the utility, we have $I_{\hat{P}}(X;Y)=\mu\cdot\hat\theta=0.4228$. However, the true utility is significantly lower, namely $I_{P^*}(X;Y)=0.2804$.

6. An Optimal Policy for F=PX

As PolyOpt mechanisms are obtained via vertex enumeration in a-dimensional space, this can be computationally infeasible for larger a. Thus, there is a need for methods that, given P^ and F, can find (ε,F)-RLDP mechanisms with reasonable computational complexity.

In this section, we consider the case where $\mathcal{F}$ is maximal, i.e., $\mathcal{F}=\mathcal{P}_{\mathcal{X}}$. By itself, this represents a situation where we want privacy for every possible probability distribution on $\mathcal{X}$. This scenario may not be very relevant in practice, but any protocol that we find in this way is also $(\varepsilon,\mathcal{F})$-RLDP for any $\mathcal{F}$. As we will see below, this allows us to find $(\varepsilon,\mathcal{F})$-RLDP protocols in a computationally efficient manner.

We show that $(\varepsilon,\mathcal{P}_{\mathcal{X}})$-RLDP is almost equivalent to LDP. We exploit this to create SRR, the RLDP analogue of GRR [5], the LDP mechanism that is optimal in the low-privacy regime (large $\varepsilon$). SRR only depends on $\varepsilon$ and $\mathcal{X}$ and not on $\hat{P}$, and as such does not require an optimization procedure to be found; this makes it a good choice when vertex enumeration is computationally infeasible. The downside is that SRR has a stricter privacy requirement than PolyOpt, as it takes $\mathcal{F}$ to be maximal; in Section 8, we investigate numerically to what extent this results in a lower utility.

We start by giving a characterization of (ε,PX)-RLDP. Like LDP, this can be defined by an inequality constraint on the matrix Q.

Proposition 3. 

$Q$ satisfies $(\varepsilon,\mathcal{P}_{\mathcal{X}})$-RLDP if and only if for all $y\in\mathcal{Y}$ and $(s,u),(s',u')\in\mathcal{X}$ with $s\ne s'$ one has

$$\frac{Q_{y|s,u}}{Q_{y|s',u'}}\;\le\;e^{\varepsilon}. \qquad (50)$$

Proof. 

Suppose that $Q$ satisfies $(\varepsilon,\mathcal{P}_{\mathcal{X}})$-RLDP. Let $(s,u),(s',u')\in\mathcal{X}$ with $s\ne s'$. Let $P$ be given by

$$P_x=\begin{cases}\tfrac12, & \text{if }x\in\{(s,u),(s',u')\},\\ 0, & \text{otherwise}.\end{cases} \qquad (51)$$

Then, $P_{u|s}=1$ and $P_{\tilde u|s}=0$ for all $\tilde u\ne u$; an analogous statement holds for $P_{u'|s'}$. It follows that

$$\frac{Q_{y|s,u}}{Q_{y|s',u'}}=\frac{Q_{y|s,u}P_{u|s}}{Q_{y|s',u'}P_{u'|s'}} \qquad (52)$$
$$=\frac{\sum_{\tilde u}Q_{y|s,\tilde u}P_{\tilde u|s}}{\sum_{\tilde u}Q_{y|s',\tilde u}P_{\tilde u|s'}} \qquad (53)$$
$$=\frac{\mathbb{P}_{X\sim P}(Q(X)=y\,|\,S=s)}{\mathbb{P}_{X\sim P}(Q(X)=y\,|\,S=s')}\;\le\;e^{\varepsilon}. \qquad (54)$$

This proves "⇒". On the other hand, suppose that $\frac{Q_{y|s,u}}{Q_{y|s',u'}}\le e^{\varepsilon}$ for all $s\ne s'$ and all $u,u'$. Then, for all $s\ne s'$ and all $P$, we have

$$\frac{\mathbb{P}_{X\sim P}(Q(X)=y\,|\,S=s)}{\mathbb{P}_{X\sim P}(Q(X)=y\,|\,S=s')}=\frac{\sum_u Q_{y|s,u}P_{u|s}}{\sum_u Q_{y|s',u}P_{u|s'}}\;\le\;e^{\varepsilon}. \qquad (55)$$

Hence, $Q$ satisfies $(\varepsilon,\mathcal{P}_{\mathcal{X}})$-RLDP. □

The proposition demonstrates that RLDP is very similar to LDP. The difference is that the condition "for all $x,x'\in\mathcal{X}$" from Definition 2 is relaxed to only those $x$ and $x'$ for which $s\ne s'$.

Before moving on and introducing a new mechanism, note that Proposition 3 clearly illustrates why the setting in this paper cannot be modeled using the block-structured approach from [12]. We see that if $u=u'$ but $s\ne s'$, we still have a privacy constraint, whereas in [12] this is not the case.

Next, we will introduce a mechanism that exploits the difference between LDP and RLDP. Recall that $a=|\mathcal{X}|$; then generalized randomized response [19] is the privacy mechanism $\mathrm{GRR}^{\varepsilon}:\mathcal{X}\to\mathcal{X}$ given by

$$\mathrm{GRR}^{\varepsilon}_{y|x}=\begin{cases}\dfrac{e^{\varepsilon}}{e^{\varepsilon}+a-1}, & \text{if }x=y,\\[1ex] \dfrac{1}{e^{\varepsilon}+a-1}, & \text{otherwise}.\end{cases} \qquad (56)$$

This mechanism has been designed such that $\mathrm{GRR}^{\varepsilon}_{y|x}/\mathrm{GRR}^{\varepsilon}_{y|x'}\in\{e^{-\varepsilon},1,e^{\varepsilon}\}$ for $x\ne x'$, attaining the maximal ratio that $\varepsilon$-LDP allows. We will see that for RLDP we can go up to a ratio of $e^{\pm2\varepsilon}$ between entries with $x=(s,u)$ and $x'=(s,u')$, i.e., with the same $s$, as we typically only need to satisfy

$$Q_{y|s,u}\;\le\;e^{\varepsilon}\,Q_{y|s',u'}\;\le\;e^{2\varepsilon}\,Q_{y|s,u''}. \qquad (57)$$

We capture the intuition from the necessary condition (57) in a new mechanism called secret randomized response (SRR). Recall that a1=|S|, a2=|U|.

Definition 4. 

(Secret randomized response (SRR)). Let $\varepsilon>0$. Then, the privacy mechanism $\mathrm{SRR}^{\varepsilon}:\mathcal{X}\to\mathcal{X}$ is given by

$$\mathrm{SRR}^{\varepsilon}_{s',u'|s,u}=\begin{cases}\dfrac{e^{\varepsilon}}{e^{\varepsilon}+e^{-\varepsilon}(a_2-1)+a-a_2}, & \text{if }(s',u')=(s,u),\\[1ex] \dfrac{e^{-\varepsilon}}{e^{\varepsilon}+e^{-\varepsilon}(a_2-1)+a-a_2}, & \text{if }s'=s\text{ and }u'\ne u,\\[1ex] \dfrac{1}{e^{\varepsilon}+e^{-\varepsilon}(a_2-1)+a-a_2}, & \text{if }s'\ne s.\end{cases} \qquad (58)$$

It is clear that $\mathrm{SRR}^{\varepsilon}_{y|s,u}/\mathrm{SRR}^{\varepsilon}_{y|s',u'}\in\{e^{-2\varepsilon},e^{-\varepsilon},1,e^{\varepsilon},e^{2\varepsilon}\}$, and the two extreme values only occur when $s=s'$. Thus, we can conclude

Lemma 2. 

$\mathrm{SRR}^{\varepsilon}$ satisfies $(\varepsilon,\mathcal{P}_{\mathcal{X}})$-RLDP.
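For concreteness, the GRR and SRR matrices of (56) and (58) can be built directly; the following minimal sketch (our own code, with rows and columns indexed by $x=(s,u)$ in row-major order) also reproduces the matrices of Example 4 below:

```python
import numpy as np

def grr_matrix(a, eps):
    """Generalized randomized response (56): an a x a column-stochastic matrix."""
    Q = np.full((a, a), 1.0 / (np.exp(eps) + a - 1))
    np.fill_diagonal(Q, np.exp(eps) / (np.exp(eps) + a - 1))
    return Q

def srr_matrix(a1, a2, eps):
    """Secret randomized response (58) on X = S x U with |S| = a1, |U| = a2."""
    a = a1 * a2
    denom = np.exp(eps) + np.exp(-eps) * (a2 - 1) + a - a2
    Q = np.full((a, a), 1.0 / denom)                 # s' != s
    for s in range(a1):
        blk = slice(s * a2, (s + 1) * a2)
        Q[blk, blk] = np.exp(-eps) / denom           # s' = s, u' != u
    np.fill_diagonal(Q, np.exp(eps) / denom)         # (s', u') = (s, u)
    return Q

# eps = log 2, a1 = a2 = 2: entries 0.444 / 0.111 / 0.222 as in (59).
print(np.round(srr_matrix(2, 2, np.log(2)), 3))
```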

Example 4. 

We continue Example 3. Although SRR is closely related to GRR, adopting it can still have a significant impact on utility. For instance, in the setting of Example 3, we obtain

$$\mathrm{GRR}^{\varepsilon}=\begin{pmatrix}0.4&0.2&0.2&0.2\\0.2&0.4&0.2&0.2\\0.2&0.2&0.4&0.2\\0.2&0.2&0.2&0.4\end{pmatrix},\qquad \mathrm{SRR}^{\varepsilon}=\begin{pmatrix}0.444&0.111&0.222&0.222\\0.111&0.444&0.222&0.222\\0.222&0.222&0.444&0.111\\0.222&0.222&0.111&0.444\end{pmatrix}. \qquad (59)$$

Then,

$$I_{\hat{P}}(X;\mathrm{GRR}^{\varepsilon}(X))=0.0419,\qquad I_{\hat{P}}(X;\mathrm{SRR}^{\varepsilon}(X))=0.1005, \qquad (60)$$
$$I_{P^*}(X;\mathrm{GRR}^{\varepsilon}(X))=0.0412,\qquad I_{P^*}(X;\mathrm{SRR}^{\varepsilon}(X))=0.0942. \qquad (61)$$

We see that adopting SRR more than doubles the utility. Compared to Example 3, we see that the utility is still significantly lower than that of PolyOpt, but the advantage is that we obtain SRR directly from ε, without having to take P^ or F into account; this ensures a significantly faster computation.

The power of SRR, beyond slightly improving on GRR, is that we can prove it maximizes IP(X;Y) for sufficiently large ε; the cutoff point depends on P. This is proven analogously to the result of [5], where GRR is the optimal LDP mechanism for sufficiently large ε.

Theorem 4. 

For every $P$, there is an $\varepsilon_0\ge0$ such that for all $\varepsilon\ge\varepsilon_0$, SRR is the $(\varepsilon,\mathcal{P}_{\mathcal{X}})$-RLDP mechanism maximizing $I_P(X;Y)$.

The proof of this theorem follows the same lines as the proof of Theorem 14 of [5], in which it is proven that GRR is the optimal LDP mechanism for sufficiently large ε. The proof is presented in Appendix A.6. This solves the problem of finding the optimal (ε,PX)-mechanism, for sufficiently large ε. This strategy is similar to the proof of Theorem 3: one can show that the rows Qy of the optimal (ε,F)-RLDP mechanism Q correspond to vertices of a polyhedron, and the optimal weights assigned to these vertices are found using a linear programming problem. Unlike in the case of Theorem 3, however, we can give an explicit description of the set of vertices, and we can solve the linear programming problem analytically.

Our result shows that if one wishes to satisfy $(\varepsilon,\mathcal{P}_{\mathcal{X}})$-RLDP, then SRR is a solid choice, especially for larger $\varepsilon$, since it maximizes $I_{P^*}(X;Y)$ for sufficiently large $\varepsilon$. Thus, we can optimize $I_{P^*}(X;Y)$ without having to know $P^*$, with the caveat that the cutoff point for 'large enough' depends on $P^*$.

In [5], the optimal LDP mechanism in the high-privacy regime (i.e., $\varepsilon\ll1$) was also found. In principle, we could also do this for $(\varepsilon,\mathcal{P}_{\mathcal{X}})$-RLDP, but this would not be of much use, as the optimal mechanism would depend on $P^*$, which we assume to be unknown.

7. Independent Reporting

Section 5 demonstrated the need to find efficiently computable (ε,F)-RLDP mechanisms with decent utility. In Section 6, we approach this problem by considering (ε,PX)-RLDP instead, allowing us to analytically obtain the optimal mechanism. However, when F is small, this overapproximation might result in a large loss of utility. In this section, we describe independent reporting (IR), a different heuristic that takes the size of F into account, while still being significantly less computationally complex than PolyOpt.

The basis of IR is to apply two separate LDP mechanisms R1 and R2 to S and U, respectively, reporting both outputs.

Definition 5. 

Let $\mathcal{Y}_1,\mathcal{Y}_2$ be sets, and let $\mathcal{Y}=\mathcal{Y}_1\times\mathcal{Y}_2$. Let $R_1:\mathcal{S}\to\mathcal{Y}_1$ and $R_2:\mathcal{U}\to\mathcal{Y}_2$ be probabilistic maps. Then, the independent reporting of $R_1$ and $R_2$ is the probabilistic map $\mathrm{IR}_{R_1,R_2}:\mathcal{X}\to\mathcal{Y}$ given by $\mathrm{IR}_{R_1,R_2}(s,u)=(R_1(s),R_2(u))$.

Suppose that $R_i$ satisfies $\varepsilon_i$-LDP. The composition theorem for differential privacy [59] tells us that $\mathrm{IR}_{R_1,R_2}$ satisfies $(\varepsilon_1+\varepsilon_2)$-LDP. However, in the RLDP setting, $U$ only indirectly leaks information about $S$; therefore, we can get away with a higher $\varepsilon_2$ compared to the LDP setting. How much higher depends on the degree of relatedness of $S$ and $U$, which is captured by the possible values of $P$ in $\mathcal{F}$. The precise statement is given in the following result:

Theorem 5. 

Let $\varepsilon_1,\varepsilon_2\in\mathbb{R}_{\ge0}$. For each $s$, let $d_s\in[0,\infty)$ be such that $d_s\ge\mathrm{rad}_s(\mathcal{F})$. Furthermore, define

$$d=\min\left\{2,\ \max_s(2d_s)+\max_{s,s'}\|\hat{P}_{U|s}-\hat{P}_{U|s'}\|_1\right\}. \qquad (62)$$

Let $\delta_2=\log\left(1+\frac{2(e^{\varepsilon_2}-1)}{d}\right)$. Suppose that $R_1$ is $\varepsilon_1$-LDP and that $R_2$ is $\delta_2$-LDP. Then, $\mathrm{IR}_{R_1,R_2}$ is $(\varepsilon_1+\varepsilon_2,\mathcal{F})$-RLDP.

If $S=U$, then $\|\hat{P}_{U|s}-\hat{P}_{U|s'}\|_1=2$ for $s\ne s'$, so $d=2$ and $\delta_2=\varepsilon_2$. In this case, Theorem 5 is the RLDP analogue of the well-known composition theorem for local differential privacy [59]. In general, $\delta_2\ge\varepsilon_2$; this reflects the fact that the privacy requirement on $R_2$ is less strict when $S$ and $U$ are only partially related. At the other extreme, if $S$ and $U$ are independent in our observation, we have $\|\hat{P}_{U|s}-\hat{P}_{U|s'}\|_1=0$ for all $s,s'$. Still, we cannot fully disclose $U$, since $S$ and $U$ might be non-independent under $P^*$. The term $d_s$ is present in the definition of $d$ to account for this possibility.
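The bound $d$ of (62) and the resulting relaxed parameter $\delta_2$ are cheap to compute; a minimal sketch (our own helper names):

```python
import numpy as np

def d_bound(d_s, p_hat_cond):
    """d from (62): d_s are upper bounds on rad_s(F), and p_hat_cond is an
    (|S|, |U|) array whose rows are the conditionals P-hat_{U|s}."""
    p = np.asarray(p_hat_cond, float)
    spread = max(np.abs(p[s] - p[t]).sum() for s in range(len(p)) for t in range(len(p)))
    return min(2.0, 2.0 * max(d_s) + spread)

def ir_budget(eps2, d):
    """delta_2 from Theorem 5: the LDP parameter that R_2 may use for budget eps_2."""
    return np.log(1.0 + 2.0 * (np.exp(eps2) - 1.0) / d)

# With d = 2 (e.g. S = U) this reduces to delta_2 = eps_2; for the d = 1.4591 of
# Example 5 it equals log(1.3707 * exp(eps_2) - 0.3707), a strictly looser requirement.
```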

In order to prove Theorem 5, we need the following lemma:

Lemma 3. 

Let $Q:\mathcal{X}\to\mathcal{Y}$ be an $\varepsilon$-LDP mechanism. Then, for all $y\in\mathcal{Y}$ and all $P,P'\in\mathcal{P}_{\mathcal{X}}$ we have

$$\frac{\mathbb{P}_{X\sim P}(Q(X)=y)}{\mathbb{P}_{X\sim P'}(Q(X)=y)}\;\le\;1+\frac{e^{\varepsilon}-1}{2}\,\|P-P'\|_1. \qquad (63)$$

Proof. 

Fix $y$, and let $Q_y^{\max}=\max_x Q_{y|x}$ and $Q_y^{\min}=\min_x Q_{y|x}$. By the $\varepsilon$-LDP property, it holds that $Q_y^{\max}\le e^{\varepsilon}Q_y^{\min}$. We hence find

$$\mathbb{P}_{X\sim P}(Q(X)=y)-\mathbb{P}_{X\sim P'}(Q(X)=y)=\sum_{x\in\mathcal{X}}Q_{y|x}(P_x-P'_x) \qquad (64)$$
$$=\sum_{x:P_x\ge P'_x}Q_{y|x}(P_x-P'_x)-\sum_{x:P'_x>P_x}Q_{y|x}(P'_x-P_x) \qquad (65)$$
$$\le Q_y^{\max}\,\frac{\|P-P'\|_1}{2}-Q_y^{\min}\,\frac{\|P-P'\|_1}{2} \qquad (66)$$
$$\le(e^{\varepsilon}-1)\,Q_y^{\min}\,\frac{\|P-P'\|_1}{2} \qquad (67)$$
$$\le(e^{\varepsilon}-1)\,\mathbb{P}_{X\sim P'}(Q(X)=y)\,\frac{\|P-P'\|_1}{2}, \qquad (68)$$

from which the lemma directly follows. □

Proof 

(Proof of Theorem 5). We start by showing that $d$ is an upper bound for $\|P_{U|s}-P_{U|s'}\|_1$. If $d=2$, this is certainly the case. Suppose $d=\max_s(2d_s)+\max_{s,s'}\|\hat{P}_{U|s}-\hat{P}_{U|s'}\|_1$. Then, for all $s,s'\in\mathcal{S}$ and $P\in\mathcal{F}$ we have

$$\|P_{U|s}-P_{U|s'}\|_1\le\|P_{U|s}-\hat{P}_{U|s}\|_1+\|\hat{P}_{U|s}-\hat{P}_{U|s'}\|_1+\|\hat{P}_{U|s'}-P_{U|s'}\|_1 \qquad (69)$$
$$\le d_s+d_{s'}+\|\hat{P}_{U|s}-\hat{P}_{U|s'}\|_1 \qquad (70)$$
$$\le d. \qquad (71)$$

Combining Lemma 3 with the fact that $\varepsilon_2=\log\left(1+\frac{d(e^{\delta_2}-1)}{2}\right)$, it follows that for every $y_2\in\mathcal{Y}_2$ we have

$$\frac{\mathbb{P}_{X\sim P}(R_2(U)=y_2\,|\,S=s)}{\mathbb{P}_{X\sim P}(R_2(U)=y_2\,|\,S=s')}\le1+\frac{e^{\delta_2}-1}{2}\,\|P_{U|s}-P_{U|s'}\|_1 \qquad (72)$$
$$\le1+\frac{d(e^{\delta_2}-1)}{2} \qquad (73)$$
$$=e^{\varepsilon_2}. \qquad (74)$$

Given $S$, the random variables $R_1(S)$ and $R_2(U)$ are independent. It follows that for every $y_1\in\mathcal{Y}_1$ and every $y_2\in\mathcal{Y}_2$, we have

$$\frac{\mathbb{P}(R_1(S)=y_1,R_2(U)=y_2\,|\,S=s)}{\mathbb{P}(R_1(S)=y_1,R_2(U)=y_2\,|\,S=s')}=\frac{\mathbb{P}(R_1(S)=y_1\,|\,S=s)}{\mathbb{P}(R_1(S)=y_1\,|\,S=s')}\cdot\frac{\mathbb{P}(R_2(U)=y_2\,|\,S=s)}{\mathbb{P}(R_2(U)=y_2\,|\,S=s')} \qquad (75)$$
$$\le e^{\varepsilon_1+\varepsilon_2}, \qquad (76)$$

where the last inequality holds because of (74) and because $R_1$ is $\varepsilon_1$-LDP. This shows that $\mathrm{IR}_{R_1,R_2}$ is $(\varepsilon_1+\varepsilon_2,\mathcal{F})$-RLDP. □

Theorem 5 establishes the privacy of independent reporting. To maximize the utility, we need to determine how to divide the privacy budget ε between ε1 and ε2, and which LDP mechanisms to use for R1 and R2. To answer both these questions, we first need an expression for the utility of IR, which is given by the following theorem:

Theorem 6. 

For any $P\in\mathcal{P}_{\mathcal{X}}$, one has

$$I_P\big(\mathrm{IR}_{R_1,R_2}(X);X\big)=I_P\big(R_1(S);S\big)+I_P\big(R_2(U);U\,\big|\,R_1(S)\big). \qquad (77)$$

Proof. 

Since $R_1(S)$ and $U$ are independent given $S$, and $R_2(U)$ and $S$ are independent given $U$ and $R_1(S)$, we have

$$I_P\big(\mathrm{IR}_{R_1,R_2}(X);X\big)=I_P\big(R_1(S),R_2(U);U,S\big) \qquad (78)$$
$$=I_P\big(R_1(S);U,S\big)+I_P\big(R_2(U);U,S\,\big|\,R_1(S)\big) \qquad (79)$$
$$=I_P\big(R_1(S);S\big)+I_P\big(R_2(U);U\,\big|\,R_1(S)\big). \qquad (80)$$

We use Theorems 5 and 6 to find high-utility IR protocols that satisfy $(\varepsilon,\mathcal{F})$-RLDP, given $\varepsilon$ and $\mathcal{F}$. To do so, we need to choose $R_1$ and $R_2$, and split the privacy budget between them. Since the expression for the utility of IR in Theorem 6 contains a term $I_P(R_1(S);S)$, the $R_1$ that maximizes this is GRR when $\varepsilon$ is large enough; thus, we choose $R_1=\mathrm{GRR}$. The second term in the utility expression is

$$I_P\big(R_2(U);U\,\big|\,R_1(S)\big)=\mathbb{E}_r\Big[I_{U\sim P_{U|R_1(S)=r}}\big(R_2(U);U\big)\Big]. \qquad (81)$$

This is the expected value of an expression that is maximized for R2=GRR, with the caveat that the maximization only holds when ε is large enough, and what ‘large enough’ is depends on the distribution of U. Since this gives us a choice of R2 independent of the distribution, we ignore this caveat and take R2=GRR as well.

Having chosen $R_1$ and $R_2$, we are only left with the division of the privacy budget. If we choose $\varepsilon_2$, then by Theorem 5 the privacy parameters of $R_1$ and $R_2$ are $\varepsilon_1=\varepsilon-\varepsilon_2$ and $\delta_2=\log\left(1+\frac{2(e^{\varepsilon_2}-1)}{d}\right)$, respectively. It follows that to find a high-utility IR protocol, we have to solve the following optimization problem:

$$\begin{aligned}\underset{\varepsilon_2}{\text{maximize}}\quad & I_P\Big(\mathrm{GRR}^{\varepsilon-\varepsilon_2}(S),\,\mathrm{GRR}^{\log\left(1+\frac{2(e^{\varepsilon_2}-1)}{d}\right)}(U);\,S,U\Big)\\ \text{subject to}\quad & \varepsilon_2\in[0,\varepsilon].\end{aligned} \qquad (82)$$

This optimization problem is only one-dimensional. While it is not straightforward to express the complexity of solving it in $O$-notation, our experiments in Section 8 show that it can be solved quickly numerically, and significantly faster than PolyOpt.
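A rough sketch of this one-dimensional search, reusing the grr_matrix and ir_budget helpers sketched earlier (the mutual-information routine and all names are ours; P is the flattened joint over $\mathcal{X}$ in $(s,u)$ row-major order):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mutual_information(P, Q):
    """I(X;Y) for X ~ P (length-a vector over X) and Y = Q(X),
    where Q is a b x a column-stochastic matrix."""
    P = np.asarray(P, float)
    joint = Q * P                                 # entry (y, x) = Q_{y|x} P_x
    p_y = joint.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log(joint / (p_y[:, None] * P[None, :]))
    return float(np.nansum(terms))

def best_ir_split(p_hat, a1, a2, eps, d):
    """Solve (82): split eps between GRR on S and GRR on U (relaxed to delta_2)."""
    def neg_utility(eps2):
        Q = np.kron(grr_matrix(a1, eps - eps2), grr_matrix(a2, ir_budget(eps2, d)))
        return -mutual_information(p_hat, Q)
    res = minimize_scalar(neg_utility, bounds=(0.0, eps), method="bounded")
    return res.x, -res.fun                        # optimal eps_2 and achieved utility
```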

Example 5. 

We continue Example 4. Having found $\mathrm{rad}_s(\mathcal{F})$ and $\hat{P}_{U|s_1},\hat{P}_{U|s_2}$ in Example 2, we conclude that, in Theorem 5, we have

$$d=\min\left\{2,\ 2\cdot\max\{0.6107,0.3061\}+\left\|\begin{pmatrix}0.4118\\0.5882\end{pmatrix}-\begin{pmatrix}0.3133\\0.6867\end{pmatrix}\right\|_1\right\}=1.4591. \qquad (83)$$

It follows that $\delta_2=\log\left(1+\frac{2}{1.4591}(e^{\varepsilon_2}-1)\right)=\log\left(1.3707e^{\varepsilon_2}-0.3707\right)$. For a given value of $\varepsilon_2$, the matrix corresponding to $\mathrm{IR}\big(\mathrm{GRR}^{\log(2)-\varepsilon_2},\mathrm{GRR}^{\delta_2}\big)$ is the Kronecker product

$$\begin{pmatrix}\frac{2e^{-\varepsilon_2}}{2e^{-\varepsilon_2}+1} & \frac{1}{2e^{-\varepsilon_2}+1}\\[1ex] \frac{1}{2e^{-\varepsilon_2}+1} & \frac{2e^{-\varepsilon_2}}{2e^{-\varepsilon_2}+1}\end{pmatrix}\otimes\begin{pmatrix}\frac{1.3707e^{\varepsilon_2}-0.3707}{1.3707e^{\varepsilon_2}+0.6293} & \frac{1}{1.3707e^{\varepsilon_2}+0.6293}\\[1ex] \frac{1}{1.3707e^{\varepsilon_2}+0.6293} & \frac{1.3707e^{\varepsilon_2}-0.3707}{1.3707e^{\varepsilon_2}+0.6293}\end{pmatrix}=\frac{1}{C}\begin{pmatrix}2.7414-0.7414e^{-\varepsilon_2} & 2e^{-\varepsilon_2} & 1.3707e^{\varepsilon_2}-0.3707 & 1\\ 2e^{-\varepsilon_2} & 2.7414-0.7414e^{-\varepsilon_2} & 1 & 1.3707e^{\varepsilon_2}-0.3707\\ 1.3707e^{\varepsilon_2}-0.3707 & 1 & 2.7414-0.7414e^{-\varepsilon_2} & 2e^{-\varepsilon_2}\\ 1 & 1.3707e^{\varepsilon_2}-0.3707 & 2e^{-\varepsilon_2} & 2.7414-0.7414e^{-\varepsilon_2}\end{pmatrix}, \qquad (84)$$

where C=(2eε2+1)(1.3707eε2+0.6293). We now wish to optimize its utility, i.e., find the ε2[0,log2] that maximizes IP^(X;Y). The optimum occurs at the boundary ε2=log(2), for which IP^(X;Y)=0.0755. Notice that now ε1=0, so R1=GRR0 is completely random: its output does not depend on the input. In other words, the optimal IR protocol in this case does not transmit any direct information about S at all, only indirectly through GRRδ2(U). In this case, we have

Q^{IR} =
( 0.3517  0.1483  0.3517  0.1483 ;
  0.1483  0.3517  0.1483  0.3517 ;
  0.3517  0.1483  0.3517  0.1483 ;
  0.1483  0.3517  0.1483  0.3517 ). (85)

Regarding the ‘true’ utility, we have I_{P*}(X;Y) = 0.0718. Interestingly, Q^{IR} yields less utility than SRR. As we will see in Section 8, this is typical for small S and U.
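As a quick numerical sanity check of (84) and (85), a few lines of numpy (again assuming the standard GRR matrices; d = 1.4591 is taken from (83)) reproduce Q^{IR} at ε₂ = log 2:

```python
import numpy as np

d, eps2 = 1.4591, np.log(2)
delta2 = np.log(1 + 2 * (np.exp(eps2) - 1) / d)   # = log(1.3707*e^{eps2} - 0.3707)

def grr(k, eps):
    # standard k-ary randomized response matrix
    Q = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))
    np.fill_diagonal(Q, np.exp(eps) / (np.exp(eps) + k - 1))
    return Q

Q_ir = np.kron(grr(2, 0.0), grr(2, delta2))       # R1 = GRR_0 is uniform noise on S
print(np.round(Q_ir, 4))
# rows alternate between (0.3517, 0.1483, 0.3517, 0.1483) and (0.1483, 0.3517, 0.1483, 0.3517),
# matching (85)
```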

8. Experiments

In order to gain insight into the behavior of the different mechanisms, we performed several experiments, both on synthetic and real data. We compared the three mechanisms introduced in this paper (PolyOpt, SRR, and IR). Throughout, we let F be a confidence set for a χ²-test, i.e., for a Rényi divergence with α = 2. We used the results of Section 4 to find explicit expressions for L_{u|s}(F) and (an upper bound for) rad_s(F). Recall from Section 3 that

F = { P ∈ P_X : D₂(P̂ ‖ P) ≤ log( 1 + F⁻¹_{χ², a−1}(1−β)/n ) }, (86)

where F_{χ², a−1} is the cumulative distribution function of the χ²-distribution with a−1 degrees of freedom, and β ∈ (0,1) is a chosen significance level. Throughout the experiments, we took β = 0.05, unless otherwise specified.
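For concreteness, the radius of the Rényi ball in (86) can be computed directly from the χ² quantile; the following is a minimal sketch (function name and example values are ours, not from the paper):

```python
import numpy as np
from scipy.stats import chi2

def renyi_radius(a, n, beta=0.05):
    """Radius B of the alpha=2 Renyi ball in (86): D_2(P_hat || P) <= B."""
    return np.log(1 + chi2.ppf(1 - beta, df=a - 1) / n)

# e.g. an experiment with a = a1 * a2 = 2 * 5 = 10 categories and n = 32561 users
print(renyi_radius(a=10, n=32561))
```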

We used I_{P̂}(X;Y) as a utility metric, divided by H(X) to obtain the normalized mutual information (NMI). We used this rather than I_{P*}(X;Y), as the aggregator only has access to the former. In fact, while P* is known for the synthetic data, this is not the case for real data, so we cannot even use I_{P*}(X;Y) as a utility metric.

We compared our methods to two existing approaches, each with a slightly different privacy model. First, we compared to an LDP mechanism, to see to what extent the RLDP framework offered a utility improvement over regular LDP. As the LDP mechanism, we chose GRR, because it optimizes I_P(X;Y), our utility metric, in the low-privacy regime [5]. Second, we compared to the non-robust optimal mechanism of [3]. This mechanism is obtained in a manner similar to PolyOpt, and is the optimal mechanism that satisfies (in our notation) (ε, {P̂})-RLDP. In other words, it is optimal in the scenario where one knows P* precisely. We shall refer to this mechanism as NR (non-robust). Typically, we would expect NR to have a higher utility than our RLDP mechanisms (because it only needs to satisfy privacy with respect to one distribution) and GRR to have a worse utility (because LDP is stricter than RLDP).

8.1. Adult Data Set

We performed numerical experiments on the adult data set (n = 32,561) [60], which contains demographic data from the 1994 US census. Some examples, where we used different categorical attributes from the data set as S and U, are depicted in Figure 2. We omitted PolyOpt from the larger two experiments, as the space complexity became unfeasible: for occupation vs. education, the polyhedron Γ̂ was 240-dimensional and was defined by 57,840 inequality constraints; to find its set of vertices, Matlab needed to operate on a 57,840 × 57,840 matrix, whose size (24.7 GB) exceeded Matlab’s maximum array size.

Figure 2. Experiments on the categories sex, race, education, occupation, relationship and native-country of the adult data set. Numbers between brackets indicate a₁ and a₂ (legend: SRR, PolyOpt, IR, GRR, NR). (a) S = sex (2), U = race (5), (b) S = race (5), U = sex (2), (c) S = occ. (15), U = edu. (16), (d) S = native country (42), U = relationship (6).

We can see that PolyOpt clearly outperformed IR and SRR in the first two experiments, especially in the high-privacy regime (low ε). Similarly, IR outperformed SRR in the high-privacy regime, but was slightly overtaken for high ε. This is interesting, since SRR satisfies a stronger privacy guarantee, as it provides privacy for all adversary assumptions, so we expected it to offer less utility than IR. An explanation for this is that IR is forced to transmit S and U separately, and so it can be less efficient than SRR, which does not have this restriction. At any rate, the difference between IR and SRR in the low-privacy regime was only marginal compared to the advantage of PolyOpt over both. In the second two experiments, where PolyOpt was infeasible, we can see that IR clearly outperformed SRR. Overall, we see that, especially in the low-privacy regime, PolyOpt was the preferable RLDP mechanism, followed by IR and SRR. Furthermore, we can see that, in all experiments, GRR performed the worst, and the best RLDP mechanism significantly outperformed GRR. This shows that adopting RLDP as a privacy requirement results in significantly better utility than LDP. Conversely, NR outperformed the RLDP methods, although the difference between NR and PolyOpt was marginal for higher ε. As for PolyOpt, NR was computationally out of reach for larger |X|.

8.2. Synthetic Data

To study the robustness of our method with respect to utility (Section 8.4) and privacy (Section 8.3), we also needed experiments in which P* was known. To this end, we considered experiments on synthetic data. We first randomly created a probability distribution P* on X, where X was the same as in the experiments on the adult data set. The distribution P* was drawn from the Jeffreys prior on P_X, i.e., the symmetric Dirichlet distribution with parameter 1/2. From P*, we then drew n = 32,561 elements of X, which we used to obtain the estimate P̂; this estimate was then used to create the privacy mechanisms. We carried this out 100 times, and we averaged the NMI over these 100 runs. The results are shown in Figure 3. The results were similar to those of the experiments on the adult data set: PolyOpt outperformed IR, which outperformed SRR, although for small |X| SRR could overtake IR in the low-privacy regime. Furthermore, GRR was the worst overall, while NR was the best overall, but only by a small margin.
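A minimal sketch of this data-generation procedure (the seed and variable names are ours) is:

```python
import numpy as np

rng = np.random.default_rng(0)
a1, a2, n = 2, 5, 32561
a = a1 * a2

# Draw the 'true' distribution P* from the Jeffreys prior (symmetric Dirichlet, parameter 1/2)
P_star = rng.dirichlet(np.full(a, 0.5))

# Draw n samples from P* and form the empirical estimate P_hat used to build the mechanisms
counts = rng.multinomial(n, P_star)
P_hat = counts / n
```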

Figure 3. Synthetic experiments with n = 32,561 and β = 0.05 (legend: SRR, PolyOpt, IR, GRR, NR). (a) a₁=2, a₂=5, (b) a₁=5, a₂=2, (c) a₁=15, a₂=16, (d) a₁=42, a₂=6.

8.3. Realized Privacy Parameter

In the previous subsections, we saw that NR had a (marginally) better utility than PolyOpt. However, this is not a completely fair comparison, since NR was only designed to give privacy for X ∼ P̂ and might result in a larger privacy leakage for X ∼ P*. For the synthetic data, P* was known, and we could measure the true privacy leakage. For a protocol Q, we defined the realized privacy parameter ε* as

ε* = max_{y∈Y, s₁,s₂∈S} log [ P_{X∼P*}(Y=y | S=s₁) / P_{X∼P*}(Y=y | S=s₂) ] = max_{y∈Y, s₁,s₂∈S} log [ Σ_u Q_{y|s₁,u} P*_{u|s₁} / Σ_u Q_{y|s₂,u} P*_{u|s₂} ].

Note that this becomes infinite when there exist s, y such that P_{X∼P*}(Y=y | S=s) = 0. We compared ε* for NR and PolyOpt: the results are shown in Figure 4, where we give the 25% and 75% quantiles for both protocols, out of 100 considered distributions. As one can see, NR’s ε* was consistently greater than ε, while PolyOpt’s ε* was consistently smaller. This is what we expected, as NR does not give privacy guarantees for P*, but PolyOpt does when P* ∈ F, which happens with 95% probability. Note that the privacy leakage was especially bad for low ε: at ε = 0.075, the lowest value of ε we tested, the 75%-quantile of ε* of NR was 0.3897, which is more than 5 times the desired privacy parameter. Overall, we can conclude that NR gave marginally better utility, but this came at quite a privacy cost.
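The realized privacy parameter can be computed directly from Q and P*. The following sketch is our own illustration: it assumes the mechanism is stored as an array Q[s, u, y] = Q_{y|s,u} and takes the log of the worst-case likelihood ratio, as in the definition above.

```python
import numpy as np

def realized_epsilon(Q, P_star):
    """Realized privacy parameter: max over y, s1, s2 of log P(Y=y|S=s1) / P(Y=y|S=s2).

    Q has shape (a1, a2, |Y|) with Q[s, u, y] = Q_{y|s,u}; P_star has shape (a1, a2).
    Returns np.inf if some conditional output probability is zero.
    """
    P_s = P_star.sum(axis=1)                          # marginal of S
    P_u_given_s = P_star / P_s[:, None]               # P*_{u|s}
    # P(Y=y|S=s) = sum_u Q_{y|s,u} * P*_{u|s}
    P_y_given_s = np.einsum("suy,su->sy", Q, P_u_given_s)
    if np.any(P_y_given_s == 0):
        return np.inf
    log_ratios = np.log(P_y_given_s[:, None, :]) - np.log(P_y_given_s[None, :, :])
    return float(log_ratios.max())
```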

Figure 4. Realized privacy parameter ε* on synthetic data. The shaded area is bounded by the 25% and 75% quantiles (legend: PolyOpt, NR). The green line depicts ε = ε*. (a) a₁=2, a₂=5, (b) a₁=5, a₂=2.

8.4. Utility Robustness

For the synthetic data sets (where we knew P*), we also investigated the normalized difference in mutual information (I_{P̂}(X;Y) − I_{P*}(X;Y)) / I_{P̂}(X;Y), to see to what extent we could use I_{P̂}(X;Y) as a utility metric in lieu of the true utility I_{P*}(X;Y). This is shown for the three methods in Figure 5, at ε = 1.5. Overall, we can see that the difference was quite minor: for all three methods, the difference in NMI, even at its most extreme, was less than 3% of the NMI value. Furthermore, the differences were very symmetric, with the difference being positive and negative approximately equally often. We can conclude that we were justified in using I_{P̂}(X;Y) as a utility metric in the other experiments.

Figure 5. Normalized difference in NMI for P̂ and P* on synthetic data (ε = 1.5), measured over 100 runs. Boxes denote the 25–75% quantiles, whiskers denote minima and maxima. S = SRR, P = PolyOpt, I = IR.

8.5. Impact of β

We also considered the impact of β on utility for synthetic data (fixing ε = 1.5). The results are shown in Table 2, which are averages over 100 runs. Note that SRR does not depend on β, since it assumes F = P_X. Interestingly, we can see that the impact of β was quite limited; changing β by a factor 100 had at most about a 4% impact on NMI. This impact was smaller for PolyOpt than for IR, and smaller for larger |X|. Overall, we can conclude that by choosing β closer to 0, we can significantly increase the robustness of privacy without a considerable impact on utility.

Table 2.

NMI for synthetic data for various values of β (ε=1.5).

            a₁=2, a₂=5               a₁=5, a₂=2
  β         0.1    0.01   0.001      0.1    0.01   0.001
  SRR       0.231  0.231  0.231      0.126  0.126  0.126
  PolyOpt   0.727  0.723  0.719      0.374  0.372  0.370
  IR        0.512  0.501  0.492      0.169  0.165  0.162

            a₁=15, a₂=16             a₁=42, a₂=6
  β         0.1    0.01   0.001      0.1    0.01   0.001
  SRR       0.009  0.009  0.009      0.005  0.005  0.005
  IR        0.055  0.053  0.051      0.052  0.052  0.052

9. Conclusions and Future Work

In this paper, we presented a number of algorithms that, given a desired privacy level ε, an estimated distribution P̂, and a bound on the Rényi divergence D_α(P̂ ‖ P), return privacy mechanisms that satisfy a differential privacy-like privacy constraint for the part of the data that is considered sensitive, for all distributions P within the divergence bound. The first class of privacy mechanisms, PolyOpt, offers high utility, but is computationally complex, as it relies on vertex enumeration. The second class, SRR, satisfies a stronger privacy requirement and is optimal in the low-privacy regime with reference to this requirement, but as a result has less utility than mechanisms that do not satisfy this stronger privacy requirement. The third class, IR, is a general framework for releasing the sensitive and non-sensitive part of the data independently, and the optimal division of the privacy budget between these can be found via 1-dimensional optimization; thus, the optimal IR mechanism can be found quickly, while still offering decent utility. Furthermore, taking RLDP rather than LDP as a privacy constraint, i.e., protecting only the part of the data that is sensitive, significantly improves utility. In particular, we showed that the utility of PolyOpt is close to the utility of the optimal non-robust privacy mechanism. In other words, asking for robustness in privacy comes at only a small performance penalty in utility. At the same time, we showed that not asking for robustness comes at a substantial privacy cost.

There are various interesting directions for future research to build upon the results in this paper. One direction is to find analytical bounds on the performance gap between PolyOpt and optimal mechanisms, in particular on the gap with reference to either the non-robust optimal mechanism from [3] or with reference to an optimal robust mechanism. Note, however, that for the moment we do not have any results on optimal robust mechanisms. Another direction is to improve the performance of the low-complexity algorithms that have been proposed. For instance, in independent reporting, one could change the underlying LDP mechanism from GRR to an optimal mechanism. Since GRR is only optimal in the high-privacy regime, we expect that there would be room for improvement in the low-privacy regime. A significant challenge is incorporating optimal mechanisms along the lines of [5]; however, these mechanisms depend on P*, which is inaccessible in the RLDP framework. Yet another interesting direction would be to incorporate robustness in utility in addition to robustness in privacy. This would require finding a mechanism that maximizes min_{P∈F} I_P(X;Y). The challenge in this is that I_P(X;Y) is concave in P, which makes minimizing it over F difficult. Finally, it would be interesting to apply the RLDP framework to other models. In this work, we studied the model where X splits into a sensitive part S and a non-sensitive part U. It would be interesting to also study the more general case where X is correlated with the sensitive data S, or to apply RLDP to the models that are studied in [12].

Appendix A. Proofs

Appendix A.1. Proof of Theorem 1

This follows from the following four lemmas, where the RHS of (24) is denoted F̄_{U|s}:

Lemma A1. 

If α ≠ 1, then F_{U|s} ⊆ F̄_{U|s}.

Proof. 

Assume α < 1; the case α > 1 is handled analogously. Then, we rewrite D_α(P̂ ‖ P) ≤ B as

Σ_x P̂_x^α / P_x^{α−1} ≥ e^{B(α−1)}. (A1)

Let C = e^{B(α−1)}. Then,

(A2) (P̂_s^α / P_s^{α−1}) Σ_u P̂_{u|s}^α / P_{u|s}^{α−1} = Σ_u P̂_{s,u}^α / P_{s,u}^{α−1}
(A3)   ≥ C − Σ_{s′≠s} Σ_u P̂_{s′,u}^α / P_{s′,u}^{α−1}.

For s′ ∈ S∖{s} and u ∈ U, define P_{s′,u|¬s} = P_{u,s′}/(1−P_s) and P̂_{s′,u|¬s} = P̂_{u,s′}/(1−P̂_s). Then, (A3) can be written as

(P̂_s^α / P_s^{α−1}) Σ_u P̂_{u|s}^α / P_{u|s}^{α−1} ≥ C − ((1−P̂_s)^α / (1−P_s)^{α−1}) Σ_{s′≠s} Σ_u P̂_{s′,u|¬s}^α / P_{s′,u|¬s}^{α−1}. (A4)

Furthermore, P_{|¬s} = (P_{s′,u|¬s})_{s′∈S∖{s}, u∈U} and P̂_{|¬s} = (P̂_{s′,u|¬s})_{s′∈S∖{s}, u∈U} form probability distributions on (S∖{s}) × U. As such, we have

Σ_{s′≠s} Σ_u P̂_{s′,u|¬s}^α / P_{s′,u|¬s}^{α−1} = e^{(α−1) D_α(P̂_{|¬s} ‖ P_{|¬s})} ≤ 1. (A5)

Applying this to (A4), we obtain

(P̂_s^α / P_s^{α−1}) Σ_u P̂_{u|s}^α / P_{u|s}^{α−1} ≥ C − (1−P̂_s)^α / (1−P_s)^{α−1} (A6)

or

Σ_u P̂_{u|s}^α / P_{u|s}^{α−1} ≥ (P_s^{α−1} / P̂_s^α) ( C − (1−P̂_s)^α / (1−P_s)^{α−1} ). (A7)

To find the bound on Σ_u P̂_{u|s}^α / P_{u|s}^{α−1}, we have to minimize the RHS of this inequality. The only unknown on the right is P_s. We find the minimum value of the right-hand side by differentiating with respect to P_s, for which we obtain

(α−1) (P_s^{α−2} / P̂_s^α) ( C − (1−P̂_s)^α (1−P_s)^{−α} ). (A8)

Setting this equal to 0, we find P_s = 1 − C^{−1/α}(1−P̂_s). Substituting this into (A7), we obtain

Σ_u P̂_{u|s}^α / P_{u|s}^{α−1} ≥ ( C^{1/α} − (1−P̂_s) )^α / P̂_s^α, (A9)

which can be written as

D_α(P̂_{U|s} ‖ P_{U|s}) ≤ (α/(α−1)) log( ( e^{(α−1)B/α} − (1−P̂_s) ) / P̂_s ), (A10)

showing that P_{U|s} ∈ F̄_{U|s}. Since P was chosen arbitrarily, we can conclude F_{U|s} ⊆ F̄_{U|s}. □

Lemma A2. 

If α ≠ 1, then F̄_{U|s} ⊆ F_{U|s}.

Proof. 

Again we assume α < 1. Suppose that R ∈ P_U satisfies D_α(P̂_{U|s} ‖ R) ≤ B_s. Let C be as in (A1) and define γ = 1 − C^{−1/α}(1−P̂_s); then,

(A11) (1/(α−1)) log Σ_u P̂_{u|s}^α / R_u^{α−1} = D_α(P̂_{U|s} ‖ R)
(A12)   ≤ B_s
(A13)   = (α/(α−1)) log( ( e^{(α−1)B/α} − (1−P̂_s) ) / P̂_s )
(A14)   = (α/(α−1)) log( C^{1/α} γ / P̂_s ),

which we can express as

Σ_u P̂_{u|s}^α / R_u^{α−1} ≥ C γ^α / P̂_s^α. (A15)

Define P ∈ P_X by

P_{u,s′} = { γ R_u, if s′ = s;  C^{−1/α} P̂_{u,s′}, otherwise. } (A16)

Then, P_{U|s} = R, and

(A17) Σ_{u,s′} P̂_{u,s′}^α / P_{u,s′}^{α−1} = Σ_u P̂_{u,s}^α / (γ R_u)^{α−1} + Σ_u Σ_{s′≠s} C^{(α−1)/α} P̂_{u,s′}
(A18)   = (P̂_s^α / γ^{α−1}) Σ_u P̂_{u|s}^α / R_u^{α−1} + C^{(α−1)/α} (1−P̂_s)
(A19)   ≥ γ C + C^{(α−1)/α} (1−P̂_s)
(A20)   = C.

As in the proof of Lemma A1, the condition Σ_{u,s′} P̂_{u,s′}^α / P_{u,s′}^{α−1} ≥ C is equivalent to D_α(P̂ ‖ P) ≤ B. Thus, we can conclude that P ∈ F and so R = P_{U|s} ∈ F_{U|s}. Since R was chosen arbitrarily, this shows F̄_{U|s} ⊆ F_{U|s}. □

Lemma A3. 

If α = 1, then F_{U|s} ⊆ F̄_{U|s}.

Proof. 

Let P ∈ F, and define P_{|¬s}, P̂_{|¬s} as in the proof of Lemma A1. Then,

(A21) D₁(P̂ ‖ P) = P̂_s Σ_u P̂_{u|s} log( P̂_s P̂_{u|s} / (P_s P_{u|s}) ) + (1−P̂_s) Σ_{s′≠s,u} P̂_{s′,u|¬s} log( (1−P̂_s) P̂_{s′,u|¬s} / ( (1−P_s) P_{s′,u|¬s} ) )
(A22)   = P̂_s D₁(P̂_{U|s} ‖ P_{U|s}) + (1−P̂_s) D₁(P̂_{|¬s} ‖ P_{|¬s}) + P̂_s log(P̂_s/P_s) + (1−P̂_s) log( (1−P̂_s)/(1−P_s) )
(A23)   = P̂_s D₁(P̂_{U|s} ‖ P_{U|s}) + (1−P̂_s) D₁(P̂_{|¬s} ‖ P_{|¬s}) + D₁(V_{P̂_s} ‖ V_{P_s}),

where for p ∈ [0,1], the random variable V_p is defined to follow a Bernoulli distribution with P(V_p = 1) = p. Since D₁ is non-negative and D₁(P̂ ‖ P) ≤ B, we find

(A24) D₁(P̂_{U|s} ‖ P_{U|s}) = (1/P̂_s) ( D₁(P̂ ‖ P) − (1−P̂_s) D₁(P̂_{|¬s} ‖ P_{|¬s}) − D₁(V_{P̂_s} ‖ V_{P_s}) )
(A25)   ≤ D₁(P̂ ‖ P) / P̂_s
(A26)   ≤ B / P̂_s.

Thus, P_{U|s} ∈ F̄_{U|s}; since P ∈ F was chosen arbitrarily, we can conclude F_{U|s} ⊆ F̄_{U|s}. □

Lemma A4. 

If α = 1, then F̄_{U|s} ⊆ F_{U|s}.

Proof. 

Let R ∈ P_U be such that D₁(P̂_{U|s} ‖ R) ≤ B/P̂_s. Define P ∈ P_X by

P_{u,s′} = { P̂_s R_u, if s′ = s;  P̂_{u,s′}, if s′ ≠ s. } (A27)

Then P_{U|s} = R. Furthermore, in (A23) one has D₁(P̂_{|¬s} ‖ P_{|¬s}) = D₁(V_{P̂_s} ‖ V_{P_s}) = 0, and so D₁(P̂ ‖ P) ≤ B. This shows that P ∈ F, and so R = P_{U|s} ∈ F_{U|s}. Since R was chosen arbitrarily, we can conclude that F̄_{U|s} ⊆ F_{U|s}. □

Appendix A.2. Proof of Proposition 1

We first prove the following two auxiliary lemmas. We only prove these for α>1; the other cases are handled analogously.

Lemma A5. 

Let x ∈ X, and define

(A28) ξ₋(ρ) = inf{ ξ ∈ (0,1] : E_B(ρ,ξ) ≤ 0 },
(A29) ξ₊(ρ) = sup{ ξ ∈ [1, (1−ρ)⁻¹) : E_B(ρ,ξ) ≤ 0 },

where E_B is as in Proposition 1. Then, min_{P∈F} P_x = P̂_x ξ₋(P̂_x) and max_{P∈F} P_x = P̂_x ξ₊(P̂_x).

Proof. 

As in the proof of Lemma A1, define C = e^{(α−1)B}; thus

F = { P ∈ P_X : Σ_{x∈X} P̂_x^α / P_x^{α−1} ≤ C }. (A30)

Furthermore, define a function F by

F(ρ,ξ) = ρ ξ^{1−α} + (1−ρ) ( (1−ρξ)/(1−ρ) )^{1−α} − C, (A31)

with F(1,ξ) = ξ^{1−α} − C the limit as ρ → 1. Then, E_B(ρ,ξ) = (1/(α−1)) log( F(ρ,ξ) + e^{(α−1)B} ) − B, so F(ρ,ξ) ≤ 0 ⟺ E_B(ρ,ξ) ≤ 0. Thus, ξ₋(ρ) = inf{ ξ ∈ (0,1] : F(ρ,ξ) ≤ 0 } and the analogous statement holds for ξ₊(ρ). The P that yield the extremal P_x lie on the boundary of F; hence, they either satisfy P_x ∈ {0,1}, or the equality

Σ_{x′∈X} P̂_{x′}^α / P_{x′}^{α−1} = C. (A32)

In the latter case, the extremal values of P_x have to be stationary points of the Lagrangian expression

P_x + λ ( Σ_{x′} P̂_{x′}^α / P_{x′}^{α−1} − C ) + μ ( Σ_{x′} P_{x′} − 1 ). (A33)

Taking derivatives with respect to all P_{x′}, we find

(A34) 1 + (1−α) λ P̂_x^α P_x^{−α} + μ = 0,
(A35) ∀x′ ≠ x:  (1−α) λ P̂_{x′}^α P_{x′}^{−α} + μ = 0.

It follows that P_x = ( (α−1)λ/(μ+1) )^{1/α} P̂_x =: ξ P̂_x and P_{x′} = ( (α−1)λ/μ )^{1/α} P̂_{x′} =: ψ P̂_{x′} for all x′ ≠ x, where ξ and ψ do not depend on x or x′. We can find ξ, ψ ∈ R_{≥0} by solving the joint set of equations

(A36) C = Σ_{x′} P̂_{x′}^α / P_{x′}^{α−1}
(A37)   = P̂_x^α / P_x^{α−1} + Σ_{x′≠x} P̂_{x′}^α / P_{x′}^{α−1}
(A38)   = P̂_x ξ^{1−α} + (1−P̂_x) ψ^{1−α},
(A39) 1 = Σ_{x′} P_{x′}
(A40)   = P̂_x ξ + (1−P̂_x) ψ.

Define ρ = P̂_x. Then, (A40) implies ψ = (1−ρξ)/(1−ρ), and the condition ψ ≥ 0 is equivalent to ξ ≤ ρ⁻¹. Substituting this into (A38) shows that we find ξ by solving F(ρ,ξ) = 0 for ξ ∈ (0, (1−ρ)⁻¹). Since F(ρ,1) = 1 − C < 0 and F is strictly convex in ξ, there exists, at most, one solution in (0,1] and, at most, one in [1, (1−ρ)⁻¹). It follows that (A33) has, at most, two stationary points, which must correspond to the minimal and maximal value of P_x. If the solution in (0,1] exists, it is equal to ξ₋(ρ), and this stationary point of (A33) corresponds to the minimal value of P_x, which is then equal to P̂_x ξ₋(P̂_x). If the solution in (0,1] does not exist, then the minimal value of P_x is not attained on the boundary and is equal to 0, which then is also equal to P̂_x ξ₋(P̂_x). Either way, we find

min_{P∈F} P_x = P̂_x ξ₋(P̂_x). (A41)

The proof for the maximal value of P_x is analogous. □

Lemma A6. 

For X₁ ⊆ X define P̂_{X₁} := Σ_{x∈X₁} P̂_x. Then,

sup_{P∈F} ‖P − P̂‖₁ = 2 max_{X₁⊆X : X₁≠∅} P̂_{X₁} ( ξ₊(P̂_{X₁}) − 1 ). (A42)

Proof. 

For a given P, define X₁ = { x ∈ X : P_x ≥ P̂_x } and X₂ = { x ∈ X : P_x < P̂_x }. To find the maximal value of ‖P − P̂‖₁, we first maximize it for a given partition X₁, X₂ of X, and then we maximize over all partitions. Note that X₁ = ∅ is impossible, and for X₁ = X, we have P = P̂, which is certainly not optimal. Given X₁, X₂, one has

‖P − P̂‖₁ = Σ_{x∈X₁} (P_x − P̂_x) + Σ_{x∈X₂} (P̂_x − P_x). (A43)

As before, the P maximizing this lies either on the boundary of the probability simplex or it satisfies (A32). For the latter case, we have the Lagrangian expression

Σ_{x∈X₁} (P_x − P̂_x) + Σ_{x∈X₂} (P̂_x − P_x) + λ ( Σ_x P̂_x^α / P_x^{α−1} − C ) + μ ( Σ_x P_x − 1 ). (A44)

Taking derivatives, we find, analogously to (A34)–(A35), that there exist ξ, ψ such that P_x = ξ P̂_x for all x ∈ X₁ and P_x = ψ P̂_x for all x ∈ X₂. By definition of X₁ and X₂, we have ξ ≥ 1 and 0 ≤ ψ < 1. Analogously to (A36)–(A40), these have to satisfy

(A45) P̂_{X₁} ξ^{1−α} + (1−P̂_{X₁}) ψ^{1−α} = C,
(A46) P̂_{X₁} ξ + (1−P̂_{X₁}) ψ = 1.

From this point onward, this proof is analogous to that of Lemma A5. Let ρ = P̂_{X₁}. Expressing ψ in terms of ξ and substituting this means that to find ξ we have to solve F(ρ,ξ) = 0 for ξ ∈ [1, (1−ρ)⁻¹), where F is as in the proof of Lemma A5. As before, at most, one such solution exists, and when it does, it corresponds to the maximal value of ‖P − P̂‖₁ (given X₁). If it does not exist, then the maximal value of ‖P − P̂‖₁ is obtained at the boundary, where P_{X₁} = 1. Either way the maximum is obtained when ξ = ξ₊(ρ), which means that

(A47) ‖P − P̂‖₁ = Σ_{x∈X₁} (P_x − P̂_x) + Σ_{x∈X₂} (P̂_x − P_x)
(A48)   = Σ_{x∈X₁} P̂_x ( ξ₊(ρ) − 1 ) + Σ_{x∈X₂} P̂_x ( 1 − (1−ρξ₊(ρ))/(1−ρ) )
(A49)   = ρ ( ξ₊(ρ) − 1 ) + (1−ρ) ( 1 − (1−ρξ₊(ρ))/(1−ρ) )
(A50)   = 2ρ ( ξ₊(ρ) − 1 ).

This is the maximal value of ‖P − P̂‖₁ given X₁; we now find the overall maximum by maximizing over all non-empty X₁. □

Proof of Proposition 1. 

In Lemmas A5 and A6, take U instead of X, P̂_{U|s} instead of P̂, and B_s instead of B. Then, by Theorem 1, the role of F is taken by F_{U|s}. Thus, applying Lemmas A5 and A6 gives us Proposition 1 directly. □

Appendix A.3. Proof of Lemma 1 and Proposition 2

As in the proof of Proposition 1, since by Theorem 1 the projected set F_{U|s} is defined by a Rényi divergence just as F is, it suffices to prove the analogous statements about F rather than F_{U|s}. Concretely, we prove the following:

Lemma A7. 

Suppose α = 2 and define B̃ = e^B − 1; let ξ± be as in Lemma A5. Then,

ξ±(ρ) = ( B̃ + 2ρ ± √( B̃² + 4ρB̃ − 4B̃ρ² ) ) / ( 2ρ(B̃+1) ). (A51)

Furthermore, the following hold:

  • 1. 

    Let x_min = argmin_{x∈X} P̂_x. If B̃ ≥ (1−2P̂_{x_min})², then the maximum in (A42) is attained at X₁ = {x_min}.

  • 2. 

    If B̃ < (1−2P̂_{x_min})², one has sup_{P∈F} ‖P − P̂‖₁ ≤ √B̃.

The formulas here look slightly different from those in Lemma 1 and Proposition 2. We use this form because it makes the proof more convenient: replacing B̃ with e^B − 1 throughout yields exactly the results of Lemma 1 and Proposition 2 for F instead of F_{U|s}.

Proof. 

Consider the function F(ρ,ξ) from (A31) for α = 2 and C = B̃ + 1, i.e.,

F(ρ,ξ) = ρ/ξ + (1−ρ)²/(1−ρξ) − B̃ − 1. (A52)

Then, F(ρ,ξ) = 0 can be rewritten to a quadratic equation in ξ. Its two roots are ξ±(ρ), and with some rewriting they can be expressed as in (31). For points 1 and 2, we note that

2ρ ( ξ₊(ρ) − 1 ) = ( B̃ − 2B̃ρ + √( B̃² + 4B̃ρ − 4B̃ρ² ) ) / ( B̃ + 1 ). (A53)

We can find its extremal values with respect to ρ by taking the derivative and setting it to 0, i.e., by solving

−2B̃/(B̃+1) + ( 2B̃ − 4B̃ρ ) / ( (B̃+1) √( B̃² + 4B̃ρ − 4B̃ρ² ) ) = 0, (A54)

which has a single solution ρ_opt = (1 − √B̃)/2. Since (A53) is concave in ρ, this means that this unique extremal value is a maximum. If B̃ ≥ (1−2P̂_{x_min})², then ρ_opt ≤ P̂_{x_min}, and ρ(ξ₊(ρ)−1) is decreasing in ρ on [P̂_{x_min}, 1]. Since all possible values of P̂_{X₁} lie in this interval, it is optimal to take X₁ such that P̂_{X₁} is minimized, i.e., X₁ = {x_min}; this proves point 1. For point 2 we have (and also for general B̃)

(A55) sup_{P∈F} ‖P − P̂‖₁ = 2 max_{X₁⊆X : X₁≠∅} P̂_{X₁} ( ξ₊(P̂_{X₁}) − 1 )
(A56)   ≤ 2 ρ_opt ( ξ₊(ρ_opt) − 1 )
(A57)   = ( B̃ − 2B̃·(1−√B̃)/2 + √( B̃² + 4B̃·(1−√B̃)/2 − 4B̃·((1−√B̃)/2)² ) ) / ( B̃ + 1 )
(A58)   = √B̃.
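As a numerical sanity check of (A51)–(A52), the following sketch (our own, with arbitrary test values for ρ and B̃) verifies that ξ±(ρ) are indeed roots of F(ρ, ·):

```python
import numpy as np

def F(rho, xi, B_tilde):
    # F(rho, xi) from (A52) with alpha = 2 and C = B_tilde + 1
    return rho / xi + (1 - rho) ** 2 / (1 - rho * xi) - B_tilde - 1

def xi_pm(rho, B_tilde):
    # closed form (A51) for the two roots of F(rho, .) = 0
    disc = np.sqrt(B_tilde**2 + 4 * rho * B_tilde - 4 * B_tilde * rho**2)
    denom = 2 * rho * (B_tilde + 1)
    return (B_tilde + 2 * rho - disc) / denom, (B_tilde + 2 * rho + disc) / denom

rho, B_tilde = 0.3, 0.05
xm, xp = xi_pm(rho, B_tilde)
print(F(rho, xm, B_tilde), F(rho, xp, B_tilde))   # both approximately 0
```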

Appendix A.4. Proof of Theorem 2

Let D = ∏_{s∈S} D_s ⊆ R^X. Thus, an element t ∈ D is of the form t = (t_{s,u})_{(s,u)∈X}, and for any s, we have (t_{s,u})_{u∈U} ∈ D_s. For s₁, s₂ ∈ S, let B^{s₁,s₂} ∈ R^{X×X} be the matrix given by

B^{s₁,s₂}_{(s,u);(s′,u′)} = { 1, if u = u′ and s = s′ = s₁;  −e^ε, if u = u′ and s = s′ = s₂;  0, otherwise. } (A59)

Then, we can rewrite (41) as

∀ y, s₁, s₂:  max_{t∈D} ( (B^{s₁,s₂})^⊤ Q_y )^⊤ t ≤ 0. (A60)

Recall that for each s, we have D_s = { R ∈ P_U : R_u ≥ L_{u|s} }. Since D = ∏_s D_s, we can write

(A61) D = { t ∈ R^X : ∀s,u: t_{s,u} ≥ L_{u|s},  ∀s: Σ_u t_{s,u} = 1 }
(A62)   = { t ∈ R^X : Φt + φ ≥ 0,  Ψt + ψ = 0 },

where Φ ∈ R^{X×X}, φ ∈ R^X, Ψ ∈ R^{S×X} and ψ ∈ R^S are given, for s, s′ ∈ S and u ∈ U, by

(A63) Φ = id_X,
(A64) φ_{s,u} = −L_{u|s},
(A65) Ψ_{s;(s′,u)} = { 1, if s = s′;  0, otherwise },
(A66) ψ_s = −1.

Combining this with (A60), we find that Q satisfies (ε,F)-RLDP whenever

∀ y, s₁, s₂:  max_{t∈R^X : Φt+φ≥0, Ψt+ψ=0} ( (B^{s₁,s₂})^⊤ Q_y )^⊤ t ≤ 0. (A67)

Now fix y, s₁, s₂, and consider the linear programming problem that forms the LHS of (A67). From the duality of linear programming, we know

max_{t∈R^X : Φt+φ≥0, Ψt+ψ=0} ( (B^{s₁,s₂})^⊤ Q_y )^⊤ t = min_{z∈R^X, w∈R^S : Φ^⊤z+Ψ^⊤w=−(B^{s₁,s₂})^⊤Q_y, z≥0}  φ^⊤ z + ψ^⊤ w. (A68)

We focus on the linear programming problem of the RHS. The terms of this problem are given by

(A69) Φ^⊤ z = z,
(A70) (Ψ^⊤ w)_{s,u} = w_s,
(A71) ( (B^{s₁,s₂})^⊤ Q_y )_{s,u} = { Q_{y|s₁,u}, if s = s₁;  −e^ε Q_{y|s₂,u}, if s = s₂;  0, otherwise },
(A72) φ^⊤ z = −Σ_{s,u} L_{u|s} z_{s,u},
(A73) ψ^⊤ w = −Σ_s w_s.

The equation Φ^⊤z + Ψ^⊤w = −(B^{s₁,s₂})^⊤Q_y can now be rewritten as

z_{s,u} = { −Q_{y|s₁,u} − w_{s₁}, if s = s₁;  e^ε Q_{y|s₂,u} − w_{s₂}, if s = s₂;  −w_s, otherwise. } (A74)

Thus, the restriction z ≥ 0 translates to

w_{s₁} ≤ −max_{u∈U} Q_{y|s₁,u},   w_{s₂} ≤ e^ε min_{u∈U} Q_{y|s₂,u},   ∀s ≠ s₁,s₂: w_s ≤ 0.

Furthermore, the objective function φ^⊤z + ψ^⊤w becomes

−Σ_s ( 1 − Σ_u L_{u|s} ) w_s + Σ_u Q_{y|s₁,u} L_{u|s₁} − e^ε Σ_u Q_{y|s₂,u} L_{u|s₂}. (A75)

Combining this with (A67) and (A68), we see that a sufficient condition for Q to be (ε,F)-RLDP is that there exists a w ∈ R^S such that

(A76) −Σ_s ( 1 − Σ_u L_{u|s} ) w_s + Σ_u Q_{y|s₁,u} L_{u|s₁} − e^ε Σ_u Q_{y|s₂,u} L_{u|s₂} ≤ 0,
(A77) w_{s₁} ≤ −max_{u∈U} Q_{y|s₁,u},
(A78) w_{s₂} ≤ e^ε min_{u∈U} Q_{y|s₂,u},
(A79) ∀s ≠ s₁,s₂: w_s ≤ 0.

Since Σ_u L_{u|s} ≤ 1 for all s, it follows that the left-hand side of (A76) is minimal if each w_s attains its maximal value, subject to the constraints (A77)–(A79). Substituting this, we find that the minimum of the left-hand side is equal to

(A80) ( 1 − Σ_u L_{u|s₁} ) max_{u₁} Q_{y|u₁,s₁} − e^ε ( 1 − Σ_u L_{u|s₂} ) min_{u₂} Q_{y|u₂,s₂} + Σ_u Q_{y|s₁,u} L_{u|s₁} − e^ε Σ_u Q_{y|s₂,u} L_{u|s₂}
(A81)   = max_{u₁,u₂∈U} [ ( 1 − Σ_u L_{u|s₁} ) Q_{y|u₁,s₁} − e^ε ( 1 − Σ_u L_{u|s₂} ) Q_{y|u₂,s₂} + Σ_u Q_{y|s₁,u} L_{u|s₁} − e^ε Σ_u Q_{y|s₂,u} L_{u|s₂} ]
(A82)   = max_{u₁,u₂∈U} [ Q_{y|u₁,s₁} − e^ε Q_{y|u₂,s₂} + Σ_u L_{u|s₁} ( Q_{y|s₁,u} − Q_{y|s₁,u₁} ) − e^ε Σ_u L_{u|s₂} ( Q_{y|s₂,u} − Q_{y|s₂,u₂} ) ].

This has to be nonpositive for all choices of u₁, u₂, s₁, s₂, y; but this is true precisely if Q_y ∈ Γ_{L,ε} for all y.

Appendix A.5. Proof of Theorem 3

This is essentially analogous to the proof of Theorem 4 in [5]; the main difference is that the equivalent of Γ̂ there is a hypercube, for which a vertex enumeration step is not needed. Let Q be a mechanism such that Q_y ∈ Γ for all y; then there exist α_y ∈ R_{≥0}, γ_y ∈ Γ̂ such that Q_y = α_y γ_y. One has

I_{P̂}(X;Y) = Σ_y μ(Q_y) = Σ_y α_y μ(γ_y). (A83)

Since Γ̂ is the convex hull of V, we can write γ_y = Σ_v λ_{y,v} v for suitable constants λ_{y,v}. Define θ ∈ R_{≥0}^V by θ_v = Σ_y λ_{y,v} α_y. Then,

Σ_v θ_v v = Σ_y Q_y = 1_X. (A84)

As such, the matrix Q′ ∈ R^{V×X} defined by Q′_v = θ_v v defines a privacy mechanism Q′. One has

(A85) I_{P̂}(X; Q′(X)) = Σ_v μ(Q′_v)
(A86)   = Σ_v θ_v μ(v)
(A87)   = Σ_y α_y Σ_v λ_{y,v} μ(v)
(A88)   ≥ Σ_y α_y μ( Σ_v λ_{y,v} v )
(A89)   = I_{P̂}(X; Q(X)),

where we use the fact that μ is convex. This shows that the Q_y of the optimal mechanism satisfying Theorem 2 are all of the form θ_v · v; hence, (46) yields the optimal mechanism. To see that |Y| ≤ a, observe that the polyhedron described in (46) is defined by a equality constraints, and |V| inequality constraints of the form θ_v ≥ 0. Hence, any vertex of this polyhedron has at most a nonzero coefficients. Since the optimal mechanism corresponds to such a vertex, and its output space Y corresponds to its nonzero coefficients, we conclude that |Y| ≤ a. □
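The optimization over the coefficients θ_v described above is a standard linear program (maximize Σ_v θ_v μ(v) subject to Σ_v θ_v v = 1_X and θ ≥ 0). The following sketch is our own illustration, not the paper's implementation; it assumes the spanning/vertex vectors are available as columns of an array V, and uses μ as in (A113) with respect to the estimate P̂:

```python
import numpy as np
from scipy.optimize import linprog

def mu(C, P):
    """The sublinear utility functional of a column C, cf. (A113)."""
    N = float(P @ C)
    mask = (C > 0) & (P > 0)
    return float(np.sum(P[mask] * C[mask] * np.log(C[mask] / N)))

def optimal_mechanism(V, P):
    """Maximize sum_v theta_v * mu(v) subject to sum_v theta_v * v = 1_X, theta >= 0.

    V is an (|X|, |V|) array whose columns are the spanning vectors; the columns of the
    returned mechanism are theta_v * v for the nonzero theta_v."""
    m = np.array([mu(V[:, j], P) for j in range(V.shape[1])])
    res = linprog(c=-m, A_eq=V, b_eq=np.ones(V.shape[0]), bounds=(0, None), method="highs")
    theta = res.x
    cols = [theta[j] * V[:, j] for j in range(V.shape[1]) if theta[j] > 1e-12]
    return np.column_stack(cols)   # rows indexed by x, columns by the outputs y
```

Since the LP has a equality constraints and only nonnegativity constraints on θ, an optimal basic solution has at most a nonzero entries, which is exactly the bound |Y| ≤ a used in the proof.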

Appendix A.6. Proof of Theorem 4

We follow the proof of Theorem 14 in [5]; however, we first need the following auxiliary lemma.

Lemma A8. 

Let ε > 0, and let C ⊆ R_{≥0}^X be the positive cone defined by

C = { C ∈ R_{≥0}^X : C_{s,u} ≤ e^ε C_{s′,u′} for all s ≠ s′ ∈ S and u, u′ ∈ U }. (A90)

Define the sets V₁, V₂, V ⊆ R_{≥0}^X by

(A91) V₁ = { v ∈ R_{≥0}^X : ∃s s.t. ∀u: v_{s,u} ∈ {e^{−ε}, e^{ε}};  ∀s′ ≠ s, u: v_{s′,u} = 1 },
(A92) V₂ = { v ∈ R_{≥0}^X : ∀x: v_x ∈ {1, e^{ε}},  |{ s : ∃u s.t. v_{s,u} = e^{ε} }| ≥ 2 },
(A93) V = V₁ ∪ V₂.

Then V spans C as a positive cone, i.e.,

C = { Σ_{v∈V} θ_v v : θ ∈ R_{≥0}^V }. (A94)

Proof. 

For every s ∈ S and u, u′ ∈ U, we have

C_{s,u} ≤ e^{ε} C_{s′,u} ≤ e^{2ε} C_{s,u′}, (A95)

where s′ ∈ S∖{s} is arbitrary. Thus, in every C ∈ C two coefficients can differ by at most a factor e^{ε} if they have different s, and at most a factor e^{2ε} if they have the same s. On the extremal rays of C, the inequalities become equalities. By rescaling by a positive scalar, if necessary, we see that C is spanned by vectors of which each coefficient is in the set {e^{−ε}, 1, e^{ε}}. In other words, if V′ = {e^{−ε}, 1, e^{ε}}^X ∩ C, then

C = Span(V′), (A96)

where Span refers to the span as in (A94). To determine V′ we consider two situations: either v contains both e^{−ε} and e^{ε} as coefficients, or not.

Suppose v contains e^{−ε} and e^{ε}, say v_{s,u} = e^{ε} and v_{s′,u′} = e^{−ε}. By (A90), we must have s = s′, and by (A95), this means that v_{s″,u} = 1 for s″ ≠ s and any u. Thus, we define, for any s ∈ S, the set

V′_s = { v ∈ {e^{−ε}, 1, e^{ε}}^X : ∀s′ ≠ s, u: v_{s′,u} = 1 }. (A97)

It is straightforward to show that V′_s ⊆ C, and by the discussion above any v ∈ V′ containing both e^{−ε} and e^{ε} is in ∪_s V′_s.

Suppose v does not contain both e^{−ε} and e^{ε}; then v ∈ V′₂ ∪ V′₃, where

(A98) V′₂ = {1, e^{ε}}^X,
(A99) V′₃ = {e^{−ε}, 1}^X.

Furthermore, it is easy to see that V′₂ ∪ V′₃ ⊆ V′. Thus, we conclude that

V′ = ∪_{s∈S} V′_s ∪ V′₂ ∪ V′₃. (A100)

To get from V′ to V, we throw out some vectors that are not needed to span C. We start with V′_s. Given s, define the set

V_s = { v ∈ R_{≥0}^X : ∀u: v_{s,u} ∈ {e^{−ε}, e^{ε}};  ∀s′ ≠ s, u: v_{s′,u} = 1 }. (A101)

It is clear that V_s ⊆ V′_s; we claim that

Span(V′_s) = Span(V_s). (A102)

To see this, let v ∈ V′_s ∖ V_s, and define v⁻, v⁺ ∈ R_{≥0}^X by

v⁺_{s′,u} = { e^{−ε}, if s′ = s and v_{s′,u} = e^{−ε};  1, if s′ ≠ s;  e^{ε}, if s′ = s and v_{s′,u} ∈ {1, e^{ε}} }, (A103)
v⁻_{s′,u} = { e^{−ε}, if s′ = s and v_{s′,u} ∈ {e^{−ε}, 1};  1, if s′ ≠ s;  e^{ε}, if s′ = s and v_{s′,u} = e^{ε} }. (A104)

In other words, v^± takes all s-coefficients of v that are equal to 1 and changes them to e^{±ε}. Then, v⁺, v⁻ ∈ V_s and

v = (1/(e^{ε}+1)) v⁺ + (e^{ε}/(e^{ε}+1)) v⁻. (A105)

Thus, v ∈ Span(V_s), proving (A102). We now consider V′₂ and V′₃. First note that V′₃ = e^{−ε} V′₂, so

Span(V′₂) = Span(V′₃). (A106)

We furthermore claim that

Span( V′₂ ∪ ∪_{s∈S} V_s ) = Span( V₂ ∪ ∪_{s∈S} V_s ), (A107)

where V₂ is as in (A92). Note that clearly V₂ ⊆ V′₂. To see (A107), let v ∈ V′₂ ∖ V₂; this means that there is at most a single (s,u) such that v_{s,u} = e^{ε}. If no such (s,u) exists, then v = 1_X, the constant vector with all ones. This implies that e^{ε} v ∈ V₂, showing that v ∈ Span(V₂). Now suppose that there is exactly one (s,u) such that v_{s,u} = e^{ε}. Then,

v_{s′,u′} = { e^{ε}, if s′ = s and u′ = u;  1, otherwise. } (A108)

But then we can construct v⁺ as in (A103) and v⁻ as in (A104), and again we find

v = (1/(e^{ε}+1)) v⁺ + (e^{ε}/(e^{ε}+1)) v⁻ ∈ Span(V_s). (A109)

This proves (A107). Combining (A102), (A106) and (A107), we obtain

(A110) C = Span( V′₂ ∪ V′₃ ∪ ∪_{s∈S} V′_s )
(A111)   = Span( V′₂ ∪ ∪_{s∈S} V_s )
(A112)   = Span( V₂ ∪ V₁ ).

Proof of Theorem 4. 

We follow the proof of Theorem 14 in [5]. For C ∈ R_{≥0}^X, define

μ(C) = Σ_x P_x C_x log( C_x / Σ_{x′} P_{x′} C_{x′} ). (A113)

For y ∈ Y, let Q_y = (Q_{y|x})_x ∈ R^X; then the utility of a mechanism Q: X → Y is given by I_P(X;Y) = Σ_y μ(Q_y). Furthermore, μ is a sublinear function in the sense of Definition 1 of [5].

We fix an ε > 0. Furthermore, let C ⊆ R_{≥0}^X be as in Lemma A8. Then, a mechanism Q satisfies (ε, P_X)-RLDP if and only if each Q_y is an element of C. Let V be the spanning set of C from Lemma A8, and let D be the polytope spanned by V. If Q satisfies ε-SLDP, then every column Q_y is of the form θ_y · d_y, where d_y ∈ D and θ_y ∈ R_{≥0} are such that Σ_y θ_y d_y = 1_X. Analogously to the proof of Theorems 2 and 4 in Section 7 of [5] (or, for that matter, our proof of Theorem 3), one proves that the optimal Q is found by taking b = a, and taking d_y ∈ V for all y. Since

I(X;Y) = Σ_y μ(Q_y) = Σ_y θ_y μ(d_y), (A114)

we can find the optimal Q by solving the following optimization problem, where m ∈ R^V is the vector (μ(v))_{v∈V}, and where A ∈ R^{X×V} is the matrix whose v-th column is v:

maximize_{θ∈R^V}  m · θ   such that  A · θ = 1_X,  θ ≥ 0.

From here, we follow Section 9.5 of [5]. The dual to the above problem is

minimize_{α∈R^X}  (1_X) · α   such that  A^⊤ · α ≥ m,  α ≥ 0.

By duality, we have max_θ m·θ = min_α (1_X)·α. We describe α* ≥ 0 and θ* ≥ 0, depending on ε, such that for sufficiently large ε one has A^⊤·α* ≥ m, such that m·θ* = (1_X)·α* and Aθ* = 1_X, and such that θ* corresponds to SRR, i.e., for each y ∈ Y = X there is a v̂_y ∈ V such that SRR^ε_y = θ*_{v̂_y} v̂_y. Together, this proves that SRR is optimal for ε ≫ 0.

More concretely, for y = (s,u) ∈ X, define v̂_y by

(v̂_y)_{s′,u′} = { e^{ε}, if (s′,u′) = (s,u);  e^{−ε}, if s′ = s and u′ ≠ u;  1, if s′ ≠ s. } (A115)

Note that v̂_y ∈ V. Furthermore, let θ* ∈ R^V be given by

θ*_v = { 1/( e^{ε} + e^{−ε}(a₂−1) + a − a₂ ), if there is a y ∈ X such that v = v̂_y;  0, otherwise. } (A116)

Then, SRR satisfies SRR^ε_y = θ*_{v̂_y} v̂_y for all y ∈ X, and for each x ∈ X one has

(A117) (Aθ*)_x = Σ_v A_{x,v} θ*_v
(A118)   = Σ_v v_x θ*_v
(A119)   = Σ_y (v̂_y)_x / ( e^{ε} + e^{−ε}(a₂−1) + a − a₂ )
(A120)   = 1,

which shows that Aθ* = 1_X. Furthermore, define α* ∈ R^X by

α*_{s,u} = c₁ μ(v̂_{s,u}) + c₂ Σ_{u′≠u} μ(v̂_{s,u′}) + c₃ Σ_{s′≠s, u′} μ(v̂_{s′,u′}), (A121)

where

(A122) c₁ = ( (a₂−2)(a₂−1) + (a−a₂+1)(a₂−2) e^{ε} + (a−2a₂+1) e^{2ε} + e^{3ε} ) / ( (e^{ε}−1)(e^{ε}+1)(e^{ε}−a₂+1)( e^{ε} + (a₂−1)e^{−ε} + a − a₂ ) ),
(A123) c₂ = ( a₂ − 1 + (a−a₂+1) e^{ε} ) / ( (e^{ε}−1)(e^{ε}+1)(e^{ε}−a₂+1)( e^{ε} + (a₂−1)e^{−ε} + a − a₂ ) ),
(A124) c₃ = e^{2ε} / ( (e^{ε}−1)(e^{ε}−a₂+1)( e^{ε} + (a₂−1)e^{−ε} + a − a₂ ) ).

A cumbersome but straightforward calculation shows that for all x, we have

(A125) m·θ* = (1_X)·α* = ( 1/( e^{ε} + e^{−ε}(a₂−1) + a − a₂ ) ) Σ_x μ(v̂_x),
(A126) v̂_x · α* = m_{v̂_x} = μ(v̂_x).

Furthermore, c₁, c₂, c₃ ≥ 0, so α* ≥ 0. It remains to be shown that α* satisfies the dual problem for ε ≫ 0, i.e., A^⊤α* ≥ m for sufficiently large ε. To this end, for v ∈ V, set

(A127) F_v = { x ∈ X : v_x = e^{ε} },
(A128) G_v = { x ∈ X : v_x = 1 },
(A129) H_v = { x ∈ X : v_x = e^{−ε} }.

From the description of V in Lemma A8, we find that |F_v| ≥ 1 for all v ∈ V, and |F_v| = 1 if and only if there exist s, u such that v = v̂_{s,u}. Now, write P_{F_v} = Σ_{x∈F_v} P_x and likewise for G_v, H_v. For large ε, we have

(A130) m_v = μ(v) = e^{ε} Σ_{x∈F_v} P_x log( 1 / ( P_{F_v} + e^{−ε} P_{G_v} + e^{−2ε} P_{H_v} ) )
(A131)   + Σ_{x∈G_v} P_x log( 1 / ( e^{ε} P_{F_v} + P_{G_v} + e^{−ε} P_{H_v} ) )
(A132)   + e^{−ε} Σ_{x∈H_v} P_x log( 1 / ( e^{2ε} P_{F_v} + e^{ε} P_{G_v} + P_{H_v} ) )
(A133)   = −P_{F_v} log(P_{F_v}) e^{ε} + O(ε),

and furthermore

(A134) c₁ = e^{−ε} + O(e^{−2ε}),
(A135) c₂, c₃ = O(e^{−2ε}).

From this, it follows that

(A136) α*_x = c₁ μ(v̂_x) + (c₂ + c₃) O(e^{ε})
(A137)   = −P_x log P_x + O(ε e^{−ε}).

Hence,

v^⊤ α* = −Σ_{x∈F_v} P_x log(P_x) e^{ε} + O(ε). (A138)

For |F_v| ≥ 2, one has P_{F_v} log P_{F_v} > Σ_{x∈F_v} P_x log P_x. This means that if v is not of the form v̂_x, one has v^⊤α* ≥ m_v for sufficiently large ε. Together with (A126), this shows that A^⊤α* ≥ m for sufficiently large ε; this concludes the proof. □

Author Contributions

Conceptualization, M.L.-Z. and J.G.; Formal analysis, M.L.-Z. and J.G.; Investigation, M.L.-Z. and J.G.; Methodology, M.L.-Z. and J.G.; Software, M.L.-Z.; Writing, M.L.-Z. and J.G. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding Statement

This research was partially funded by the Netherlands Organisation for Scientific Research (NWO) grant 628.001.026, ERC Consolidator grant 864075 CAESAR and the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 101008233.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Kasiviswanathan S.P., Lee H.K., Nissim K., Raskhodnikova S., Smith A. What can we learn privately? SIAM J. Comput. 2011;40:793–826. doi: 10.1137/090756090. [DOI] [Google Scholar]
  • 2.Duchi J.C., Jordan M.I., Wainwright M.J. Local privacy and statistical minimax rates; Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS); Berkeley, CA, USA. 26–29 October 2013; pp. 429–438. [Google Scholar]
  • 3.Lopuhaä-Zwakenberg M., Tong H., Škorić B. Data Sanitisation for the Privacy Funnel with Differential Privacy Guarantees. Int. J. Adv. Secur. 2020;13:162–174. [Google Scholar]
  • 4.Rebollo-Monedero D., Forne J., Domingo-Ferrer J. From t-closeness-like privacy to postrandomization via information theory. IEEE Trans. Knowl. Data Eng. 2010;22:1623–1636. doi: 10.1109/TKDE.2009.190. [DOI] [Google Scholar]
  • 5.Kairouz P., Oh S., Viswanath P. Extremal mechanisms for local differential privacy. J. Mach. Learn. Res. 2016;17:492–542. [Google Scholar]
  • 6.Makhdoumi A., Salamatian S., Fawaz N., Médard M. From the information bottleneck to the privacy funnel; Proceedings of the 2014 IEEE Information Theory Workshop (ITW 2014); Hobart, TAS, Australia. 2–5 November 2014; pp. 501–505. [Google Scholar]
  • 7.Salamatian S., Zhang A., du Pin Calmon F., Bhamidipati S., Fawaz N., Kveton B., Oliveira P., Taft N. Managing your private and public data: Bringing down inference attacks against your privacy. IEEE J. Sel. Top. Signal Process. 2015;9:1240–1255. doi: 10.1109/JSTSP.2015.2442227. [DOI] [Google Scholar]
  • 8.Asoodeh S., Diaz M., Alajaji F., Linder T. Information extraction under privacy constraints. Information. 2016;7:15. doi: 10.3390/info7010015. [DOI] [Google Scholar]
  • 9.Kung S. A compressive privacy approach to generalized information bottleneck and privacy funnel problems. J. Frankl. Inst. 2018;355:1846–1872. doi: 10.1016/j.jfranklin.2017.07.002. [DOI] [Google Scholar]
  • 10.Ding N., Sadeghi P. A submodularity-based clustering algorithm for the information bottleneck and privacy funnel; Proceedings of the 2019 IEEE Information Theory Workshop (ITW); Visby, Sweden. 25–28 August 2019; pp. 1–5. [Google Scholar]
  • 11.Salamatian S., Calmon F.P., Fawaz N., Makhdoumi A., Médard M. Privacy-Utility Tradeoff and Privacy Funnel. 2020. [(accessed on 10 January 2024)]. Available online: https://api.semanticscholar.org/CorpusID:210927663.
  • 12.Acharya J., Bonawitz K., Kairouz P., Ramage D., Sun Z. Context aware local differential privacy; Proceedings of the International Conference on Machine Learning, PMLR; Virtual. 13–18 July 2020; pp. 52–62. [Google Scholar]
  • 13.Goseling J., Lopuhaä-Zwakenberg M. Robust optimization for local differential privacy; Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT); Espoo, Finland. 26 June–1 July 2022; pp. 1629–1634. [Google Scholar]
  • 14.Lopuhaä-Zwakenberg M., Goseling J. Robust Local Differential Privacy; Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT); Melbourne, VIC, Australia. 12–20 July 2021; pp. 557–562. [Google Scholar]
  • 15.Kifer D., Machanavajjhala A. Pufferfish: A framework for mathematical privacy definitions. ACM Trans. Database Syst. 2014;39:1–36. doi: 10.1145/2514689. [DOI] [Google Scholar]
  • 16.Ben-Tal A., El Ghaoui L., Nemirovski A. Robust Optimization. Volume 28 Princeton University Press; Princeton, NJ, USA: 2009. [Google Scholar]
  • 17.Ben-Tal A., Den Hertog D., Vial J.P. Deriving robust counterparts of nonlinear uncertain inequalities. Math. Program. 2015;149:265–299. doi: 10.1007/s10107-014-0750-8. [DOI] [Google Scholar]
  • 18.Bertsimas D., Gupta V., Kallus N. Data-driven robust optimization. Math. Program. 2018;167:235–292. doi: 10.1007/s10107-017-1125-8. [DOI] [Google Scholar]
  • 19.Warner S.L. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965;60:63–69. doi: 10.1080/01621459.1965.10480775. [DOI] [PubMed] [Google Scholar]
  • 20.Song S., Wang Y., Chaudhuri K. Pufferfish privacy mechanisms for correlated data; Proceedings of the 2017 ACM International Conference on Management of Data; Chicago, IL, USA. 14–19 May 2017; pp. 1291–1306. [Google Scholar]
  • 21.Nuradha T., Goldfeld Z. Pufferfish Privacy: An Information-Theoretic Study. IEEE Trans. Inf. Theory. 2023;69:7336–7356. doi: 10.1109/TIT.2023.3296288. [DOI] [Google Scholar]
  • 22.Yang B., Sato I., Nakagawa H. Bayesian differential privacy on correlated data; Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data; Melbourne, VIC, Australia. 31 May–4 June 2015; pp. 747–762. [Google Scholar]
  • 23.He X., Machanavajjhala A., Ding B. Blowfish privacy: Tuning privacy-utility trade-offs using policies; Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data; Snowbird, UT, USA. 22–27 June 2014; pp. 1447–1458. [Google Scholar]
  • 24.Dwork C., Roth A. The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 2014;9:211–407. doi: 10.1561/0400000042. [DOI] [Google Scholar]
  • 25.Wang T., Blocki J., Li N., Jha S. Locally differentially private protocols for frequency estimation; Proceedings of the 26th {USENIX} Security Symposium ({USENIX} Security 17); Vancouver, BC, Canada. 16–18 August 2017; pp. 729–745. [Google Scholar]
  • 26.Tishby N., Pereira F.C., Bialek W. The information bottleneck method. arXiv. 2000. physics/0004057. [Google Scholar]
  • 27.Wagner I., Eckhoff D. Technical privacy metrics: A systematic survey. ACM Comput. Surv. 2018;51:1–38. doi: 10.1145/3168389. [DOI] [Google Scholar]
  • 28.Rassouli B., Gunduz D. On perfect privacy; Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT); Vail, CO, USA. 17–22 June 2018; pp. 2551–2555. [Google Scholar]
  • 29.Asoodeh S., Diaz M., Alajaji F., Linder T. Estimation efficiency under privacy constraints. IEEE Trans. Inf. Theory. 2018;65:1512–1534. doi: 10.1109/TIT.2018.2865558. [DOI] [Google Scholar]
  • 30.Wang H., Calmon F.P. An estimation-theoretic view of privacy; Proceedings of the 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton); Monticello, IL, USA. 3–6 October 2017; pp. 886–893. [Google Scholar]
  • 31.Mironov I. Rényi differential privacy; Proceedings of the 2017 IEEE 30th Computer Security Foundations Symposium (CSF); Santa Barbara, CA, USA. 21–25 August 2017; pp. 263–275. [Google Scholar]
  • 32.Issa I., Wagner A.B., Kamath S. An operational approach to information leakage. IEEE Trans. Inf. Theory. 2019;66:1625–1657. doi: 10.1109/TIT.2019.2962804. [DOI] [Google Scholar]
  • 33.Liao J., Kosut O., Sankar L., du Pin Calmon F. Tunable Measures for Information Leakage and Applications to Privacy-Utility Tradeoffs. IEEE Trans. Inf. Theory. 2019;65:8043–8066. doi: 10.1109/TIT.2019.2935768. [DOI] [Google Scholar]
  • 34.Saeidian S., Cervia G., Oechtering T.J., Skoglund M. Pointwise maximal leakage. IEEE Trans. Inf. Theory. 2023;69:8054–8080. doi: 10.1109/TIT.2023.3304378. [DOI] [Google Scholar]
  • 35.Diaz M., Wang H., Calmon F.P., Sankar L. On the robustness of information-theoretic privacy measures and mechanisms. IEEE Trans. Inf. Theory. 2019;66:1949–1978. doi: 10.1109/TIT.2019.2939472. [DOI] [Google Scholar]
  • 36.Makhdoumi A., Fawaz N. Privacy-utility tradeoff under statistical uncertainty; Proceedings of the 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton); Monticello, IL, USA. 2–4 October 2013; pp. 1627–1634. [Google Scholar]
  • 37.Kalantari K., Sankar L., Sarwate A.D. Robust privacy-utility tradeoffs under differential privacy and hamming distortion. IEEE Trans. Inf. Forensics Secur. 2018;13:2816–2830. doi: 10.1109/TIFS.2018.2831619. [DOI] [Google Scholar]
  • 38.Asoodeh S., Alajaji F., Linder T. On maximal correlation, mutual information and data privacy; Proceedings of the 2015 IEEE 14th Canadian Workshop on Information Theory (CWIT); St. John’s, NL, Canada. 6–9 July 2015; pp. 27–31. [Google Scholar]
  • 39.Pardo L. Statistical Inference Based on Divergence Measures. CRC Press; Boca Raton, FL, USA: 2018. [Google Scholar]
  • 40.Ben-Tal A., Den Hertog D., De Waegenaere A., Melenberg B., Rennen G. Robust solutions of optimization problems affected by uncertain probabilities. Manag. Sci. 2013;59:341–357. doi: 10.1287/mnsc.1120.1641. [DOI] [Google Scholar]
  • 41.Duchi J.C., Glynn P.W., Namkoong H. Statistics of robust optimization: A generalized empirical likelihood approach. Math. Oper. Res. 2021;46:946–969. doi: 10.1287/moor.2020.1085. [DOI] [Google Scholar]
  • 42.Wang Z., Glynn P.W., Ye Y. Likelihood robust optimization for data-driven problems. Comput. Manag. Sci. 2016;13:241–261. doi: 10.1007/s10287-015-0240-3. [DOI] [Google Scholar]
  • 43.Mohajerin Esfahani P., Kuhn D. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 2018;171:115–166. doi: 10.1007/s10107-017-1172-1. [DOI] [Google Scholar]
  • 44.Selvi A., Liu H., Wiesemann W. Differential Privacy via Distributionally Robust Optimization. arXiv. 2023. 2304.12681. [Google Scholar]
  • 45.Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. Generative adversarial networks. Commun. ACM. 2020;63:139–144. doi: 10.1145/3422622. [DOI] [Google Scholar]
  • 46.Huang C., Kairouz P., Chen X., Sankar L., Rajagopal R. Context-aware generative adversarial privacy. Entropy. 2017;19:656. doi: 10.3390/e19120656. [DOI] [Google Scholar]
  • 47.Tripathy A., Wang Y., Ishwar P. Privacy-preserving adversarial networks; Proceedings of the 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton); Monticello, IL, USA. 24–27 September 2019; pp. 495–505. [Google Scholar]
  • 48.Mirjalili V., Raschka S., Namboodiri A., Ross A. Semi-adversarial networks: Convolutional autoencoders for imparting privacy to face images; Proceedings of the 2018 International Conference on Biometrics (ICB); Gold Coast, QLD, Australia. 20–23 February 2018; pp. 82–89. [Google Scholar]
  • 49.Bortolato B., Ivanovska M., Rot P., Križaj J., Terhörst P., Damer N., Peer P., Štruc V. Learning privacy-enhancing face representations through feature disentanglement; Proceedings of the 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) (FG); Buenos Aires, Argentina. 16–20 November 2020; pp. 45–52. [Google Scholar]
  • 50.Stoker J.I., Garretsen H., Spreeuwers L.J. The facial appearance of CEOs: Faces signal selection but not performance. PLoS ONE. 2016;11:e0159950. doi: 10.1371/journal.pone.0159950. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Willenborg L., De Waal T. Elements of Statistical Disclosure Control. Volume 155 Springer Science & Business Media; Berlin, Germany: 2012. [Google Scholar]
  • 52.Hundepool A., Domingo-Ferrer J., Franconi L., Giessing S., Nordholt E.S., Spicer K., De Wolf P.P. Statistical Disclosure Control. John Wiley & Sons; Hoboken, NJ, USA: 2012. [Google Scholar]
  • 53.Liese F., Vajda I. Advances in Inequalities from Probability Theory and Statistics. Nova Publishers; New York, NY, USA: 2008. f-divergences: Sufficiency, deficiency and testing of hypotheses; p. 113. [Google Scholar]
  • 54.van Erven T., Harremoës P. Rényi divergence and majorization; Proceedings of the 2010 IEEE International Symposium on Information Theory; Austin, TX, USA. 13–18 June 2010; pp. 1335–1339. [Google Scholar]
  • 55.Csiszár I. Information-type measures of difference of probability distributions and indirect observation. Stud. Sci. Math. Hung. 1967;2:229–318. [Google Scholar]
  • 56.Kullback S. A lower bound for discrimination information in terms of variation (corresp.) IEEE Trans. Inf. Theory. 1967;13:126–127. doi: 10.1109/TIT.1967.1053968. [DOI] [Google Scholar]
  • 57.Gilardoni G.L. On Pinsker’s and Vajda’s type inequalities for Csiszár’s f-divergences. IEEE Trans. Inf. Theory. 2010;56:5377–5386. doi: 10.1109/TIT.2010.2068710. [DOI] [Google Scholar]
  • 58.Toth C.D., O’Rourke J., Goodman J.E. Handbook of Discrete and Computational Geometry. CRC Press; Boca Raton, FL, USA: 2017. [Google Scholar]
  • 59.Dwork C., McSherry F., Nissim K., Smith A. Calibrating noise to sensitivity in private data analysis; Proceedings of the Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006; New York, NY, USA. 4–7 March 2006; pp. 265–284. [Google Scholar]
  • 60.Dua D., Graff C. UCI Machine Learning Repository. 2017. [(accessed on 10 January 2024)]. Available online: https://archive.ics.uci.edu/
