Convergence Rates for Differentially Private Statistical Estimation

Kamalika Chaudhuri; Daniel Hsu

Author manuscript; available in PMC: 2014 Oct 7.

Published in final edited form as: Proc Int Conf Mach Learn. 2012 Jul;2012:1327–1334.

PMCID: PMC4188376 NIHMSID: NIHMS491355 PMID: 25302341

Abstract

Differential privacy is a cryptographically-motivated definition of privacy which has gained significant attention over the past few years. Differentially private solutions enforce privacy by adding random noise to a function computed over the data, and the challenge in designing such algorithms is to control the added noise in order to optimize the privacy-accuracy-sample size tradeoff.

This work studies differentially-private statistical estimation, and shows upper and lower bounds on the convergence rates of differentially private approximations to statistical estimators. Our results reveal a formal connection between differential privacy and the notion of Gross Error Sensitivity (GES) in robust statistics, by showing that the convergence rate of any differentially private approximation to an estimator that is accurate over a large class of distributions has to grow with the GES of the estimator. We then provide an upper bound on the convergence rate of a differentially private approximation to an estimator with bounded range and bounded GES. We show that the bounded range condition is necessary if we wish to ensure a strict form of differential privacy.

1. Introduction

Differential privacy (Dwork et al., 2006b) is a strong, cryptographically-motivated definition of privacy which has gained significant attention in the machine-learning and data-mining communities over the past few years (McSherry & Mironov, 2009; Chaudhuri et al., 2011; Friedman & Schuster, 2010; Mohammed et al., 2011). In differentially private solutions, privacy is guaranteed by ensuring that the participation of a single individual in a database does not change the outcome of a private algorithm by much. This is typically achieved by adding some random noise, either to the sensitive input data, or to the output of some function, such as a classifier, computed on the sensitive data. While this guarantees privacy, for most statistical and machine learning tasks, there is a subsequent loss in statistical efficiency, in terms of the number of samples required to estimate a function to a given degree of accuracy. Thus the main challenge in designing differentially private algorithms is to optimize the privacy-accuracy-sample size trade-off, and a body of literature has been devoted to this goal.

In this paper, we focus on differentially-private statistical estimation. We ask: what properties should a statistical estimator have, so that it can be approximated accurately with differential privacy? Privately approximating an estimator based on a functional T that performs well when data is drawn from a specific distribution F is easy: ignore the sensitive data, and output T (F). Thus the challenge is to design differentially private approximations to estimators that are accurate over a wide range of distributions.

Previous work (Smith, 2011) on differentially private statistical estimation shows how to construct differentially private approximations to estimators which have asymptotic normality guarantees under fairly mild conditions. In practical situations, however, we must take into account the effect of a finite number of samples. Moreover, it has been empirically observed (e.g., Chaudhuri et al., 2011; Vu & Slavkovic, 2009) that there is often a significant gap in statistical efficiency between a differentially private estimator and its non-private counterpart. Thus there is a need to study finite sample convergence rates for differentially private statistical estimators, in order to characterize the properties that make a statistical estimator amenable to differentially-private approximations.

In this paper, we provide upper and lower bounds on the finite sample convergence rates of such estimators. Our first finite sample result draws a connection between differentially private statistical estimators and Gross Error Sensitivity, a measure commonly used in the robust statistics literature (Huber, 1981). The Gross Error Sensitivity (GES) of a statistical functional T at a distribution F is the maximum change in the value of T (F) by an arbitrarily small perturbation of F by any point mass x in the domain. We provide a lower bound on the convergence rate of any differentially private statistical estimator, showing that an estimator that approximates T (F_n) well with differential privacy over a large class of distributions must have its convergence rate grow with the GES of T.

A natural question to ask next is whether bounded GES is sufficient for the existence of differentially private estimators that are accurate for large classes of distributions. We next show that at least for α-differential privacy, this is not the case. Any estimator based on a functional T that takes values in a range of length R and guarantees α-differential privacy for a wide class of distributions, has to have a finite sample convergence rate that grows with increasing R.

We then show that bounded range and GES are indeed sufficient for differentially private estimation. In particular, given an estimator based on a functional T which takes values in a bounded range, and has bounded GES for all distributions close to the underlying data distribution F, we show how to compute a differentially private approximation to T (F) based on sensitive data drawn from F. Our approximation preserves (α, δ)-differential privacy, a relaxation of α-differential privacy, and is based on the smoothed sensitivity method (Nissim et al., 2007). We provide a finite sample upper bound on the convergence rate of this estimator.

The statistical estimators in our upper bounds are computationally inefficient in general. We conclude by providing a separate explicit method for privately approximating M-estimators with certain properties. We prove that these differentially-private estimators enjoy similar privacy and statistical guarantees as those based on the smooth-sensitivity method, while being more efficiently computable.

Related Work

Differential privacy was proposed by Dwork et al. (2006b), and has since been used in many works on privacy (e.g., Blum et al., 2005; Barak et al., 2007; Nissim et al., 2007; McSherry & Mironov, 2009; Chaudhuri et al., 2011). It has been shown to have strong semantic guarantees (Dwork et al., 2006b) and is resistant to many attacks (Ganta et al., 2008) that succeed against some other definitions of privacy.

Dwork & Lei (2009) is the first work to identify a connection between differential privacy and robust statistics; based on robust statistical estimators as a starting point, they provide differentially private algorithms for several common estimation tasks, including interquartile range, trimmed mean and median, and regression.

In further work, Smith (2011) shows how to construct a differentially private approximation to certain types of statistical estimators T, and establishes asymptotic normality of his estimator provided certain conditions on T hold. In contrast, we focus on finite sample bounds, with the aim of characterizing the statistical properties of estimators that determine how closely they can be approximated with differential privacy. Lei (2011) considers M-estimation, and provides a simple and elegant differentially private M-estimator which is statistically consistent.

Finally, work on the sample requirement of differentially private algorithms includes bounds on the accuracy of differentially private data release (Hardt & Talwar, 2010), and the sample complexity of differentially private classification (Chaudhuri & Hsu, 2011).

2. Preliminaries

The goal of this paper is to examine the conditions under which we can find private approximations to estimators. The notion of privacy we use is differential privacy (Dwork et al., 2006b;a).

Definition 1

A (randomized) algorithm A taking values in a range 𝒮 is (α, δ)-differentially private if for all S ⊆ 𝒮, and all data sets D and D′ differing in a single entry,

{Pr}_{A} [A (D) \in S] \leq e^{α} {Pr}_{A} [A (D^{'}) \in S] + δ,

where Pr_A[·] is the distribution on 𝒮 induced by the output of A given a data set.

A (randomized) algorithm A is α-differentially private if it is (α, 0)-differentially private.

Here α > 0 and δ ∈ [0, 1] are privacy parameters, where smaller α and δ imply stricter privacy.

A general approach to developing differentially private approximations to functions is to add noise, either to the sensitive data, or to the output of a non-private function computed on the data. This work explores what properties statistical functionals need to have so that they can be accurately approximated with differential privacy.
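The output-perturbation approach can be sketched as follows. This is a minimal illustration using the standard Laplace mechanism on a clipped mean, not a construction from this paper; the function name `private_mean`, the clipping interval [lo, hi], and the use of Python's `random` module are assumptions made for the example. It relies on the standard fact that changing one of n clipped entries moves the mean by at most (hi − lo)/n.

```python
import random

def standard_laplace(rng):
    """Standard Laplace draw, as the difference of two Exp(1) draws."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def private_mean(data, lo, hi, alpha, rng):
    """Release the mean of data clipped to [lo, hi] with Laplace noise.

    Illustrative assumption: the analyst knows lo and hi in advance.
    """
    clipped = [min(max(x, lo), hi) for x in data]
    n = len(clipped)
    sensitivity = (hi - lo) / n          # effect of changing one entry
    return sum(clipped) / n + (sensitivity / alpha) * standard_laplace(rng)

if __name__ == "__main__":
    rng = random.Random(0)
    print(private_mean([0.2, 0.4, 0.9], 0.0, 1.0, alpha=0.5, rng=rng))
```

Smaller α inflates the noise scale, which is the privacy-accuracy trade-off discussed above.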

Let ℱ denote the space of probability distributions on a domain X. A statistical functional T: ℱ → ℝ is a real-valued function of a distribution F ∈ ℱ. The plug-in estimator of θ = T(F) is given by θ_n := T(F_n), where F_n is the empirical distribution corresponding to an i.i.d. sample of size n drawn from F.

A common measure of the robustness of a statistical functional is the influence function, which measures how a functional T (F) responds to small changes to the input F.

Definition 2

The influence function IF(x, T, F) for a functional T and distribution F at x ∈ X is:

IF (x, T, F) = lim_{ρ \to 0} \frac{T ((1 - ρ) F + ρ δ_{x}) - T (F)}{ρ}

where δ_x denotes the point mass distribution at x.

It is a well-established result in theoretical statistics (see, e.g., Wasserman, 2006) that if T is Hadamard-differentiable, and if E_{x~F}[IF(x, T, F)²] is bounded, then T(F_n) converges to T(F) as n → ∞.

A related notion is that of gross error sensitivity, which measures the worst-case value of the influence function for any x ∈ X.

Definition 3

The gross error sensitivity GES(T, F) for a functional T and distribution F is:

GES (T, F) = sup_{x \in X} ∣ IF (x, T, F) ∣ .

We also define the notions of influence function and gross error sensitivity at a fixed scale ρ > 0:

\begin{array}{l} {IF}_{ρ} (x, T, F) : = \frac{T ((1 - ρ) F + ρ δ_{x}) - T (F)}{ρ} \\ {GES}_{ρ} (T, F) : = sup_{x \in X} ∣ {IF}_{ρ} (x, T, F) ∣ . \end{array}
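To make the fixed-scale definitions concrete, the sketch below evaluates IF_ρ and GES_ρ for a functional on a discrete distribution given as (points, weights). The helper names and the replacement of the supremum over X by a maximum over a finite candidate grid are assumptions made for illustration. For the mean, IF_ρ(x, T, F) = x − T(F) regardless of ρ.

```python
def mean(points, weights):
    """Mean functional T(F) for a discrete distribution F."""
    return sum(p * w for p, w in zip(points, weights))

def contaminate(points, weights, x, rho):
    """Return the mixture (1 - rho) * F + rho * delta_x."""
    return list(points) + [x], [(1 - rho) * w for w in weights] + [rho]

def if_rho(T, points, weights, x, rho):
    """Influence function of T at scale rho, evaluated at x."""
    pts, wts = contaminate(points, weights, x, rho)
    return (T(pts, wts) - T(points, weights)) / rho

def ges_rho(T, points, weights, candidates, rho):
    """Max of |IF_rho| over a finite candidate set standing in for X."""
    return max(abs(if_rho(T, points, weights, x, rho)) for x in candidates)

if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]          # stand-in for X = [0, 1]
    print(ges_rho(mean, [0.0, 1.0], [0.5, 0.5], grid, rho=0.01))
```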

In this work, the data domain X will be a subset of ℝ. We overload notation and use F to denote a distribution as well as its cumulative distribution function. For two distributions F and G, we use d_GC(F, G) := sup_{x∈ℝ} |F(x) − G(x)| to denote the Glivenko-Cantelli distance between F and G. For a distribution F from a family ℱ and a radius r > 0, let B_GC(F, r) denote the set of distributions G ∈ ℱ such that d_GC(F, G) ≤ r. Finally, we use d_TV(F, G) to denote the total variation distance between F and G.

A statistical functional T is B-robust at F if GES(T, F) is finite. B-robustness has been studied in the robust statistics literature (Hampel et al., 1986; Huber, 1981), and plug-in estimators for B-robust functionals are considered to be resistant to outliers and changes in the input.

3. Lower Bounds

We begin by establishing lower bounds on the convergence rate of any differentially private approximation to a statistical functional T (F).

3.1. Lower Bounds based on Gross Error Sensitivity

We first show a lower bound on the error of any (α, δ)-differentially private approximation to T in terms of the gross error sensitivity of T at a distribution F.

Theorem 1

Pick any $α \in (0, \frac{ln 2}{2})$ and $δ \in (0, \frac{α}{23})$. Let ℱ be the family of all distributions over X, and let A be any (α, δ)-differentially private algorithm. For all n ∈ ℕ and all F ∈ ℱ, there exists a radius $ρ = ρ (n) = \frac{1}{n} \cdot ⌈ \frac{ln 2}{2 α} ⌉$ and a distribution G ∈ ℱ with d_TV(F, G) ≤ ρ, such that either

\begin{array}{l} E_{F_{n} ~ F} E_{A} [∣ A (F_{n}) - T (F) ∣] \geq \frac{ρ}{16} {GES}_{ρ} (T, F), \text{ or} \\ E_{G_{n} ~ G} E_{A} [∣ A (G_{n}) - T (G) ∣] \geq \frac{ρ}{16} {GES}_{ρ} (T, F) . \end{array}

Several remarks are in order. First of all, the form of Theorem 1 is slightly unconventional in the sense that it applies not to particular distributions, but to a set of distributions. In particular, the bound states that either the expected error on samples from F is large, or the expected error on samples from some G close to F is large. Observe that for a fixed distribution F, it is trivial to construct a differentially private approximation to T(F) that is accurate for F: ignore any sensitive input data, and simply output T(F). This algorithm provides a perfectly accurate estimate when the input is drawn from F, but performs poorly otherwise; thus any lower bound that applies to all differentially private algorithms will have a similar form. On the other hand, the differentially private estimators in Theorem 1 have few restrictions: they are only expected to be accurate for distributions lying in a small neighborhood of F, and may be extremely inaccurate in general.

Second, the radius $ρ (n) = \frac{1}{n} \cdot ⌈ \frac{ln 2}{2 α} ⌉$ decreases to zero as n → ∞; provided GES_ρ(T, F) remains the same as ρ diminishes, the lower bound grows weaker with increasing n. The lower bound thus does not rule out the existence of consistent private estimators.

Finally, we observe from the proof of Theorem 1 that ℱ need not be the family of all distributions over X; the theorem still applies if, for every F ∈ ℱ and all x ∈ X, the mixture (1 − ρ)F + ρδ_x also lies in ℱ, for example if ℱ is the set of all discrete distributions over X.

While Theorem 1 is very general, we present below an example that illustrates an implication of the theorem.

Example 1

Let X = [0, a], and let ℱ be the set of all discrete distributions over X. Let T(F) be the mean of F.

Consider a fixed F ∈ ℱ, and a fixed n. Let ρ = ρ(n) as in Theorem 1. For any F, ${GES}_{ρ} (T, F) \geq \frac{a}{2}$. It can be shown that for any G ∈ B_GC(F, ρ(n)), Var[G] ≤ Var[F] + ρ(1 − ρ)a². Thus, the expected errors of the (non-private) plug-in estimators are bounded as $E [∣ T (F_{n}) - T (F) ∣] \leq O (\sqrt{Var [F] / n})$ and $E [∣ T (G_{n}) - T (G) ∣] \leq O (\sqrt{Var [F] / n} + \sqrt{ρ (1 - ρ) a^{2} / n})$ for all G ∈ B_GC(F, ρ(n)). On the other hand, Theorem 1 shows that for every differentially private estimator A, at least one of E[|A(F_n) − T(F)|] and E[|A(G_n) − T(G)|] is Ω(ρa); this quantity is higher than the corresponding quantity for the non-private estimator so long as $n \leq O (\frac{a^{2}}{Var [F] α^{2}})$.
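The quantities in Example 1 can be evaluated numerically. The sketch below computes the radius ρ(n) from Theorem 1 and the resulting lower bound (ρ/16)·GES_ρ(T, F), plugging in the value a/2 that Example 1 gives as a lower bound on GES_ρ for the mean on [0, a]. The function names and the sample parameter values are illustrative assumptions.

```python
import math

def rho(n, alpha):
    """Radius rho(n) = ceil(ln 2 / (2 alpha)) / n from Theorem 1."""
    return math.ceil(math.log(2) / (2 * alpha)) / n

def theorem1_lower_bound(n, alpha, ges):
    """Value of (rho / 16) * GES_rho for a supplied GES_rho value."""
    return rho(n, alpha) / 16 * ges

if __name__ == "__main__":
    a = 1.0
    for n in (100, 1000, 10000):
        # GES_rho >= a / 2 for the mean on [0, a] (Example 1).
        print(n, rho(n, 0.1), theorem1_lower_bound(n, 0.1, a / 2))
```

The bound decays like 1/(αn), whereas the non-private error decays like 1/√n, so the privacy term dominates only for small n.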

Proof of Theorem 1

Let x* be the x ∈ X that maximizes |IF_ρ(x, T, F)|. Let γ > 0, let $ρ : = \frac{1}{n} ⌈ \frac{ln 2}{2 α} ⌉$, and let G := (1 − ρ)F + ρδ_{x*}. Observe that d_TV(F, G) ≤ ρ and IF_ρ(x*, T, F) = (T(G) − T(F))/ρ.

Consider the following procedure for drawing n samples from G. First, draw a random sample F_n of size n from F (we overload the notation F_n to refer to both a random sample and its empirical distribution). Next, for each i = 1, 2, …, n, independently toss a biased coin with heads probability ρ; if the coin turns up heads, replace the i-th element of F_n by x*; otherwise, do nothing. This procedure constructs a random sample G_n of size n from G, and in the process constructs a coupling between samples of size n from F and G. In what follows, we will use this coupling to calculate the quantity

E_{F_{n} ~ F} E_{A} [∣ A (F_{n}) - T (F) ∣] + E_{G_{n} ~ G} E_{A} [∣ A (G_{n}) - T (G) ∣] .

Let F_n be any randomly drawn sample of size n from F, and let G_n be a corresponding sample from G as drawn from the coupling procedure. Call a pair (F_n, G_n) ρ-close if they differ in at most ρn entries. As the median of Binomial(n, ρ) is ≤ ⌈ρn⌉ = ρn, the probability that at most ρn of the elements of F_n are converted to x* by the coupling process is at least 1/2.

In other words,

{Pr}_{G_{n}} [(F_{n}, G_{n}) is ρ - close] \geq 1 / 2.

(1)
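Inequality (1) can be checked exactly for given n and α, since the number of replaced entries in the coupling is Binomial(n, ρ) with ρn an integer. The sketch below sums the binomial probabilities directly; the function name is an assumption, and it presumes n ≥ ⌈ln 2/(2α)⌉ so that ρ ≤ 1.

```python
import math

def prob_rho_close(n, alpha):
    """Exact Pr[Binomial(n, rho) <= rho * n] with rho = ceil(ln2/(2 alpha))/n."""
    k = math.ceil(math.log(2) / (2 * alpha))   # rho * n, an integer
    p = k / n                                  # rho, assumed to be <= 1
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

if __name__ == "__main__":
    for n, alpha in ((10, 0.3), (100, 0.1), (1000, 0.01)):
        print(n, alpha, prob_rho_close(n, alpha))
```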

For any ρ-close pair (F_n, G_n), we can apply Lemma 3 with the parameters τ := T(F), τ′ := T(G), γ := 1/4, and

Δ : = ρ n \leq (1 + \frac{ln 2}{2 α}) \leq \frac{ln 2}{α} = \frac{ln \frac{1}{2 γ}}{α};

the lemma implies, for any ρ-close pair (F_n, G_n),

E_{A} [∣ A (F_{n}) - T (F) ∣] + E_{A} [∣ A (G_{n}) - T (G) ∣] \geq \frac{1}{4} ∣ T (F) - T (G) ∣ .

Therefore, conditioned on F_n, we have

E_{A} [∣ A (F_{n}) - T (F) ∣] + E_{G_{n}} E_{A} [∣ A (G_{n}) - T (G) ∣ ∣ F_{n}] \geq \frac{1}{8} ∣ T (F) - T (G) ∣

by (1). Taking a final expectation over F_n ~ F,

\begin{array}{l} E_{F_{n} ~ F} E_{A} [∣ A (F_{n}) - T (F) ∣] + E_{G_{n} ~ G} E_{A} [∣ A (G_{n}) - T (G) ∣] \geq \frac{1}{8} ∣ T (F) - T (G) ∣ \\ = \frac{ρ}{8} ∣ {IF}_{ρ} (x^{*}, T, F) ∣ = \frac{ρ}{8} {GES}_{ρ} (T, F) . \end{array}

The theorem follows.

3.2. Lower Bounds as a Function of Range

Is the bound in Theorem 1 tight? In other words, if T has bounded GES, can we compute accurate differentially private approximations to T(F) for all distributions F over a domain? We next show that at least for (α, 0)-differential privacy, Theorem 1 is not tight; if we wish to compute differentially private and accurate estimates of T(F) for all distributions F in a family, where T(F) can take any value in a range [λ, λ′], then the sample size must grow as a function of λ′ − λ.

Theorem 2

Let ℱ be a family of distributions over X, and let A be any (α, 0)-differentially private algorithm. Suppose for all τ ∈ [λ, λ′], there exists some F^τ ∈ ℱ such that T(F^τ) = τ. Then there exists some F ∈ ℱ such that

E_{F_{n} ~ F, A} [∣ A (F_{n}) - T (F) ∣] \geq \frac{1}{4} \cdot \frac{λ^{'} - λ}{2 + e^{α n}} .

Example 2

For any γ ∈ ℝ, let U_γ be the uniform distribution on [γ − 1, γ + 1], and let ℱ be the family ℱ = {U_γ: γ ∈ [−R, R]}. Let T(F) be the median of F. For every F ∈ ℱ, the non-private estimator T(F_n) converges to T(F) at a rate of $O (\frac{1}{\sqrt{n}})$, independent of R. However, Theorem 2 shows that for every differentially private estimator A, there is some F ∈ ℱ such that |A(F_n) − T(F)| grows with R.
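The dependence on the range in Theorem 2 can be tabulated directly. In Example 2 the median ranges over [−R, R], so λ′ − λ = 2R. The function name and the overflow guard are assumptions made for this sketch.

```python
import math

def theorem2_lower_bound(lam_lo, lam_hi, alpha, n):
    """Value of (1/4) * (lam_hi - lam_lo) / (2 + exp(alpha * n))."""
    if alpha * n > 700:          # exp would overflow; the bound is ~0 here
        return 0.0
    return 0.25 * (lam_hi - lam_lo) / (2 + math.exp(alpha * n))

if __name__ == "__main__":
    for R in (1.0, 1e3, 1e6):
        # Median of U_gamma ranges over [-R, R] in Example 2.
        print(R, [theorem2_lower_bound(-R, R, 0.1, n) for n in (10, 50, 100)])
```

The bound is linear in R and decays exponentially in αn, so keeping the error below a fixed level requires n to grow roughly like ln(R)/α.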

Proof of Theorem 2

Let $r : = \frac{λ^{'} - λ}{2 + e^{α n}}$ and $Γ : = ⌊ \frac{λ^{'} - λ}{r} ⌋$. For each i = 1, 2, …, Γ, let Fⁱ be a distribution in ℱ such that $T (F^{i}) = λ + (i - \frac{1}{2}) r$; such distributions are guaranteed to exist by assumption. Also, for each i = 1, 2, …, Γ, let $F_{n}^{i}$ be an i.i.d. sample of size n from Fⁱ, and define the half-open interval Iⁱ := [λ + (i − 1)r, λ + ir). Observe that the intervals Iⁱ are disjoint. To prove the theorem, let us assume the contrary:

E_{F_{n}^{i}, A} [∣ A (F_{n}^{i}) - T (F^{i}) ∣] \leq r / 4 for all i .

(2)

This, along with Markov’s inequality applied to $∣ A (F_{n}^{i}) - T (F^{i}) ∣$, implies that ${Pr}_{F_{n}^{i}, A} [A (F_{n}^{i}) \in I^{i}] \geq 1 / 2$. Therefore, for any i,

\begin{array}{l} \frac{1}{2} \geq {Pr}_{F_{n}^{i}, A} [A (F_{n}^{i}) \notin I^{i}] \geq \sum_{j \neq i} {Pr}_{F_{n}^{i}, A} [A (F_{n}^{i}) \in I^{j}] \\ \geq e^{- α n} \sum_{j \neq i} {Pr}_{F_{n}^{j}, A} [A (F_{n}^{j}) \in I^{j}] \geq \frac{1}{2} (Γ - 1) e^{- α n} \end{array}

where the first step follows by assumption, the second step follows because the intervals {I^j} are disjoint, and the third step from Lemma 2 and the fact that for any i and j, any $F_{n}^{i}$ and $F_{n}^{j}$ differ in at most n entries. Rearranging, the inequality becomes Γ ≤ 1 + e^{αn}, which is a contradiction since Γ = ⌊(λ′ − λ)/r⌋ > 1 + e^{αn}. Therefore (2) cannot hold, so the theorem follows.

4. Upper Bounds

In this section, we show that bounded GES and bounded range are sufficient conditions for the existence of an (α, δ)-differentially private approximation to T. Our approximation uses the smooth-sensitivity method of Nissim et al. (2007), for which we provide a new statistical analysis in Section 4.1 (Theorem 3). We also provide a specific analysis for the case of linear functionals in Appendix B.

Let d_H(D, D′) denote the Hamming distance between D and D′ (the number of entries in which D and D′ differ), and recall the following definitions from Nissim et al. (2007).

Definition 4

The local sensitivity of a function ϕ: ℝⁿ → ℝ at a data set D ∈ ℝⁿ, denoted by LS(ϕ, D), is

LS (ϕ, D) : = sup {∣ ϕ (D) - ϕ (D^{'}) ∣ : d_{H} (D, D^{'}) = 1} .

For β > 0, the β-smooth sensitivity of ϕ at D, denoted by SS_β(ϕ, D), is

{SS}_{β} (ϕ, D) : = sup {e^{- β d_{H} (D, D^{'})} \cdot LS (ϕ, D^{'}) : D^{'} \in ℝ^{n}} .

Throughout, we assume D ∈ ℝⁿ is an i.i.d. sample of size n drawn from a fixed distribution F, and F_n is the empirical CDF corresponding to this sample. For a statistical functional T, we use the overloaded notation SS_β(T, F_n) to denote the β-smooth sensitivity of T (F_n) at the data set F_n = D.
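Definition 4 can be evaluated by brute force on very small inputs. The sketch below does so under a simplifying assumption that is not part of the paper: the data domain is a small finite grid rather than ℝ, so that both suprema become maxima over finitely many data sets. The function names are illustrative, and the cost is exponential in n.

```python
import itertools
import math
import statistics

def hamming(d1, d2):
    """Number of entries in which two equal-length data sets differ."""
    return sum(a != b for a, b in zip(d1, d2))

def local_sensitivity(phi, data, domain):
    """Max change in phi when one entry of data is replaced by a domain value."""
    base = phi(data)
    best = 0.0
    for i in range(len(data)):
        for v in domain:
            neighbor = data[:i] + (v,) + data[i + 1:]
            best = max(best, abs(phi(neighbor) - base))
    return best

def smooth_sensitivity(phi, data, domain, beta):
    """Max over all data sets D' of exp(-beta * d_H(D, D')) * LS(phi, D')."""
    return max(
        math.exp(-beta * hamming(data, other)) * local_sensitivity(phi, other, domain)
        for other in itertools.product(domain, repeat=len(data))
    )

if __name__ == "__main__":
    domain = (0, 1, 2, 3, 4)
    data = (2, 2, 2)
    print(local_sensitivity(statistics.median, data, domain))
    print(smooth_sensitivity(statistics.median, data, domain, beta=1.0))
```

For the data set (2, 2, 2) the median has local sensitivity 0, yet its smooth sensitivity is positive because a neighboring data set such as (2, 2, 4) has local sensitivity 2.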

4.1. Estimator Based on Smooth Sensitivity

For a statistical functional T, let A_T be the randomized estimator given by

A_{T} (F_{n}) : = T (F_{n}) + {SS}_{β (α, δ)} (T, F_{n}) \cdot \frac{2}{α} \cdot Z

(3)

where $β (α, δ) : = \frac{α}{2 ln (1 / δ)}$ and Z is an independent random variable drawn from the standard Laplace density p_Z(z) = 0.5e^{−|z|}. A_T essentially computes T(F_n) and adds zero-mean noise, with the scale determined by the privacy parameters and the smooth sensitivity. Computing SS_{β(α,δ)}(T, F_n) can be computationally challenging in general (see Nissim et al., 2007); our result thus demonstrates an upper bound.
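Given values of T(F_n) and of the smooth sensitivity, the release step in (3) is a one-line computation. The sketch below assumes both values have already been computed by some other routine (the hard part in general); the function names are illustrative.

```python
import math
import random

def beta(alpha, delta):
    """Smoothing parameter beta(alpha, delta) = alpha / (2 ln(1 / delta))."""
    return alpha / (2 * math.log(1 / delta))

def smooth_sensitivity_release(t_value, ss_value, alpha, rng):
    """Return T(F_n) + SS * (2 / alpha) * Z with Z standard Laplace, as in (3).

    t_value and ss_value are assumed to be supplied by the caller.
    """
    z = rng.expovariate(1.0) - rng.expovariate(1.0)   # standard Laplace draw
    return t_value + ss_value * (2 / alpha) * z

if __name__ == "__main__":
    rng = random.Random(0)
    print(beta(0.5, 1e-6))
    print(smooth_sensitivity_release(2.0, 0.1, 0.5, rng))
```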

The following guarantee is due to Nissim et al. (2007).

Proposition 1

A_T is (α, δ)-differentially private.

To give a statistical guarantee for A_T, we begin with a standard tail bound based on the simple fact that Pr_Z[|Z| > t] ≤ e^{−t}.

Proposition 2

For any t > 0,

{Pr}_{Z} [∣ A_{T} (F_{n}) - T (F_{n}) ∣ > {SS}_{β (α, δ)} (T, F_{n}) \cdot \frac{2}{α} \cdot t] \leq e^{- t} .
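The tail bound can be verified directly from the standard Laplace density and the definition of the estimator in (3); the following is a worked restatement of that step.

```latex
\Pr_Z[\,|Z| > t\,] = 2 \int_t^{\infty} \tfrac{1}{2} e^{-z} \, dz = e^{-t},
\qquad
|A_T(F_n) - T(F_n)| = SS_{\beta(\alpha,\delta)}(T, F_n) \cdot \frac{2}{\alpha} \cdot |Z| .
```

When the smooth sensitivity is positive, the event in Proposition 2 is therefore exactly the event {|Z| > t}.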

It follows that the convergence rate of A_T depends on the β-smooth sensitivity of T at F_n, which can be bounded under the following conditions on T.

Condition 1 (Bounded range)

There exists a finite R > 0 such that the range of T is contained in an interval of length R.

Condition 2 (Bounded gross error sensitivity)

The sequence (Γ_n) given by

Γ_{n} : = sup {{GES}_{1 / n} (T, G) : G \in B_{GC} (F, \sqrt{\frac{2 ln (2 / η)}{n}})}

is bounded.

Even for non-private estimation, the robustness of an estimator depends not just on the influence functions at the target distribution F, but also on these quantities in a local neighborhood around F (Huber, 1981, p. 72). For convenience, Condition 2 is stated in terms of Glivenko-Cantelli distance, but can be easily changed to any distance under which F_n converges to F as n → ∞ with suitable modifications in the analysis.

We now state our main statistical guarantee for A_T.

Theorem 3

Assume Condition 1 and Condition 2 hold. Pick any η ∈ (0, 1/4). With probability ≥ 1 − 2η, the estimator A_T from (3) satisfies

∣ A_{T} (F_{n}) - T (F) ∣ \leq ∣ T (F_{n}) - T (F) ∣ + \frac{2 ln (1 / η)}{α} max {\frac{2 Γ_{n}}{n}, R \cdot exp (- \frac{α \sqrt{n ln (2 / η)}}{74 ln (1 / δ)})}

where R is the quantity in Condition 1, and Γ_n is the quantity in Condition 2.

Proof

Follows from Proposition 2, Lemma 1, a union bound, and the triangle inequality.

The first term in the bound, |T(F_n) − T(F)|, is the error of the non-private plug-in estimate T(F_n). If T is Hadamard-differentiable, then T(F_n) − T(F) converges in distribution to a zero-mean normal random variable with variance n⁻¹ ∫ IF(x, T, F)² dF(x); in this case, T(F_n) converges to T(F) at an asymptotic n^{−1/2} rate (Wasserman, 2006). Non-asymptotic rates can also be established in terms of other specific properties of T and F (see Appendix B for an example).

The second term in the bound from Theorem 3 is roughly the larger of

A_{1} : = O (\frac{Γ_{n}}{α n}) and A_{2} : = \frac{R}{α} \cdot exp (- Ω (\frac{α \sqrt{n}}{ln (1 / δ)}))

(for constant η), and can be compared to the lower bounds from Section 3. The lower bound from Theorem 1 is close to A₁ as long as GES_ρ(T, F) ≈ Γ_n for $ρ = \frac{ln 2}{2 α n}$. This holds for sufficiently large n when lim_{n→∞} Γ_n = GES(T, F). The lower bound from Theorem 2 decreases as R·exp(−Ω(αn)), which is a little better than A₂, but is otherwise qualitatively similar in terms of its dependence on the range R.
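The interplay between the two terms can be explored numerically. The sketch below evaluates the privacy term of Theorem 3 exactly as stated there, including its constants; the function name and the parameter values in the usage example are assumptions chosen for illustration.

```python
import math

def theorem3_privacy_term(n, alpha, delta, eta, gamma_n, R):
    """Second term of the Theorem 3 bound, with constants as stated there."""
    a1 = 2 * gamma_n / n
    a2 = R * math.exp(
        -alpha * math.sqrt(n * math.log(2 / eta)) / (74 * math.log(1 / delta))
    )
    return 2 * math.log(1 / eta) / alpha * max(a1, a2)

if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7, 10**9):
        print(n, theorem3_privacy_term(n, alpha=1.0, delta=1e-3, eta=0.1,
                                       gamma_n=0.5, R=100.0))
```

With these constants the range-dependent term dominates until √n is large relative to ln(1/δ)/α, after which the Γ_n/(αn) term takes over.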

Example 3

If T(F) is the median of F, and ℱ := {U_γ: γ ∈ [−R, R]} is the family of uniform distributions on unit length intervals [γ − 1, γ + 1] from Example 2, then Γ_n = 1/2, and the bound in Theorem 3 reduces to

∣ T (F_{n}) - T (F) ∣ + O (\frac{1}{α n}) + \frac{R}{α} \cdot e^{- Ω (α \sqrt{n} / ln (1 / δ))} .

4.2. Bounding the Smooth Sensitivity

The proof of Theorem 3 (see Appendix C) is based on the following lemma, which establishes a high-probability bound on SS_β(T, F_n) under Conditions 1 and 2.

Lemma 1

Assume Condition 1 and Condition 2 hold. With probability ≥ 1 − η,

{SS}_{β} (T, F_{n}) \leq max {\frac{2 Γ_{n}}{n}, R exp (- β (\sqrt{\frac{n ln (2 / η)}{2}} - 1))}

where R is the quantity in Condition 1, and Γ_n is the quantity in Condition 2.

5. Differentially-Private M-Estimation

We now provide a procedure for constructing differentially private approximations to M-estimators that satisfy certain conditions. Unlike our estimators in Section 4.1, these estimators are computationally efficient; however they only apply to a more restricted class of estimators.

5.1. M-Estimators

An M-estimator T_ψ(F_n) is given as the solution θ_n ∈ ℝ to the equation

\int ψ (x, θ_{n}) {d F}_{n} (x) = 0

for some function ψ: ℝ × ℝ → ℝ. For a CDF G and θ ∈ ℝ, define

Ψ (G, θ) : = \int ψ (x, θ) d G (x)

so Ψ (F_n, T_ψ(F_n)) = 0. The derivative of Ψ with respect to its second argument, which is assumed to exist, is denoted by Ψ′. Throughout, we will assume ψ satisfies the following condition.

Condition 3 (Bounded ψ-range and monotonicity)

There exists a finite K > 0 such that the range of ψ is contained in [−K, K], and ψ is non-decreasing in its second argument.

Under this condition, the gross error sensitivity of T_ψ at F can be bounded as

GES (T_{ψ}, F) = \frac{{sup}_{x \in ℝ} ∣ ψ (x, T_{ψ} (F)) ∣}{∣ Ψ^{'} (F, T_{ψ} (F)) ∣} \leq \frac{K}{∣ Ψ^{'} (F, T_{ψ} (F)) ∣} .

(4)
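As an illustration of Condition 3, the sketch below uses a clipped (Huber-type) score ψ(x, θ) = clip(θ − x, −K, K), which is bounded by K and non-decreasing in θ, and solves Ψ(F_n, θ) = 0 by bisection. This particular ψ, the function names, and the bisection tolerance are assumptions made for the example and are not prescribed by the paper.

```python
def psi(x, theta, K=1.0):
    """Clipped score: bounded in [-K, K] and non-decreasing in theta."""
    return max(-K, min(K, theta - x))

def Psi(data, theta, K=1.0):
    """Empirical average of psi(x, theta) over the sample."""
    return sum(psi(x, theta, K) for x in data) / len(data)

def m_estimate(data, K=1.0, tol=1e-10):
    """Root of Psi(F_n, .) found by bisection on [min(data), max(data)].

    Psi is <= 0 at the left endpoint, >= 0 at the right endpoint, and
    monotone in between, so the bracket always contains a root.
    """
    lo, hi = min(data), max(data)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if Psi(data, mid, K) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    print(m_estimate([0.0, 1.0, 2.0], K=10.0))        # no clipping: the mean
    print(m_estimate([0.0, 1.0, 2.0, 1000.0], K=1.0)) # outlier has bounded pull
```

The second call shows the robustness that bounded ψ provides: the outlier at 1000 contributes at most K to the score, so the estimate stays near the bulk of the data.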

Previous works (Chaudhuri et al., 2011; Rubinstein et al., 2009) have provided differentially private and computationally efficient algorithms for M-estimation under assumptions that are very similar to Condition 3. The algorithm in Rubinstein et al. (2009), and one of the algorithms in Chaudhuri et al. (2011) are based on the sensitivity method, while the main algorithm in Chaudhuri et al. (2011) is based on an objective perturbation method. While both approaches are computationally efficient, both require explicit regularization. This is problematic in practice because determining the regularization parameter privately through differentially private parameter tuning requires extra data; for a more detailed discussion of this issue, see Chaudhuri et al. (2011). In contrast, our algorithm is based on the Exponential Mechanism, and does not have an explicit regularization parameter; instead we assume that Ψ′ is smooth, and our guarantees depend on the value of the derivative Ψ′(F, T_ψ(F)).

5.2. Exponential Mechanism for M-Estimation

Fix a density μ on ℝ, and let A_{ψ,μ} be the randomized estimator whose output has probability density

p_{A_{ψ, μ} (F_{n})} (θ) \propto μ (θ) exp (- \frac{n α}{2 K} ∣ Ψ (F_{n}, θ) ∣) .

This estimator is derived from the exponential mechanism of McSherry & Talwar (2007), where the “cost” function is taken to be |Ψ(F_n, ·)|/K. In many M-estimators of interest, particularly those involving data lying in a bounded range, prior knowledge of K is reasonable.

If it is known that T_ψ (F) is contained in some interval, then one can take the prior density μ to be uniform over this interval. If no such prior knowledge is available, then μ can be taken to be a density with full support on ℝ such as the standard Cauchy density.
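For the uniform-prior case, sampling from the density above can be approximated on a grid. The sketch below is a discretized stand-in for the continuous estimator: the grid resolution, the clipped score ψ(x, θ) = clip(θ − x, −K, K), and all function names are assumptions made for illustration, and no claim is made that the discretized sampler inherits the stated guarantees unchanged.

```python
import math
import random

def Psi(data, theta, K):
    """Empirical average of the clipped score clip(theta - x, -K, K)."""
    return sum(max(-K, min(K, theta - x)) for x in data) / len(data)

def exp_mechanism_sample(data, interval, alpha, K, rng, grid_size=2001):
    """Draw theta from a grid on `interval` with weight
    exp(-(n * alpha / (2 K)) * |Psi(F_n, theta)|); the uniform prior
    contributes only a constant factor, so it is omitted."""
    lo, hi = interval
    n = len(data)
    thetas = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    weights = [math.exp(-n * alpha / (2 * K) * abs(Psi(data, t, K))) for t in thetas]
    return rng.choices(thetas, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.0, 1.0, 2.0]
    print([round(exp_mechanism_sample(data, (-5.0, 5.0), 2.0, 10.0, rng), 3)
           for _ in range(5)])
```

Larger nα concentrates the draws around the root of Ψ(F_n, ·), which is the non-private M-estimate.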

The privacy guarantee for A_{ψ,μ} follows easily from known properties of the exponential mechanism (McSherry & Talwar, 2007).

Proposition 3

A_{ψ,μ} is (α, 0)-differentially private.

The accuracy guarantee for A_{ψ,μ} relies on the following smoothness condition on Ψ at F.

Condition 4 (Smoothness)

There exist r₁ > 0, r₂ > 0, Λ₁ > 0, and Λ₂ > 0 such that

\begin{array}{l} ∣ Ψ^{'} (G, θ) - Ψ^{'} (F, θ) ∣ \leq Λ_{1} \cdot d_{GC} (G, F) and \\ ∣ Ψ^{'} (F, θ) - Ψ^{'} (F, T_{ψ} (F)) ∣ \leq Λ_{2} \cdot ∣ θ - T_{ψ} (F) ∣ \end{array}

whenever d_GC(G, F) ≤ r₁ and |θ − T_ψ(F)| ≤ r₂.

Also, for ε > 0 and η ∈ (0, 1), define N_{ε,η} := min{n ∈ ℕ : Pr_{F_n~F}[|T_ψ(F_n) − T_ψ(F)| > ε] ≤ η} to be the minimum sample size such that, with probability ≥ 1 − η, the non-private estimator T_ψ(F_n) lies within distance ε of T_ψ(F).

Theorem 4

Assume Condition 3 and Condition 4 hold. Let ε₁:= min{r₁, |Ψ′(F, T_ψ(F))|/(6Λ₁)}, ε₂:= min{r₂/2, |Ψ′(F, T_ψ(F))|/(6Λ₂)}, and Γ:= K/|Ψ′(F, T_ψ(F))|. Pick any η ∈ (0, 1) and ε ∈ (0, ε₂). Suppose

n \geq max {\frac{ln (2 / η)}{2 ε_{1}^{2}}, N_{ε_{2}, η}},

(5)

and one of the following holds:

(i) the range of T_ψ is contained in an interval I of length R, μ is the uniform density on I, and
$n \geq \frac{8 ln (6 R / (ε η))}{α ε} \cdot Γ;$
(ii) $μ (θ) = \frac{1}{π} {(1 + θ^{2})}^{- 1}$ is the standard Cauchy density, and
$n \geq \frac{8}{α ε} \cdot ln (\frac{π}{η} (\frac{2 {(∣ T_{ψ} (F) ∣ + ε_{2})}^{2} + 1}{ε / 3} + \frac{ε}{6})) \cdot Γ .$

With probability at least 1 − 3η, the estimator A_{ψ,μ} satisfies

∣ A_{ψ, μ} (F_{n}) - T_{ψ} (F) ∣ \leq ∣ T_{ψ} (F_{n}) - T_{ψ} (F) ∣ + ε .

The proof of Theorem 4 is in Appendix D. The condition in (5) required by Theorem 4 essentially states that the sample size n should be large enough for F_n and T_ψ(F_n) to be in the neighborhoods of F and T_ψ(F), respectively, where Ψ′ is locally Lipschitz-smooth.

It is straightforward to generalize the results to other prior densities μ. Observe that in the case the range of T_ψ is [−R, R] for some unknown R, using the standard Cauchy density as μ yields a similar dependence on R (via log |T_ψ(F)| ≤ log R) as what is obtained when μ is uniform over [−R, R]. The more probability mass μ assigns around T_ψ(F), the better the bounds are.

Also note that the main scaling factor of Γ = K/|Ψ′(F, T_ψ(F))| in the sample size bound is precisely the bound on GES(T_ψ, F) from (4). A dependence on GES(T_ψ, F) is to be expected as per Theorem 1.
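The two sample-size conditions of Theorem 4 can be compared numerically. The sketch below transcribes them as stated in the theorem; the function names and the example parameter values are assumptions made for illustration.

```python
import math

def theorem4_min_n_uniform(R, eps, eta, alpha, gamma):
    """Uniform prior on an interval of length R: 8 ln(6R/(eps*eta)) / (alpha*eps) * Gamma."""
    return 8 * math.log(6 * R / (eps * eta)) / (alpha * eps) * gamma

def theorem4_min_n_cauchy(t_abs, eps, eps2, eta, alpha, gamma):
    """Standard Cauchy prior; t_abs stands for |T_psi(F)|."""
    inner = (2 * (t_abs + eps2) ** 2 + 1) / (eps / 3) + eps / 6
    return 8 / (alpha * eps) * math.log(math.pi / eta * inner) * gamma

if __name__ == "__main__":
    for scale in (1.0, 1e2, 1e4):
        print(scale,
              theorem4_min_n_uniform(2 * scale, 0.1, 0.05, 1.0, 2.0),
              theorem4_min_n_cauchy(scale, 0.1, 0.2, 0.05, 1.0, 2.0))
```

Both requirements scale linearly in Γ and only logarithmically in the size of the parameter range, consistent with the remark that the Cauchy prior yields a similar dependence on R.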

6. Conclusions

The finite sample analysis reveals a concrete connection between differential privacy and robust statistics. The main results shown here suggest using B-robustness as a criterion for designing differentially private statistical estimators, and also highlight the obstacles that even robust estimators face when the parameter space is very large or unbounded.

While our lower bounds may seem pessimistic, they apply to estimators that succeed for a wide class of distributions. One way of avoiding our lower bounds would be by using priors that allow an estimator to perform well on some input distributions but not-so-well on others; a future research direction is to investigate how this can help design better differentially private estimators.

Acknowledgments

KC would like to thank NIH U54 HL108460 for research support.

A. Lemmas from Section 3

Lemma 2

Let A be any (α, δ)-differentially private algorithm, and let D ∈ Xⁿ and D′ ∈ Xⁿ be two data sets which differ in ≤ k entries. Then, for any S,

{Pr}_{A} [A (D) \in S] \geq e^{- k α} \cdot {Pr}_{A} [A (D^{'}) \in S] - \frac{δ}{1 - e^{- α}} .

Proof

Let D = D₀, D₁, …, D_k = D′ be a sequence of data sets such that for any i, D_i differs from D_i₊₁ by a single entry. From Definition 1, for any S,

{Pr}_{A} [A (D_{i}) \in S] \geq e^{- α} {Pr}_{A} [A (D_{i + 1}) \in S] - δ .

(6)

Composing Equation (6) k times, we get:

{Pr}_{A} [A (D) \in S] \geq e^{- k α} \cdot {Pr}_{A} [A (D^{'}) \in S] - (δ + e^{- α} δ + \dots + e^{- (k - 1) α} δ) .

The lemma follows from noting that $\sum_{j = 0}^{\infty} e^{- α j} = \frac{1}{1 - e^{- α}}$ .
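The composition step can be checked numerically. The sketch below is a minimal illustration, not part of the original proof: it applies the one-step inequality (6) k times to a starting probability and compares the result with the closed-form bound of Lemma 2. The function names and the sample values of p, k, α and δ are assumptions chosen only for illustration.

```python
import math

def iterate_one_step(p, k, alpha, delta):
    """Apply the one-step bound q -> e^{-alpha} * q - delta, k times.

    Starting from p = Pr[A(D') in S], the result is the lower bound on
    Pr[A(D) in S] obtained by chaining inequality (6) along D_k, ..., D_0.
    """
    q = p
    for _ in range(k):
        q = math.exp(-alpha) * q - delta
    return q

def lemma2_bound(p, k, alpha, delta):
    """Closed-form lower bound from Lemma 2."""
    return math.exp(-k * alpha) * p - delta / (1.0 - math.exp(-alpha))

p, k, alpha, delta = 0.9, 5, 0.1, 1e-3
chained = iterate_one_step(p, k, alpha, delta)
closed = lemma2_bound(p, k, alpha, delta)
# The chained bound subtracts a finite geometric sum, so it is never
# weaker than the closed form, which subtracts the full infinite series.
print(chained >= closed)
```

The closed form is slightly looser than the chained bound, but it has the advantage of not depending on k in the additive term.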

Lemma 3

Let D and D′ be two datasets that differ in the value of at most Δ entries, and let A be any (α, δ)-differentially private algorithm. For all $0 < γ < \frac{1}{3}$ , and for all τ and τ′, if $Δ \leq \frac{ln (1 / (2 γ))}{α}$ , and if $δ \leq \frac{1}{4} γ (1 - e^{- α})$ , then

E_{A} [∣ A (D) - τ ∣ + ∣ A (D^{'}) - τ^{'} ∣] \geq γ ∣ τ - τ^{'} ∣ .

Proof

Without loss of generality, assume that: τ < τ′ and let $t = \frac{1}{2} (τ^{'} - τ)$ . Let I = (τ − t, τ + t), and I′ = (τ′ − t, τ′ + t). Then I and I′ are disjoint. We first show that under the conditions of the lemma,

{Pr}_{A} [A (D) \in I] + {Pr}_{A} [A (D^{'}) \in I^{'}] \leq 2 (1 - γ)

(7)

Suppose this is not the case. Then,

\begin{array}{l} 2 γ > {Pr}_{A} [A (D) \notin I] + {Pr}_{A} [A (D^{'}) \notin I^{'}] \\ \geq {Pr}_{A} [A (D) \in I^{'}] + {Pr}_{A} [A (D^{'}) \in I] \\ \geq e^{- Δ α} ({Pr}_{A} [A (D^{'}) \in I^{'}] + {Pr}_{A} [A (D) \in I]) - \frac{2 δ}{1 - e^{- α}} \\ \geq e^{- Δ α} \cdot 2 (1 - γ) - \frac{γ}{2} . \end{array}

Here, the first step follows by assumption, the second step follows from the disjointness of I and I′, the third step from Lemma 2, and the fourth step by assumption and the condition on δ. Now, as $Δ \leq \frac{ln (1 / (2 γ))}{α}$ , we have $e^{- Δ α} \geq 2 γ$ , so the quantity on the right hand side of the above chain is at least

2 γ \cdot 2 (1 - γ) - γ / 2 \geq \frac{7}{2} γ - 4 γ^{2} > 2 γ

for $γ \leq \frac{1}{3}$ . This is a contradiction, and thus Equation 7 holds. Using Equation 7, we can write:

\begin{array}{l} E_{A} [∣ A (D) - τ ∣ + ∣ A (D^{'}) - τ^{'} ∣] \geq E_{A} [∣ A (D) - τ ∣ \mid A (D) \notin I] \cdot {Pr}_{A} [A (D) \notin I] + E_{A} [∣ A (D^{'}) - τ^{'} ∣ \mid A (D^{'}) \notin I^{'}] \cdot {Pr}_{A} [A (D^{'}) \notin I^{'}] \\ \geq t \cdot ({Pr}_{A} [A (D) \notin I] + {Pr}_{A} [A (D^{'}) \notin I^{'}]) \\ \geq 2 t γ \end{array}

The lemma now follows from the observation that $t = \frac{1}{2} ∣ τ - τ^{'} ∣$ .
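The arithmetic behind the contradiction in the proof of Lemma 3 can be sanity-checked numerically. The sketch below evaluates the last expression of the displayed chain at the largest Δ permitted by the lemma and confirms that it exceeds 2γ for a few values of γ up to 1/3. The helper names and the choice α = 0.5 are assumptions made for illustration only.

```python
import math

def contradiction_rhs(gamma, alpha, big_delta):
    """Last line of the displayed chain:
    e^{-Delta*alpha} * 2(1 - gamma) - gamma/2."""
    return math.exp(-big_delta * alpha) * 2.0 * (1.0 - gamma) - gamma / 2.0

def max_delta(gamma, alpha):
    """Largest Delta allowed by the lemma: ln(1/(2*gamma)) / alpha."""
    return math.log(1.0 / (2.0 * gamma)) / alpha

alpha = 0.5
checks = []
for gamma in (0.01, 0.1, 0.25, 1.0 / 3.0):
    rhs = contradiction_rhs(gamma, alpha, max_delta(gamma, alpha))
    # At the largest Delta, e^{-Delta*alpha} = 2*gamma, so
    # rhs = (7/2)*gamma - 4*gamma^2, which exceeds 2*gamma when gamma < 3/8.
    checks.append(rhs > 2.0 * gamma)
print(all(checks))
```

Smaller values of Δ only increase the expression, so checking the largest admissible Δ covers the worst case.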

B. Linear Functionals

A functional T_a of the form T_a(F) = ∫ a(x)dF (x) is called a linear functional. The influence function (at all scales ρ) of T_a and F is

IF (x, T_{a}, F) = {IF}_{ρ} (x, T_{a}, F) = ∣ a (x) - T_{a} (F) ∣,

and therefore the gross error sensitivity is

GES (T_{a}, F) = {GES}_{ρ} (T_{a}, F) = sup_{x \in X} ∣ a (x) - T_{a} (F) ∣ .

Note that the range of T_a has diameter bounded by (twice) the gross error sensitivity.

The estimator A_{T_a} from (3) with δ = 0 (so β(α, 0) = 0) has the following statistical guarantee.

Theorem 5

Pick any linear functional T_a and η ∈ (0, 1). Let σ² := ∫IF(x, T_a, F)²dF (x). With probability ≥ 1 − 2η, the estimator A_{T_a} from (3) satisfies

\begin{array}{l} ∣ A_{T_{a}} (F_{n}) - T_{a} (F) ∣ \leq ∣ T_{a} (F_{n}) - T_{a} (F) ∣ + \frac{4 GES (T_{a}, F) ln (1 / η)}{α n} \\ \leq \sqrt{\frac{2 σ^{2} ln (2 / η)}{n}} + (\frac{2}{3} + \frac{4}{α}) \frac{GES (T_{a}, F) ln (2 / η)}{n} . \end{array}

Proof

Follows from Bernstein’s inequality, Proposition 2, Lemma 4 (below), a union bound, and the triangle inequality.

Example 4

If T (F) = ∫xdF(x) is the mean of F (and therefore a linear functional with a(x) = x), and the data domain is X = [−R/2, R/2], then Γ_n = R. Therefore, the bound in Theorem 5 reduces to $O (\sqrt{\frac{σ^{2}}{n}} + \frac{R}{α n})$ where σ² is the variance of F.
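The two terms in Example 4 can be seen in a small simulation. The sketch below uses the standard Laplace mechanism for a bounded mean, which is a stand-in chosen for illustration: the definition of the estimator in (3) lies outside this section, and its constants may differ. The function name `private_mean`, the uniform test data, and all parameter values are assumptions.

```python
import math
import random

def private_mean(data, R, alpha, rng):
    """Laplace-mechanism estimate of the mean for data in [-R/2, R/2].

    Replacing one of n entries moves the empirical mean by at most R/n,
    so Laplace noise with scale R/(alpha*n) gives alpha-differential privacy.
    """
    n = len(data)
    clipped = [min(max(x, -R / 2.0), R / 2.0) for x in data]
    empirical_mean = sum(clipped) / n
    # Difference of two independent Exp(1) draws is a standard Laplace draw.
    z = rng.expovariate(1.0) - rng.expovariate(1.0)
    return empirical_mean + (R / (alpha * n)) * z

rng = random.Random(0)
R, alpha, n = 2.0, 0.5, 10_000
data = [rng.uniform(-R / 2.0, R / 2.0) for _ in range(n)]
estimate = private_mean(data, R, alpha, rng)
sampling_term = math.sqrt((R ** 2 / 12.0) / n)   # sqrt(sigma^2 / n) for uniform data
privacy_term = R / (alpha * n)                   # R / (alpha * n)
print(estimate, sampling_term, privacy_term)
```

For these values the privacy term is an order of magnitude smaller than the sampling term, which matches the observation that the cost of privacy decays as 1/n while the statistical error decays as 1/√n.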

Lemma 4

If T_a is a linear functional, then

{SS}_{0} (T_{a}, F_{n}) \leq \frac{2 GES (T_{a}, F)}{n} .

Proof

Observe that ${SS}_{0} (T_{a}, F_{n}) = sup ∣ T_{a} (G_{n}) - T_{a} (G_{n}^{'}) ∣ = {sup}_{x, x^{'} \in X} ∣ a (x) - a (x^{'}) ∣ / n$ , where the first supremum is over empirical distributions G_n and $G_{n}^{'}$ for data sets differing in one entry. By the triangle inequality, this is at most 2 sup_{x ∈ X} |a(x) − T_a(F)|/n = 2GES(T_a, F)/n.
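As a concrete check of Lemma 4, the sketch below computes the effect of replacing one data point on a linear functional and compares it with 2·GES/n. It is only an illustration under stated assumptions: the toy data set is made up, and the gross error sensitivity is evaluated at the empirical distribution of that data set, standing in for F.

```python
def linear_functional(a, data):
    """T_a(F_n): the average of a(x) over the data set."""
    return sum(a(x) for x in data) / len(data)

def one_entry_change(a, data, index, new_value):
    """|T_a(G_n) - T_a(G_n')| when data[index] is replaced by new_value."""
    modified = list(data)
    modified[index] = new_value
    return abs(linear_functional(a, data) - linear_functional(a, modified))

a = lambda x: x                      # the mean, as in Example 4
R = 2.0
data = [-1.0, -0.5, 0.0, 0.25, 1.0]  # points in [-R/2, R/2]
n = len(data)
worst_change = one_entry_change(a, data, 0, R / 2.0)  # move -R/2 to +R/2
# For a(x) = x, sup_x |a(x) - T_a| over the interval is attained at an endpoint.
ges_at_data = max(abs(a(x) - linear_functional(a, data)) for x in (-R / 2.0, R / 2.0))
print(worst_change, 2.0 * ges_at_data / n)
```

Moving a point from one end of the domain to the other changes the mean by R/n, which stays below the bound 2·GES/n.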

C. Proof of Lemma 1

Proof

Recall that the DKW inequality (Dvoretzky et al., 1956; Massart, 1990) implies Pr_{F_n~F} [d_GC(F_n, F) ≤ r_n] ≥ 1 − η for $r_{n} : = \sqrt{\frac{ln (2 / η)}{2 n}}$ . Since $2 r_{n} = \sqrt{2 ln (2 / η) / n}$ , the triangle inequality and Condition 2 imply that, with probability ≥ 1 − η,

{GES}_{1 / n} (T, G) \leq Γ_{n}

(8)

for all CDFs G with d_GC(F_n, G) ≤ r_n. Henceforth assume the bound in (8) holds.

Now pick any D₁ ∈ ℝⁿ. It suffices to show that e^{−β d_H(D, D₁)} · LS(T, D₁) ≤ max{2Γ_n/n, R exp(−β(n · r_n − 1))} for all such D₁.

Suppose for now that (d_H(D, D₁) + 1)/n ≤ r_n. Fix D₂ ∈ ℝⁿ such that d_H(D₁, D₂) = 1. Let j ∈ {1, 2, …, n} be the index at which D₁ and D₂ differ, and D₃ ∈ ℝⁿ⁻¹ be the database obtained from D₁ by removing the j-th entry of D₁. Finally, for i ∈ {1, 2, 3}, let G_i be the empirical CDF w.r.t. D_i. By the triangle inequality, d_GC(F_n, G₃) ≤ d_GC(F_n, G₁) + d_GC(G₁, G₃) ≤ (d_H(D, D₁)+1)/n ≤ r_n. Therefore the bound in (8) implies GES_{1/n}(T, G₃) ≤ Γ_n. Let x₁ be the j-th entry of D₁, and x₂ be the j-th entry of D₂. Then, by the definitions of IF_{1/n} and GES_{1/n},

\begin{array}{l} ∣ T (G_{1}) - T (G_{2}) ∣ = ∣ T (G_{1}) - T (G_{3}) + T (G_{3}) - T (G_{2}) ∣ \\ = \frac{∣ {IF}_{1 / n} (x_{1}, T, G_{3}) - {IF}_{1 / n} (x_{2}, T, G_{3}) ∣}{n} \\ \leq \frac{2 {GES}_{1 / n} (T, G_{3})}{n} \leq \frac{2 Γ_{n}}{n} . \end{array}

Because this holds for all choices of D₂, it follows that LS(T, D₁) ≤ 2Γ_n/n, and therefore e^{−β d_H(D, D₁)} · LS(T, D₁) ≤ 2Γ_n/n.

Now suppose instead that (d_H(D, D₁) + 1)/n > r_n. By Condition 1, LS(T, D₁) ≤ R. Therefore, we have e^{−β d_H(D, D₁)} · LS(T, D₁) ≤ R · e^{−β(n·r_n−1)}.
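To see how the two cases of the proof trade off, the sketch below evaluates the DKW radius r_n and the resulting bound max{2Γ_n/n, R exp(−β(n·r_n − 1))} for a few sample sizes. The function names and the values of η, Γ_n, R and β are assumptions chosen for illustration; in particular Γ_n is held fixed, although in general it may vary with n.

```python
import math

def dkw_radius(n, eta):
    """r_n = sqrt(ln(2/eta) / (2n)), the DKW deviation at confidence 1 - eta."""
    return math.sqrt(math.log(2.0 / eta) / (2.0 * n))

def smooth_sensitivity_bound(gamma_n, R, beta, n, eta):
    """max{2*Gamma_n/n, R*exp(-beta*(n*r_n - 1))}, the bound shown in the proof."""
    r_n = dkw_radius(n, eta)
    near_term = 2.0 * gamma_n / n                       # data sets close to D
    far_term = R * math.exp(-beta * (n * r_n - 1.0))    # data sets far from D
    return max(near_term, far_term)

eta, gamma_n, R, beta = 0.05, 1.0, 10.0, 0.1
for n in (100, 1_000, 10_000, 100_000):
    print(n, dkw_radius(n, eta), smooth_sensitivity_bound(gamma_n, R, beta, n, eta))
```

Since n·r_n grows like √n, the exponential term eventually falls below 2Γ_n/n, so for large n the bound is governed by the gross error sensitivity rather than the range R.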

D. Proof of Theorem 4

The proof of Theorem 4 is based on the following lemmas, which characterize the prior density μ and the exponential mechanism density of A_{ψ,μ}(F_n) around T_ψ(F) and T_ψ(F_n).

Lemma 5

Let μ be the uniform density on an interval I ⊂ ℝ of length R. If θ ∈ I, then μ([θ − ε, θ+ ε]) ≥ ε/R for any ε > 0.

Proof

If θ ∈ I, then the length of I ∩ [θ − ε, θ+ ε] is at least ε, and hence has mass at least ε/R under μ.

Lemma 6

Let μ be the standard Cauchy density $μ (θ) = \frac{1}{π} {(1 + θ^{2})}^{- 1}$ . For any θ ∈ ℝ, $μ ([θ - ε, θ + ε]) \geq \frac{1}{π} \cdot \frac{2 ε}{2 (θ^{2} + ε^{2}) + 1}$ for any ε > 0.

Proof

By Taylor’s theorem and the fact (a + b)² ≤ 2(a² + b²),

\begin{array}{l} μ ([θ - ε, θ + ε]) = \frac{1}{π} ({tan}^{- 1} (θ + ε) - {tan}^{- 1} (θ - ε)) \\ \geq inf_{ξ \in [θ - ε, θ + ε]} \frac{1}{π} \cdot \frac{2 ε}{ξ^{2} + 1} \\ \geq \frac{1}{π} \cdot \frac{2 ε}{2 (θ^{2} + ε^{2}) + 1} . \end{array}
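The two prior-mass bounds can be verified numerically. The sketch below compares the exact mass of [θ − ε, θ + ε] under a uniform prior and under the standard Cauchy prior with the lower bounds of Lemma 5 and Lemma 6. The function names and test points are assumptions chosen for illustration, and the uniform case is only checked for ε ≤ R.

```python
import math

def uniform_mass(theta, eps, lo, hi):
    """Mass that the uniform density on [lo, hi] assigns to [theta-eps, theta+eps]."""
    overlap = max(0.0, min(hi, theta + eps) - max(lo, theta - eps))
    return overlap / (hi - lo)

def cauchy_mass(theta, eps):
    """Mass that the standard Cauchy density assigns to [theta-eps, theta+eps]."""
    return (math.atan(theta + eps) - math.atan(theta - eps)) / math.pi

def cauchy_lower_bound(theta, eps):
    """Lower bound from Lemma 6."""
    return (1.0 / math.pi) * 2.0 * eps / (2.0 * (theta ** 2 + eps ** 2) + 1.0)

R = 4.0
lemma5_ok = all(
    uniform_mass(theta, eps, -R / 2, R / 2) >= eps / R - 1e-12
    for theta in (-2.0, -1.0, 0.0, 1.5, 2.0)
    for eps in (0.01, 0.5, 2.0, 4.0)
)
lemma6_ok = all(
    cauchy_mass(theta, eps) >= cauchy_lower_bound(theta, eps)
    for theta in (-50.0, -1.0, 0.0, 3.0, 100.0)
    for eps in (0.01, 0.5, 2.0, 10.0)
)
print(lemma5_ok, lemma6_ok)
```

The uniform bound is tight when θ sits at an endpoint of the interval, while the Cauchy bound degrades only polynomially in |θ|, which is what makes it usable when the range is unknown.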

Lemma 7

Assume Condition 3 and Condition 4 hold. For 0 < ε ≤ min{r₂/2,|Ψ′ (F, θ_*)|/(6Λ₂)},

{Pr}_{A_{ψ, μ}} [∣ A_{ψ, μ} (F_{n}) - θ_{n} ∣ > ε ∣ E_{good}] \leq \frac{1}{c_{μ, ε}} exp (- \frac{n α ∣ Ψ^{'} (F, θ_{*}) ∣ ε}{8 K})

where θ_* = T_ψ (F), θ_n = T_ψ(F_n), c_{μ,ε} = μ([θ_n − ε/6, θ_n + ε/6]), and E_good is the event in which

\begin{matrix} d_{GC} (F_{n}, F) \leq min {r_{1}, ∣ Ψ^{'} (F, θ_{*}) ∣ / (6 Λ_{1})} and \\ ∣ θ_{n} - θ_{*} ∣ \leq min {r_{2} / 2, ∣ Ψ^{'} (F, θ_{*}) ∣ / (6 Λ_{2})} . \end{matrix}

Proof

Define

s_{bad} : = min {∣ Ψ (F_{n}, θ_{n} - ε) ∣, ∣ Ψ (F_{n}, θ_{n} + ε) ∣} .

By the monotonicity of Ψ due to Condition 3, we have |Ψ (F_n, θ)| ≥ s_bad for all θ ∉ [θ_n − ε, θ_n + ε]. Also, define

s_{good} : = sup {∣ Ψ (F_{n}, θ) ∣ : θ \in [θ_{n} - ε / 6, θ_{n} + ε / 6]} .

Then,

\begin{array}{l} {Pr}_{A_{ψ, μ}} [∣ A_{ψ, μ} (F_{n}) - θ_{n} ∣ > ε ∣ E_{good}] = \frac{\int_{θ \notin [θ_{n} - ε, θ_{n} + ε]} μ (θ) \cdot exp (- \frac{n α}{2 K} ∣ Ψ (F_{n}, θ) ∣) d θ}{\int_{- \infty}^{\infty} μ (θ) \cdot exp (- \frac{n α}{2 K} ∣ Ψ (F_{n}, θ) ∣) d θ} \\ \leq \frac{\int_{θ \notin [θ_{n} - ε, θ_{n} + ε]} μ (θ) \cdot exp (- \frac{n α}{2 K} s_{bad}) d θ}{\int_{θ \in [θ_{n} - ε / 6, θ_{n} + ε / 6]} μ (θ) \cdot exp (- \frac{n α}{2 K} s_{good}) d θ} \\ \leq \frac{1}{c_{μ, ε}} \cdot exp (- \frac{n α}{2 K} (s_{bad} - s_{good})) . \end{array}

Therefore, it remains to show that s_bad − s_good ≥ 0.25|Ψ′ (F, θ_*)|ε assuming the event E_good holds.

Pick any θ ∈ [θ_n − ε, θ_n + ε]. By Taylor’s theorem and the fact Ψ(F_n, θ_n) = 0, there exists some θ̃ ∈ [θ_n − ε, θ_n + ε] such that

\begin{array}{l} Ψ (F_{n}, θ) = Ψ^{'} (F_{n}, \tilde{θ}) \cdot (θ - θ_{n}) \\ = Ψ^{'} (F, θ_{*}) \cdot (θ - θ_{n}) + (Ψ^{'} (F, \tilde{θ}) - Ψ^{'} (F, θ_{*})) \cdot (θ - θ_{n}) + (Ψ^{'} (F_{n}, \tilde{θ}) - Ψ^{'} (F, \tilde{θ})) \cdot (θ - θ_{n}) . \end{array}

(9)

Since ε ≤ min{r₂/2, |Ψ′(F, θ_*)|/(6Λ₂)}, the triangle inequality and the event E_good imply

∣ \tilde{θ} - θ_{*} ∣ \leq ∣ \tilde{θ} - θ_{n} ∣ + ∣ θ_{n} - θ_{*} ∣ \leq min {r_{2}, ∣ Ψ^{'} (F, θ_{*}) ∣ / (3 Λ_{2})}

and therefore

\begin{array}{l} ∣ Ψ^{'} (F, \tilde{θ}) - Ψ^{'} (F, θ_{*}) ∣ \leq Λ_{2} \cdot ∣ \tilde{θ} - θ_{*} ∣ \\ \leq ∣ Ψ^{'} (F, θ_{*}) ∣ / 3 \end{array}

(10)

by Condition 4. Because the event E_good also implies d_GC(F_n, F) ≤ min{r₁, |Ψ′(F, θ_*)|/(6Λ₁)}, we have

\begin{array}{l} ∣ Ψ^{'} (F_{n}, \tilde{θ}) - Ψ^{'} (F, \tilde{θ}) ∣ \leq Λ_{1} \cdot d_{GC} (F_{n}, F) \\ \leq ∣ Ψ^{'} (F, θ_{*}) ∣ / 6 \end{array}

(11)

also by Condition 4. Therefore, applying the triangle inequality to equation (9), together with the bounds (10) and (11), gives

\begin{array}{l} ∣ Ψ (F_{n}, θ) ∣ \geq ∣ Ψ^{'} (F, θ_{*}) ∣ ∣ θ - θ_{n} ∣ - ∣ Ψ^{'} (F, \tilde{θ}) - Ψ^{'} (F, θ_{*}) ∣ ∣ θ - θ_{n} ∣ - ∣ Ψ^{'} (F_{n}, \tilde{θ}) - Ψ^{'} (F, \tilde{θ}) ∣ ∣ θ - θ_{n} ∣ \\ \geq ∣ Ψ^{'} (F, θ_{*}) ∣ ∣ θ - θ_{n} ∣ - ∣ Ψ^{'} (F, θ_{*}) ∣ ∣ θ - θ_{n} ∣ / 2 \\ = 0.5 ∣ Ψ^{'} (F, θ_{*}) ∣ ∣ θ - θ_{n} ∣ \end{array}

(12)

and, similarly,

∣ Ψ (F_{n}, θ) ∣ \leq 1.5 ∣ Ψ^{'} (F, θ_{*}) ∣ ∣ θ - θ_{n} ∣ .

(13)

Note that (12) implies the lower-bound

s_{bad} \geq 0.5 ∣ Ψ^{'} (F, θ_{*}) ∣ ε .

It remains to derive an upper-bound on s_good. Define θ₀ := inf{θ ∈ ℝ: Ψ(F_n, θ) ≥ − |Ψ′(F, θ_*)|ε/4} and θ₁ := sup{θ ∈ ℝ: Ψ(F_n, θ) ≤ |Ψ′(F, θ_*)|ε/4}. By monotonicity of Ψ from Condition 3, we have that if

∣ Ψ (F_{n}, θ) ∣ \leq 0.25 ∣ Ψ^{'} (F, θ_{*}) ∣ ε ,

then θ ∈ [θ₀, θ₁], and vice versa. Now take any θ ∈ [θ_n − ε/6, θ_n + ε/6]. Note that by (12),

Ψ (F_{n}, θ) \geq - 0.5 ∣ Ψ^{'} (F, θ_{*}) ∣ ε / 6 > - ∣ Ψ^{'} (F, θ_{*}) ∣ ε / 4

so θ ≥ θ₀, and by (13),

Ψ (F_{n}, θ) \leq 1.5 ∣ Ψ^{'} (F, θ_{*}) ∣ ε / 6 = ∣ Ψ^{'} (F, θ_{*}) ∣ ε / 4

so θ ≤ θ₁. Therefore [θ_n − ε/6, θ_n + ε/6] ⊆ [θ₀, θ₁], and hence s_good ≤ 0.25|Ψ′(F, θ_*)| ε. The claim is proved by combining the bounds on s_bad and s_good.

We now prove Theorem 4.

Proof of Theorem 4

Let E_good be the event in which

d_{GC} (F_{n}, F) \leq ε_{1} and ∣ T_{ψ} (F_{n}) - T_{ψ} (F) ∣ \leq ε_{2} .

By the DKW inequality, the definition of N_{ε₂,η}, the bound on the sample size n, and a union bound, we have

{Pr}_{F_{n} ~ F} [E_{good}] \geq 1 - 2 η .

By Lemma 7, conditioned on the event E_good, we have

{Pr}_{A_{ψ, μ}} [∣ A_{ψ, μ} (F_{n}) - T_{ψ} (F_{n}) ∣ \leq ε \mid E_{good}] \geq 1 - η

where we have used either Lemma 5 or Lemma 6 (with the fact |T_ψ(F_n) −T_ψ(F)| ≤ ε₂ in the event E_good) and the bound on the sample size n. A union bound and the triangle inequality complete the proof.
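The density analyzed in Lemma 7, proportional to μ(θ)·exp(−(nα/(2K))·|Ψ(F_n, θ)|), can be illustrated with a discretized sampler. The sketch below is not the authors' implementation: it replaces the continuous density by a fine grid, assumes Ψ(F_n, θ) is the empirical average of a score ψ bounded by K, and uses a clipped (Huber-type) location score as a hypothetical choice of ψ. All function names and parameter values are assumptions.

```python
import math
import random

def psi(x, theta, K):
    """Huber-type score, clipped to [-K, K] and nondecreasing in theta."""
    return max(-K, min(K, theta - x))

def Psi(data, theta, K):
    """Empirical average of psi over the data (assumed form of Psi(F_n, theta))."""
    return sum(psi(x, theta, K) for x in data) / len(data)

def exp_mech_weights(data, grid, alpha, K, prior):
    """Unnormalized weights mu(theta) * exp(-n*alpha/(2K) * |Psi(F_n, theta)|)."""
    n = len(data)
    return [prior(t) * math.exp(-n * alpha / (2.0 * K) * abs(Psi(data, t, K)))
            for t in grid]

def sample_theta(data, grid, alpha, K, prior, rng):
    """Draw one grid point with probability proportional to its weight."""
    weights = exp_mech_weights(data, grid, alpha, K, prior)
    return rng.choices(grid, weights=weights, k=1)[0]

rng = random.Random(1)
R, K, alpha = 5.0, 1.0, 1.0
data = [rng.gauss(1.0, 0.5) for _ in range(500)]
grid = [-R + 2.0 * R * i / 1000 for i in range(1001)]      # grid on [-R, R]
uniform_prior = lambda t: 1.0 / (2.0 * R)                   # mu uniform on [-R, R]
weights = exp_mech_weights(data, grid, alpha, K, uniform_prior)
mode = grid[weights.index(max(weights))]
draw = sample_theta(data, grid, alpha, K, uniform_prior, rng)
print(mode, draw)
```

The weights peak where |Ψ(F_n, θ)| is smallest, that is, near the non-private M-estimate T_ψ(F_n), and they decay exponentially in n·α·|θ − θ_n| away from it, which is the concentration that Lemma 7 quantifies.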

E. Alternative to Condition 2

Consider the following alternative to Condition 2.

Condition 5 (Bounded gross error sensitivity with exponent p)

The sequence (Γ_{p,n}) given by

Γ_{p, n} : = sup {{GES}_{1 / n} (T, G) : G \in B_{GC} (F, \sqrt{\frac{ln (2 / η)}{2 n}} + n^{- p})}

is bounded for some p ∈ [0, 1/2].

Condition 2 (roughly) corresponds to exponent p = 1/2, which is the weakest condition among all p ∈ [0, 1/2].

By essentially the same proof as that of Lemma 1, it follows that under Condition 1 and Condition 5, we have with probability ≥ 1 − η,

{SS}_{β} (T, F_{n}) \leq max {\frac{2 Γ_{p, n}}{n}, R exp (- β (n^{1 - p} - 1))} .

Using this in place of Lemma 1, the bound in Theorem 3 becomes

∣ A_{T} (F_{n}) - T (F) ∣ \leq ∣ T (F_{n}) - T (F) ∣ + \frac{2 ln (1 / η)}{α} max {\frac{2 Γ_{p, n}}{n}, R \cdot exp (- \frac{α (n^{1 - p} - 1)}{2 ln (1 / δ)})} .
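The effect of the exponent p on the noise term of this bound can be explored numerically. The sketch below evaluates the second term of the modified bound for several values of p. The function name and parameter values are assumptions for illustration, and Γ_{p,n} is held fixed across p even though, under Condition 5, it generally grows as p decreases (the supremum is taken over a larger neighborhood).

```python
import math

def noise_term(gamma_pn, R, alpha, delta, n, p, eta):
    """Second term of the modified Theorem 3 bound:
    (2 ln(1/eta) / alpha)
      * max{2*Gamma/n, R*exp(-alpha*(n^(1-p) - 1) / (2 ln(1/delta)))}."""
    near = 2.0 * gamma_pn / n
    far = R * math.exp(-alpha * (n ** (1.0 - p) - 1.0) / (2.0 * math.log(1.0 / delta)))
    return (2.0 * math.log(1.0 / eta) / alpha) * max(near, far)

R, alpha, delta, eta, gamma = 10.0, 0.5, 1e-6, 0.05, 1.0
n = 10_000
for p in (0.0, 0.25, 0.5):
    print(p, noise_term(gamma, R, alpha, delta, n, p, eta))
```

With these values the range-dependent exponential term dominates at p = 1/2 but is negligible at p = 0, which illustrates the trade-off: a stronger condition on the gross error sensitivity buys a faster decay of the dependence on R.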

Footnotes

See Appendix A for omitted lemmas.

Appendix E shows how this discrepancy can be reduced with a stronger condition.

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012.

Contributor Information

Kamalika Chaudhuri, Email: kamalika@cs.ucsd.edu, University of California, San Diego, La Jolla, CA 92093.

Daniel Hsu, Email: dahsu@microsoft.com, Microsoft Research, New England, Cambridge, MA 02142.

References

Barak B, Chaudhuri K, Dwork C, Kale S, McSherry F, Talwar K. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. PODS. 2007.
Blum A, Dwork C, McSherry F, Nissim K. Practical privacy: the SuLQ framework. PODS. 2005.
Chaudhuri K, Hsu D. Sample complexity bounds for differentially private learning. COLT. 2011.
Chaudhuri K, Monteleoni C, Sarwate A. Differentially private empirical risk minimization. Journal of Machine Learning Research. 2011.
Dvoretzky A, Kiefer J, Wolfowitz J. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Annals of Mathematical Statistics. 1956;27(3):642–669.
Dwork C, Lei J. Differential privacy and robust statistics. STOC. 2009.
Dwork C, Kenthapadi K, McSherry F, Mironov I, Naor M. Our data, ourselves: Privacy via distributed noise generation. EUROCRYPT. 2006a.
Dwork C, McSherry F, Nissim K, Smith A. Calibrating noise to sensitivity in private data analysis. TCC. 2006b.
Friedman A, Schuster A. Data mining with differential privacy. KDD. 2010.
Ganta SR, Kasiviswanathan SP, Smith A. Composition attacks and auxiliary information in data privacy. KDD. 2008.
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA. Robust Statistics - The Approach Based on Influence Functions. Wiley; 1986.
Hardt M, Talwar K. On the geometry of differential privacy. STOC. 2010.
Huber PJ. Robust Statistics. Wiley; 1981.
Lei J. Differentially private M-estimators. NIPS. 2011.
Massart P. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Annals of Probability. 1990;18(3):1269–1283.
McSherry F, Mironov I. Differentially private recommender systems: building privacy into the net. KDD. 2009.
McSherry F, Talwar K. Mechanism design via differential privacy. FOCS. 2007.
Mohammed N, Chen R, Fung BCM, Yu PS. Differentially private data release for data mining. KDD. 2011.
Nissim K, Raskhodnikova S, Smith A. Smooth sensitivity and sampling in private data analysis. STOC. 2007.
Rubinstein BIP, Bartlett PL, Huang L, Taft N. Learning in a large function space: Privacy-preserving mechanisms for SVM learning. CoRR. 2009;abs/0911.5708.
Smith A. Privacy-preserving statistical estimation with optimal convergence rates. STOC. 2011.
Vu D, Slavkovic A. Differential privacy for clinical trial data: Preliminary evaluations. Data Mining Workshops, ICDMW; 2009.
Wasserman L. All of non-parametric statistics. Springer; 2006.
