Abstract
Existing work has studied and analyzed global differential privacy (GDP) and local differential privacy (LDP) in depth on the basis of information theory. However, the data privacy preserving community has not systematically reviewed and analyzed GDP and LDP under the information-theoretic channel model. To this end, this survey systematically reviews GDP and LDP from the perspective of the information-theoretic channel. First, we present the privacy threat model under the information-theoretic channel. Second, we describe and compare the information-theoretic channel models of GDP and LDP. Third, we summarize and analyze the definitions, privacy-utility metrics, properties, and mechanisms of GDP and LDP under their channel models. Finally, we discuss the open problems of GDP and LDP based on different types of information-theoretic channel models in light of the above review. Our main contribution is a systematic survey of the channel models, definitions, privacy-utility metrics, properties, and mechanisms of GDP and LDP from the perspective of the information-theoretic channel, together with a review of differentially private synthetic data generation with generative adversarial networks and with federated learning, respectively. Our work helps in systematically understanding the privacy threat model, definitions, privacy-utility metrics, properties, and mechanisms of GDP and LDP from the perspective of the information-theoretic channel, and it promotes further research and analysis of GDP and LDP based on different types of information-theoretic channel models.
Keywords: GDP vs. LDP, information-theoretic channel, Rényi divergence, mutual information, expected distortion
1. Introduction
It is assumed that an attacker has background knowledge of the names of n patients in a medical dataset with a certain disease. The attacker can query the sum of the disease status of all patients except the i-th patient as well as the sum of the disease status of all n patients, and can then infer whether the i-th patient has the disease by comparing the two statistical query results. To mitigate the problem of individual privacy leakage caused by this statistical inference attack, Dwork et al. [1] proposed differential privacy (DP) to protect individual privacy independently of the presence or absence of any individual. Since DP requires that the data collector be trustworthy in a centralized setting, it is called centralized DP. Moreover, because DP considers the global sensitivity over adjacent datasets, it is also known as global differential privacy (GDP). However, the data collector is often untrusted in real-world applications. Therefore, Kasiviswanathan et al. [2] proposed local differential privacy (LDP), which allows an untrusted third party to perform statistical analysis while preserving users' privacy by randomly perturbing their local data. Both GDP and LDP have privacy-utility monotonicity and can achieve a privacy-utility tradeoff [3]. GDP and LDP have become popular methods for data privacy preserving in the centralized and local settings, respectively. However, GDP and LDP have different advantages and disadvantages. Table 1 summarizes Dobrota's [4] comparative analysis of the advantages and disadvantages of GDP and LDP, with which we agree.
Table 1.
Advantages and disadvantages of GDP and LDP.
Privacy Type | Advantage | Disadvantage
---|---|---
GDP | Better data utility; suitable for datasets of any scale | Requires a trusted data collector
LDP | Does not require a trusted data collector | Poor data utility; not applicable to small-scale datasets
Because of the advantages of using GDP and LDP in the centralized and local settings, respectively, the data privacy community has widely studied GDP and LDP based on information theory. Existing work focuses on the following information-theoretic aspects of GDP and LDP: the privacy threat model, the channel models and definitions of GDP and LDP, the privacy-utility metrics of GDP and LDP, the properties of GDP and LDP, and the mechanisms satisfying GDP and LDP. Unless otherwise stated, the information-theoretic channel model refers to the discrete single-symbol information-theoretic channel in this survey. However, no existing review systematically surveys this body of work on GDP and LDP from the perspective of the information-theoretic channel.
Therefore, this paper systematically surveys GDP and LDP under the information-theoretic channel model in terms of the privacy threat model they resist, their channel models, definitions, privacy-utility metrics, properties, and achieving mechanisms. Our main contributions are as follows.
(1) We summarized the privacy threat model under the information-theoretic channel, and we provided a systematic survey of the channel models, definitions, privacy-utility metrics, properties, and mechanisms of GDP and LDP from the perspective of the information-theoretic channel.
(2) We presented a comparative analysis between GDP and LDP from the perspective of the information-theoretic channel. Then, we summarized the common channel models, definitions, privacy-utility metrics, properties, and achieving mechanisms of GDP and LDP in the existing work.
(3) We surveyed applications of GDP and LDP in synthetic data generation. Specifically, we first presented the membership inference attack and the model extraction attack against the generative adversarial network (GAN). Then, we reviewed differentially private synthetic data generation with GAN and with federated learning, respectively.
(4) By analyzing the advantages and disadvantages of the existing work for different application scenarios and data types, we also discussed open problems for future research on GDP and LDP based on different types of information-theoretic channel models.
This paper is organized as follows. Section 2 introduces the preliminaries. Section 3 summarizes the privacy threat model of the centralized and local data settings under the information-theoretic channel. Section 4 describes the channel models of GDP and LDP and uniformly states and analyzes the definitions of GDP and LDP under their channel models. Section 5 summarizes and compares the information-theoretic privacy-utility metrics of GDP and LDP. In Section 6, we present and analyze the properties of GDP and LDP from the perspective of the information-theoretic channel. Section 7 summarizes and analyzes the mechanisms of GDP and LDP from the perspective of the information-theoretic channel. Section 8 discusses the open problems of GDP and LDP from the perspective of different types of information-theoretic channels for different application scenarios and data types. Section 9 concludes this paper.
2. Preliminaries
In this section, we introduce the preliminaries of GDP [1], LDP [5], and the information-theoretic channel model and metrics [5,6,7,8,9,10,11]. The commonly used mathematical symbols are summarized in Table 2.
Table 2.
Common mathematical symbols.
Symbol | Description
---|---
x | Dataset
$\mathcal{M}$ | Randomized mechanism
$\varepsilon$ | Privacy budget
$\delta$ | Probability of not satisfying differential privacy
X | Input random variable of the information-theoretic channel
Y | Output random variable of the information-theoretic channel
$P(Y \mid X)$ | Channel transition probability matrix
P | Probability distribution on source X
Q | Another probability distribution on source X
$D_{\alpha}(P \Vert Q)$ | Rényi divergence
$H_{\alpha}(X)$ | Rényi entropy
$H(X)$ | Shannon entropy
$H_{\infty}(X)$ | Min-entropy
$H_{\alpha}(X \mid Y)$ | Conditional Rényi entropy
$H(X \mid Y)$ | Conditional Shannon entropy
$H_{\infty}(X \mid Y)$ | Conditional min-entropy
$I(X; Y)$ | Mutual information
$I_{\infty}(X; Y)$ | Max-information
$I_{\infty}^{\beta}(X; Y)$ | β-approximate max-information
$D_{KL}(P \Vert Q)$ | Kullback–Leibler divergence
$D_{f}(P \Vert Q)$ | f-divergence
$TV(P, Q)$ | Total variation distance
$D_{\infty}(P \Vert Q)$ | Max-divergence
$D_{\infty}^{\delta}(P \Vert Q)$ | δ-approximate max-divergence
$E[d(X, Y)]$ | Expected distortion
$d(x, y)$ | Single-symbol distortion
$P_e$ | Error probability
$\mathcal{H}$ | A class of functions
D | A divergence
$D_{\mathcal{H}}(P \Vert Q)$ | $\mathcal{H}$-restricted D-divergence
$D_{f, \mathcal{H}}(P \Vert Q)$ | $\mathcal{H}$-restricted f-divergence
$D_{\alpha, \mathcal{H}}(P \Vert Q)$ | $\mathcal{H}$-restricted Rényi divergence
2.1. GDP and LDP
A dataset x is a collection of records drawn from a universal set X, and each $x_i$ denotes the i-th item or a subset of the dataset x. When two datasets differ in only one item, they are adjacent datasets.
Definition 1 (GDP).
A randomized mechanism $\mathcal{M}$ with domain X is $(\varepsilon, \delta)$-DP if for all $S \subseteq \mathrm{Range}(\mathcal{M})$ and for any two adjacent datasets $x$ and $x'$, it holds that
$$\Pr[\mathcal{M}(x) \in S] \le e^{\varepsilon}\, \Pr[\mathcal{M}(x') \in S] + \delta, \qquad (1)$$
where the probability space is over the coin flips of the mechanism $\mathcal{M}$. If $\delta = 0$, then $\mathcal{M}$ is ε-DP.
The coin flips of the mechanism $\mathcal{M}$ mean that the randomness of a DP mechanism comes from the mechanism itself with respect to each record of each individual; that is, the probability distribution of the response to any query is essentially the same regardless of whether any individual is present in or absent from the dataset. If $\mathcal{M}$ is $(\varepsilon, \delta)$-DP, then, with probability at least $1 - \delta$, $\mathcal{M}$ behaves as an ε-DP mechanism for all pairs of adjacent datasets x and x′. For the definition of LDP, the coin flips of the mechanism have the same meaning.
Definition 2 (LDP).
A randomized mechanism $\mathcal{M}$ satisfies ε-LDP if and only if, for any pair of input values x and x′ in the domain of X and for any possible output y, it holds that
$$\Pr[\mathcal{M}(x) = y] \le e^{\varepsilon}\, \Pr[\mathcal{M}(x') = y], \qquad (2)$$
where the probability space is over the coin flips of the mechanism $\mathcal{M}$.
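To make Definition 2 concrete, the following minimal Python sketch (our own illustration, not code from the surveyed papers) checks the ε-LDP guarantee of a binary randomized response channel by computing the largest log-ratio of transition probabilities over all input pairs and outputs.

```python
import numpy as np

def randomized_response_channel(epsilon: float) -> np.ndarray:
    """Binary randomized response: report the true bit with probability
    e^eps / (1 + e^eps), otherwise flip it."""
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return np.array([[p, 1 - p],
                     [1 - p, p]])  # rows: input x, columns: output y

def empirical_ldp_budget(channel: np.ndarray) -> float:
    """Smallest eps such that P(y|x) <= e^eps * P(y|x') for all x, x', y
    (Definition 2)."""
    log_p = np.log(channel)
    # max over outputs y and input pairs (x, x') of log P(y|x) - log P(y|x')
    return float(np.max(log_p[:, None, :] - log_p[None, :, :]))

if __name__ == "__main__":
    eps = 1.0
    channel = randomized_response_channel(eps)
    print("claimed epsilon:", eps)
    print("measured epsilon:", empirical_ldp_budget(channel))  # equals eps
```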
2.2. Information-Theoretic Channel and Metrics
The mathematical model of an information-theoretic channel can be denoted by $\{X, P(Y \mid X), Y\}$, where
(1) X is an input random variable, and its value set is $\mathcal{X}$.
(2) Y is an output random variable, and its value set is $\mathcal{Y}$.
(3) $P(Y \mid X)$ is the channel transition probability matrix, and the sum of the probabilities in each row satisfies $\sum_{y} P(y \mid x) = 1$.
In the information-theoretic channel model, the Rényi divergence of a probability distribution P on source X from another distribution Q is $D_{\alpha}(P \Vert Q) = \frac{1}{\alpha - 1}\log \sum_{x} P(x)^{\alpha} Q(x)^{1-\alpha}$, where $\alpha > 0$ and $\alpha \neq 1$. When Q is the uniform distribution with $Q(x) = 1/|\mathcal{X}|$, the Rényi entropy can be written as $H_{\alpha}(X) = \log|\mathcal{X}| - D_{\alpha}(P \Vert Q)$ in terms of the Rényi divergence of P from Q. When $\alpha \to 1$, the Rényi entropy tends to the Shannon entropy $H(X) = -\sum_{x} P(x)\log P(x)$ of source X. When $\alpha \to \infty$, the Rényi entropy tends to the min-entropy $H_{\infty}(X) = -\log \max_{x} P(x)$. The conditional Rényi entropy of X given Y is $H_{\alpha}(X \mid Y)$. When $\alpha \to 1$, the conditional Rényi entropy is the conditional Shannon entropy $H(X \mid Y) = -\sum_{x, y} P(x, y)\log P(x \mid y)$. When $\alpha \to \infty$, the conditional Rényi entropy is the conditional min-entropy $H_{\infty}(X \mid Y) = -\log \sum_{y} P(y)\max_{x} P(x \mid y)$. The mutual information $I(X; Y) = H(X) - H(X \mid Y)$ is the average amount of information about X contained in the random variable Y. Furthermore, the max-information is $I_{\infty}(X; Y) = \log \sup_{\mathcal{O}} \frac{\Pr[(X, Y) \in \mathcal{O}]}{\Pr[X \otimes Y \in \mathcal{O}]}$, and the β-approximate max-information is $I_{\infty}^{\beta}(X; Y) = \log \sup_{\mathcal{O}:\, \Pr[(X, Y) \in \mathcal{O}] > \beta} \frac{\Pr[(X, Y) \in \mathcal{O}] - \beta}{\Pr[X \otimes Y \in \mathcal{O}]}$.
Moreover, when $\alpha \to 1$, the Rényi divergence is the Kullback–Leibler (KL) divergence $D_{KL}(P \Vert Q) = \sum_{x} P(x)\log\frac{P(x)}{Q(x)}$. The KL-divergence is an instance of the family of f-divergences $D_{f}(P \Vert Q) = \sum_{x} Q(x)\, f\!\left(\frac{P(x)}{Q(x)}\right)$ with non-negative convex functions f satisfying $f(1) = 0$; for the KL-divergence, $f(t) = t\log t$. The total variation distance is also an instance of the family of f-divergences with $f(t) = \frac{1}{2}|t - 1|$, and the total variation distance between distributions P and Q is $TV(P, Q) = \frac{1}{2}\sum_{x}|P(x) - Q(x)|$. When $\alpha \to \infty$, the Rényi divergence is the max-divergence $D_{\infty}(P \Vert Q) = \log \max_{x} \frac{P(x)}{Q(x)}$, and the δ-approximate max-divergence is $D_{\infty}^{\delta}(P \Vert Q) = \log \max_{S:\, \Pr_{P}[S] \ge \delta} \frac{\Pr_{P}[S] - \delta}{\Pr_{Q}[S]}$.
The expected distortion between the input random variable X and the output random variable Y is
$$E[d(X, Y)] = \sum_{x, y} P(x, y)\, d(x, y), \qquad (3)$$
where the distance measure $d(x, y)$ is the single-symbol distortion. The average error probability is
$$P_e = \Pr[X \ne Y] = \sum_{x \ne y} P(x, y). \qquad (4)$$
Thus, the average error probability is the expected Hamming distortion when $d(x, y)$ is the Hamming distortion $d(x, y) = \mathbb{1}\{x \ne y\}$ in Equation (3).
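The following small Python sketch (our own illustration under the notation above) computes the Rényi divergence and its KL and max-divergence limits for two discrete distributions, as well as the expected Hamming distortion of a channel.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(P || Q) for discrete distributions, alpha > 0, alpha != 1."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.log(np.sum(p**alpha * q**(1 - alpha))) / (alpha - 1)

def kl_divergence(p, q):          # alpha -> 1 limit
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q))

def max_divergence(p, q):         # alpha -> infinity limit
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.log(np.max(p / q))

def expected_hamming_distortion(p_x, channel):
    """E[d(X, Y)] with d(x, y) = 1{x != y}; channel[x, y] = P(y | x)."""
    joint = np.asarray(p_x, float)[:, None] * np.asarray(channel, float)
    return joint.sum() - np.trace(joint)   # probability mass off the diagonal

if __name__ == "__main__":
    p, q = [0.7, 0.3], [0.4, 0.6]
    print(renyi_divergence(p, q, 2.0), kl_divergence(p, q), max_divergence(p, q))
    channel = np.array([[0.9, 0.1], [0.2, 0.8]])
    print(expected_hamming_distortion([0.5, 0.5], channel))  # equals the error probability
```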
3. Privacy Threat Model on Information-Theoretic Channel
To mitigate the statistical inference attack, GDP makes a strong adversary assumption in which the adversary knows all but one of the dataset records and tries to identify the remaining one [12,13]. However, the adversary is usually computationally bounded. Thus, Mironov [11] and Mir [14] assumed that the adversary has prior knowledge over the set of possible input datasets X. Furthermore, Smith [15] proposed the one-try attack, in which an adversary is allowed to ask exactly one question of the form "is X = x?". The Rényi min-entropy of X captures the probability of success of the one-try attack under the best strategy, which guesses the value x with maximum probability. The conditional Rényi min-entropy of X given Y captures the probability of guessing the value of X in a single try when the output Y is known. Therefore, the privacy leakage of the channel model is the Rényi min-entropy leakage under the one-try attack [7]. The Rényi min-entropy leakage is the max-information, and it is the ratio of the probabilities of attack success with the a priori probability and the a posteriori probability. Thus, the Rényi min-entropy leakage corresponds to the concept of Bayes risk, which can also be regarded as a measure of the effectiveness of the attack. The maximal leakage is the maximal reduction in uncertainty about X when Y is observed [16], and it is obtained by maximizing over all input distributions.
When the adversary possesses knowledge of the a priori probability distribution of the input, LDP can still lead to a risk of privacy leakage [2,17,18,19,20,21,22]. However, a better privacy-utility tradeoff can be achieved by incorporating the attacker's knowledge into LDP. Therefore, data utility can be improved by explicitly modeling the adversary's prior knowledge in LDP.
To sum up, the privacy threat on the information-theoretic channel refers to the Bayes risk on the input X when the attacker knows the output Y. Thus, GDP and LDP can be used to mitigate this privacy threat on the information-theoretic channel for numerical data and categorical data, respectively.
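As an illustration of the one-try attack and the min-entropy leakage discussed above (our own sketch, not code from the surveyed papers), the following Python snippet computes the prior and posterior success probabilities of the best single guess and the resulting leakage for a discrete channel.

```python
import numpy as np

def min_entropy_leakage(p_x, channel):
    """Min-entropy leakage = log2(posterior vulnerability / prior vulnerability).

    p_x: prior distribution over inputs; channel[x, y] = P(y | x).
    """
    p_x = np.asarray(p_x, float)
    channel = np.asarray(channel, float)
    prior_vul = p_x.max()                  # best one-try guess without observing Y
    joint = p_x[:, None] * channel         # P(x, y)
    post_vul = joint.max(axis=0).sum()     # sum_y max_x P(x, y)
    return np.log2(post_vul / prior_vul), prior_vul, post_vul

if __name__ == "__main__":
    # binary randomized response channel with epsilon = 1
    eps = 1.0
    p = np.exp(eps) / (1 + np.exp(eps))
    channel = np.array([[p, 1 - p], [1 - p, p]])
    leakage, prior_vul, post_vul = min_entropy_leakage([0.5, 0.5], channel)
    print(prior_vul, post_vul, leakage)    # here the leakage is at most eps * log2(e)
```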
4. Information-Theoretic Channel Models and Definitions of GDP and LDP
In this section, we summarize and compare information-theoretic channel models of GDP and LDP. Furthermore, we present the information-theoretic definitions of GDP and LDP under their information-theoretic channel models and compare the definitions of GDP (LDP) with other information-theoretic privacy definitions.
4.1. Information-Theoretic Channel Models of GDP and LDP
In Table 3, Alvim et al. [7] constructed an information-theoretic channel model of GDP for any query function over adjacent datasets, where the DP mapping maps the input dataset X to the random output Z of the real output Y. Similarly, we can construct an information-theoretic channel model of LDP for any pair of different single inputs x and x′, where the LDP mapping maps a categorical single input to a categorical single random output. Next, we survey and compare the information-theoretic definitions of GDP and LDP under these channel models.
Table 3.
Information-theoretic channel models of GDP and LDP.
Privacy Type | Data Type | Input | GDP and LDP Mapping | Real Output | Random Output | Adjacent Relationship
---|---|---|---|---|---|---
GDP [7] | Numerical data | Dataset X | DP mapping | Y | Z | x and x′ are adjacent datasets.
LDP | Categorical data | Data item X | LDP mapping | X | Z | x and x′ are different.
4.2. Information-Theoretic Definitions of GDP and LDP
In Table 4, we summarize the current work on definitions of GDP using different information-theoretic metrics under the information-theoretic channel model. Alvim et al. [7] intuitively gave the definition of ε-DP using the channel transition probability distribution for all outputs z, with adjacent datasets x and x′. Barthe and Olmedo [8] defined (ε, δ)-DP based on the f-divergence, which is a redefinition of DP. Dwork and Roth [9] gave the definitions of ε-DP and (ε, δ)-DP based on the max-divergence, which are equivalent definitions of DP from the perspective of the information-theoretic channel. Mironov [11] defined (α, ε)-Rényi DP (RDP) using the Rényi divergence, and (α, ε)-RDP implies $(\varepsilon + \frac{\log(1/\delta)}{\alpha - 1}, \delta)$-DP for any $0 < \delta < 1$. When $\alpha \to \infty$, (α, ε)-RDP is ε-DP according to the max-divergence. Conversely, ε-DP implies a corresponding RDP guarantee [23]. We can conclude that RDP is a generalization of GDP. When α = 1, RDP corresponds to the KL-divergence-based definition of DP of Reference [8]. When $\alpha \to \infty$, RDP corresponds to the max-divergence-based definitions of ε-DP and (ε, δ)-DP of Reference [9]. According to the f-divergence, Asoodeh et al. [24] also established the optimal relationship between RDP and (ε, δ)-DP to help derive the optimal (ε, δ)-DP parameters of a mechanism for a given level of RDP. Chaudhuri et al. [25] defined capacity bounded DP based on the $\mathcal{H}$-restricted divergence, where $\mathcal{H}$ is a class of functions and D is a divergence. Capacity bounded DP relaxes GDP by restricting the adversary to attack or post-process the output of a privacy mechanism using functions drawn from a restricted function class, and it models adversaries of this form with restricted f-divergences between the output probability distributions on datasets that differ in a single record. The $\mathcal{H}$-restricted f-divergence is $D_{f, \mathcal{H}}(P \Vert Q) = \sup_{h \in \mathcal{H}} \mathbb{E}_{P}[h(x)] - \mathbb{E}_{Q}[f^{*}(h(x))]$, where $f^{*}$ is the Fenchel conjugate of f. The $\mathcal{H}$-restricted Rényi divergence $D_{\alpha, \mathcal{H}}(P \Vert Q)$ of order α is defined analogously from the corresponding $\mathcal{H}$-restricted f-divergence. When $\mathcal{H}$ is the class of all functions and D is the Rényi divergence, this definition reduces to RDP. Additionally, when D is the f-divergence, this definition is the (ε, δ)-DP of Reference [8]. Thus, capacity bounded DP is a generalization of RDP.
Table 4.
GDP definitions using different information-theoretic metrics.
Existing Work | Privacy Type | Information-Theoretic Metric | Formula | Description
---|---|---|---|---
DP [7] | ε-DP | Channel transition probability | $P(Z = z \mid X = x) \le e^{\varepsilon} P(Z = z \mid X = x')$ | The transition probability matrix is used as the GDP mapping.
DP [8] | (ε, δ)-DP | f-divergence | – | The f-divergence includes the KL-divergence.
DP [9] | ε-DP | Max-divergence | $D_{\infty}(\mathcal{M}(x) \Vert \mathcal{M}(x')) \le \varepsilon$ | Since the max-divergence is not symmetric and does not satisfy the triangle inequality, the bound must also hold with x and x′ exchanged.
 | (ε, δ)-DP | Max-divergence | $D_{\infty}^{\delta}(\mathcal{M}(x) \Vert \mathcal{M}(x')) \le \varepsilon$ |
(α, ε)-RDP [11] | ε-DP | Rényi divergence | $D_{\alpha}(\mathcal{M}(x) \Vert \mathcal{M}(x')) \le \varepsilon$ | When $\alpha \to \infty$, (α, ε)-RDP is ε-DP according to the max-divergence. If $\mathcal{M}$ is ε-DP, then $\mathcal{M}$ is also RDP [23].
 | (ε, δ)-DP | Rényi divergence | | If $\mathcal{M}$ is (α, ε)-RDP, then it also satisfies $(\varepsilon + \frac{\log(1/\delta)}{\alpha - 1}, \delta)$-DP.
Capacity bounded DP [25] | ε-DP | $\mathcal{H}$-restricted divergence | – | An adversary cannot distinguish between $\mathcal{M}(x)$ and $\mathcal{M}(x')$ beyond ε in the function class $\mathcal{H}$, where D is the f-divergence.
 | (α, ε)-RDP | $\mathcal{H}$-restricted divergence | – | An adversary cannot distinguish between $\mathcal{M}(x)$ and $\mathcal{M}(x')$ beyond ε in the function class $\mathcal{H}$, where D is the Rényi divergence.
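To make the RDP relaxation in Table 4 concrete, the following small Python sketch (our own illustration; the randomized response parameterization is a standard ε-LDP/ε-DP mechanism used only as an example) computes the Rényi divergence of order α between the two output distributions of a binary randomized response mechanism; the value stays below the ε of the underlying ε-DP mechanism and approaches ε as α grows.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(P || Q) for discrete distributions, alpha > 1."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.log(np.sum(p**alpha * q**(1 - alpha))) / (alpha - 1)

eps = 1.0                              # epsilon of the randomized response mechanism
t = np.exp(eps) / (1 + np.exp(eps))    # probability of reporting the true bit
p_out_0 = [t, 1 - t]                   # output distribution when the input bit is 0
p_out_1 = [1 - t, t]                   # output distribution when the input bit is 1

for alpha in [1.5, 2, 5, 20, 100]:
    print(alpha, renyi_divergence(p_out_0, p_out_1, alpha))  # increases toward eps
```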
We compare the other information-theoretic privacy definitions and GDP under the information-theoretic channel model in Table 5. Calmon and Fawaz [26] provided ε-information privacy, which is stronger than 2ε-DP. Makhdoumi and Fawaz [27] also showed that ε-information privacy is much stronger than 2ε-DP, that ε-strong DP is stronger than ε-information privacy, and that ε-DP is stronger than 2ε-DP. Wang et al. [12] analyzed the relations among identifiability, DP, and mutual-information privacy and demonstrated that identifiability is stronger than DP and that DP is stronger than mutual-information privacy. Cuff and Yu [13] also proved that ε-DP is stronger than ε-mutual-information DP and that ε-mutual-information DP is stronger than (ε, δ)-DP, where mutual-information DP is the mutual-information privacy of Reference [12].
Table 5.
Comparative analysis of GDP and other information-theoretic privacy definitions.
Existing Work | Information-Theoretic Privacy Definition | Formula | Description | Relationship to GDP | Stronger or Weaker than GDP |
---|---|---|---|---|---|
[26] | -information privacy | When the output is given, the posterior and prior probabilities of the input x do not change significantly. | -information privacy ⇒-DP | -information privacy is stronger than -DP. | |
[27] | -strong DP | -strong DP relaxes the adjacent datasets assumption. | -strong DP ⇒-information privacy -information privacy ⇒-strong DP -information privacy ⇒-worst-case divergence privacy -worst-case divergence privacy ⇒-divergence privacy -DP ⇒-DP | -strong DP is stronger than -information privacy. -information privacy is stronger than -DP. -DP is stronger than -DP. | |
-information privacy | The same as above. | The same as above. | |||
Worst-case divergence privacy | Some private data S are correlated with some non-private data X. | ||||
[12] | -identifiability | Two adjacent datasets cannot be distinguished from the posterior probabilities after observing the output dataset, which makes any individual’s data hard to identify. | -identifiability ⇒-DP -DP ⇒-mutual-information privacy | -identifiability is stronger than -DP. -DP is stronger than -mutual-information privacy. | |
-mutual-information privacy | Mutual-information privacy measures the average amount of information about X contained in Z. | ||||
[13] | -mutual-information DP | The same as -mutual-information privacy in work [12] above, and represents the input dataset except the i-th element. | -DP⇒-mutual information DP ⇒-DP | -DP is stronger than -mutual-information DP. -mutual-information DP is stronger than -DP. |
In the information-theoretic channel model of LDP in Table 3, we use the convex polytope proposed by Holohan et al. [28] as the general definition of LDP. Thus, the definition of LDP for any pair of different single inputs x and x′ at Hamming distance d(x, x′) is
$$P(Z = z \mid X = x) \le e^{\varepsilon}\, P(Z = z \mid X = x'), \qquad (5)$$
where $d(x, x') = 1$ and z is any possible output of the LDP mapping.
In Table 6, we provide a comparative analysis of LDP and other information-theoretic privacy definitions under the information-theoretic channel model. Jiang et al. [19] compared LDP, mutual-information privacy [12], and local information privacy, where local information privacy is the information privacy of Reference [26]. For the same privacy budget ε, ε-local information privacy is stronger than mutual-information privacy and 2ε-LDP, and ε-LDP is stronger than ε-local information privacy. Lopuhaä-Zwakenberg et al. [21] reached the same conclusion and further proved that ε-side-channel resistant local information privacy (SRLIP) is stronger than ε-local information privacy for the same privacy budget.
Table 6.
Comparative analysis of LDP and other information-theoretic privacy definitions.
Existing Work | Information-Theoretic Privacy Definition | Formula | Description | Relationship to LDP | Stronger or Weaker than LDP |
---|---|---|---|---|---|
[12,19] | -mutual-information privacy | The same as Table 5 above. | The same as Table 5 above. | -local information privacy ⇒-mutual-information privacy -local information privacy ⇒-LDP -LDP ⇒-local information privacy | -local information privacy is stronger than 2-LDP. -LDP is stronger than -local information privacy. |
[19,26] | -local information privacy | ||||
[21,26] | -local information privacy | The same as Table 5 above. | The same as Table 5 above. | -LDP ⇒-local information privacy -local information privacy ⇒-LDP -SRLIP ⇒-local information privacy | -LDP is stronger than -local information privacy. -local information privacy is stronger than 2-LDP. |
[21] | -SRLIP | SRLIP satisfies -LIP for the attacker accessing some data of a user and does not leak sensitive data s beyond the knowledge the attacker gained from the side channel. |
5. Privacy-Utility Metrics of GDP and LDP under Information-Theoretic Channel Models
In Table 7, we summarize and analyze the information-theoretic privacy metrics of GDP. The Rényi divergence of order α is used as the privacy metric of GDP, which yields a natural relaxation of GDP based on the Rényi divergence [11]. Chaudhuri et al. [25] used restricted divergences as the privacy metric. When the divergence is the Rényi divergence, capacity bounded DP is a generalization of RDP; when it is the f-divergence, capacity bounded DP corresponds to the DP of [8]. In [14,29], mutual information is used as the privacy metric of GDP, which is the amount of information leaked about X after observing Z. Cuff and Yu [13] also used α-mutual-information as the privacy metric of GDP, which generalizes mutual information using the Rényi divergence of order α. Alvim et al. [7] used min-entropy leakage as the privacy metric of GDP, which is the ratio of the probabilities of guessing correctly a priori and a posteriori. Furthermore, the maximal leakage of the channel is used as the privacy metric of GDP, which is the maximal reduction in uncertainty of X when Z is given [7,16]. According to graph symmetrization, Edwards et al. [30] also regarded min-entropy leakage as an important measure of the differential privacy loss of information channels under Blowfish privacy, which is a generalization of global differential privacy. Rogers et al. [31] defined the privacy metric of GDP using max-information and β-approximate max-information, which are correlation measures that bound the change in the conditional probability of an event relative to the prior probability. In [32,33], the privacy budget is directly used as the privacy metric. Therefore, we can conclude that the Rényi divergence is a more general privacy metric of GDP, since it generalizes the restricted divergences and can derive the f-divergence, min-entropy leakage, maximal leakage, and max-information. Moreover, mutual information can also be used as a privacy metric of GDP.
Table 7.
Privacy metrics of GDP under information-theoretic channel model.
Existing Work | Privacy Metric | Formula | Description | Bound |
---|---|---|---|---|
[16] | Maximal leakage | The maximal leakage of channel is the maximal reduction in uncertainty of X when Z is given, which is taken by maximizing over all input distributions of the attacker’s side information. | with spheres of radius d and center . | |
[7] | Min-entropy leakage | The min-entropy leakage corresponds to the ratio between the probabilities of attack success with a priori and a posterior. | with | |
Worst-case leakage | The same as maximal leakage above. | |||
[29] | Mutual information | The mutual information denotes the amount of information leaked on X given Z. | with , , and a constant | |
[30] | Min-entropy leakage | The same as above. | The same as above. | , where q is the number of connected components of induced adjacency graph, and is the diameter of the t-th connected component. |
[14] | Mutual information | The same as above. | – | |
[13] | -mutual-information | The notion of -mutual-information is the generalization of mutual information using Rényi information measures. | ||
[31] | Max-information | | Max-information is a correlation measure, similar to mutual information, which allows bounding the change in the conditional probability of an event relative to the prior probability. | Bounds are given for ε-DP and for (ε, δ)-DP.
 | β-approximate max-information | | |
[11] | Rényi divergence | A natural relaxation of GDP based on the Rényi divergence. | – | |
[25] | -restricted divergences | The privacy loss is measured in terms of a divergence between output distributions of a mechanism on datasets that differ by a single record restricted to functions in . | ||
[32,33] | Privacy budget | The privacy budget represents the level of privacy preserving. | – |
We also summarize and analyze the information-theoretic utility metrics of GDP in Table 8. In the information-theoretic channel model of GDP, expected distortion is the main utility measure, which shows, on average, how much information about the real answer can be obtained from the reported answer [7,33]. Padakandla et al. [32] used fidelity as the utility metric, where the fidelity between transition probability distributions is measured by a distortion metric. Mutual information is used not only as a privacy metric but also as a utility metric of GDP, since it captures the amount of information shared by two variables [33].
Table 8.
Utility metrics of GDP under information-theoretic channel model.
Existing Work | Utility Metric | Formula | Description | Bound |
---|---|---|---|---|
[7] | Expected distortion | How much information about the real answer can be obtained from the reported answer to average. | with | |
[14] | Expected distortion | The same as above. | – | |
[32] | Fidelity | The fidelity of a pair of transition probability distributions is -distortion metric. | – | |
[33] | Mutual information | Mutual information captures the amount of information shared by two variables, that is to say, quantifying how much information can be preserved when releasing a private view of the data. | – |
In Table 9, we summarize and analyze the existing work on information-theoretic privacy metrics of LDP. In the information-theoretic channel model of LDP, Duchi et al. [17] defined the privacy metric of LDP using the KL-divergence, bounding the KL-divergence between the two output distributions by a quantity that depends on the privacy budget and on the total variation distance between the initial distributions of the source X. Of course, mutual information can also be used as a privacy measure of LDP [34,35]. More generally, the existing work mainly uses the privacy budget of the LDP definition as the privacy metric [5,36,37,38]. In [39], Lopuhaä-Zwakenberg et al. gave an average privacy metric based on the ratio of conditional entropies of the sensitive information X.
Table 9.
Privacy metrics of LDP under information-theoretic channel model.
Existing Work | Privacy Metric | Formula | Description | Bound |
---|---|---|---|---|
[17] | KL-divergence | The general result bounds the KL-divergence between distributions and by the privacy budget and the total variation distance between and of the initial distributions of the X. | ||
[34,35] | Mutual information | The same as Table 7 above. | – | |
[4,5,37,38] | Privacy budget | The same as Table 7 above. | – | |
Average privacy [39] | Conditional entropy | Privacy metric is the fraction of sensitive information that is retained from the aggregator with prior knowledge P. | – |
Next, we summarize and analyze the information-theoretic utility metrics of LDP in Table 10. In the information-theoretic channel model of LDP, the f-divergence [5] and mutual information [5,36,38] can also be used as utility measures of LDP. In most cases, expected distortion is used as the utility measure of LDP [20,34,35,36,37]. In [39], Lopuhaä-Zwakenberg et al. presented distribution utility and tally utility metrics based on the ratio of relevant information.
Table 10.
Utility metrics of LDP under information-theoretic channel model.
Existing Work | Utility Metric | Formula | Description | Bound |
---|---|---|---|---|
[34,35,37] | Expected Hamming distortion | Hamming distortion measures the utility of a channel in terms of the worst-case Hamming distortion over source distribution . | – | |
[5] | f-divergence | f-divergence measures statistical discrimination between distributions and by the privacy budget and the total variation distance between and of the initial distributions of the X. | ||
Mutual information | The same as Table 8 above. | with for a given distribution P and partitioning X into two parties T and | ||
[36] | Expected distortion | A channel yields a small distortion between input and output sequences with respect to a given distortion measure. | – | |
Average error probability [20] | Expected Hamming distortion | The average error probability is defined to be the expected Hamming distortion between the input and output data based on maximum a posterior estimation. | ||
[38] | Mutual information | The same as Table 8 above. | } with | |
Distribution utility [39] | Mutual information | Utility metric is the fraction of relevant information after accessing to prior knowledge P or tally vector and . | – | |
Tally utility [39] | Entropy Mutual information |
6. Properties of GDP and LDP under Information-Theoretic Channel Models
In Table 11, we present and analyze the properties of GDP based on the information-theoretic channel model. Based on the Rényi divergence, Mironov [11] demonstrated that the new definition shares many important properties with the standard definition of GDP, including post-processing, group privacy, and sequential composition. Considering restricted divergences, including the Rényi divergence, Chaudhuri et al. [25] showed that capacity bounded DP has the properties of post-processing, convexity, sequential composition, and parallel composition. Barthe and Köpf [16] proved the sequential composition and parallel composition of GDP based on maximal leakage under the information-theoretic channel model. Barthe and Olmedo [8] also proved the parallel composition of GDP using the f-divergence. We know that the Rényi divergence can derive the maximal leakage and the max-divergence, and the f-divergence of Reference [8] is actually the max-divergence. Thus, we can conclude that the properties of GDP, such as post-processing, convexity, group privacy, sequential composition, and parallel composition, can be proved by using the Rényi divergence.
Table 11.
Properties of GDP under information-theoretic channel model.
Existing Work | Privacy Type | Privacy Property | Information-Theoretic Metric | Formal Description |
---|---|---|---|---|
[16] | GDP | Sequential composition | Maximal leakage | for the sequential composition of channels and . When is -DP and is -DP, is -DP. |
Parallel composition | for the parallel composition of channels and . When is -DP and is -DP, is -DP. | |||
[8] | GDP | Sequential composition | f-divergence | , where , the same as Table 4 above. |
[11] | RDP | Post-processing | Rényi divergence | If there is a randomized mapping , then . |
Group privacy | If is -RDP, is -stable and , then is -RDP. | |||
Sequential composition | If is -RDP and is -RDP, then the mechanism satisfies -RDP. | |||
[25] | Capacity bounded DP | Post-processing | -restricted divergences | , , and are function classes such that for any and , . If mechanism is -capacity bounded DP with , then is also -capacity bounded DP with for any . |
Convexity | and are two mechanisms which have the same range and provide -capacity bounded DP with . If is a mechanism which executes mechanism with probability and with probability , then is -capacity bounded DP with . | |||
Sequential composition | is the function class . If and are and capacity bounded DP with and , respectively, then the combination is capacity bounded DP with . | |||
Parallel composition | is the function class . If and are and capacity bounded DP with and respectively, and the datasets and are disjoint, then the combination is capacity bounded DP with . | |||
[3] | GDP, LDP | Privacy-utility monotonicity | Mutual information | The mutual information decreases as the privacy budget decreases, and vice versa.
Similarly, GDP and LDP share the above properties under the information-theoretic channel model. Therefore, LDP also has the properties of post-processing, convexity, group privacy, sequential composition, and parallel composition.
Moreover, we have shown that GDP and LDP have privacy-utility monotonicity [3]. In GDP, ε-DP shows
$$\Pr[\mathcal{M}(x) = z] \le e^{\varepsilon}\, \Pr[\mathcal{M}(x') = z]. \qquad (6)$$
We can obtain
$$e^{-\varepsilon} \le \frac{\Pr[\mathcal{M}(x) = z]}{\Pr[\mathcal{M}(x') = z]} \le e^{\varepsilon}. \qquad (7)$$
When $\varepsilon \to 0$, we have
$$\Pr[\mathcal{M}(x) = z] = \Pr[\mathcal{M}(x') = z]. \qquad (8)$$
We can obtain that the output Z becomes independent of the input, so the mutual information I(X; Z) tends to 0. We use mutual information as the utility metric. We can conclude that the mutual information of GDP decreases as the privacy budget decreases, and vice versa: the smaller the privacy budget, the stronger the privacy preservation and the worse the utility, and vice versa. Thus, GDP has privacy-utility monotonicity, indicating the privacy-utility tradeoff. Similarly, we can observe that LDP also has privacy-utility monotonicity, indicating the privacy-utility tradeoff.
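The monotonicity argument above can be checked numerically; the following Python sketch (our own illustration) computes the mutual information between a uniform binary input and the output of a randomized response channel for several privacy budgets, showing that it shrinks toward 0 as ε decreases.

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X; Z) in bits for a discrete channel; channel[x, z] = P(z | x)."""
    p_x = np.asarray(p_x, float)
    channel = np.asarray(channel, float)
    joint = p_x[:, None] * channel          # P(x, z)
    p_z = joint.sum(axis=0)                 # marginal P(z)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] /
                                              (p_x[:, None] * p_z[None, :])[mask])))

for eps in [0.1, 0.5, 1.0, 2.0, 4.0]:
    t = np.exp(eps) / (1 + np.exp(eps))     # randomized response truth probability
    channel = np.array([[t, 1 - t], [1 - t, t]])
    print(eps, mutual_information([0.5, 0.5], channel))  # increases with eps
```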
7. GDP and LDP Mechanisms under Information-Theoretic Channel Models
In Table 12, we summarize and compare the GDP mechanisms from the perspective of the information-theoretic channel under a uniform distribution of the source X. Alvim et al. [7] maximized the expected distortion under a min-entropy leakage constraint and obtained the optimal randomization mechanism using the graph symmetry induced by the adjacency relationship between adjacent datasets. The optimal randomization mechanism can ensure better utility while achieving ε-DP. According to the risk-distortion framework, Mir [14] minimized mutual information when the constraint condition is the expected distortion and obtained the DP mechanism, an exponential-form distribution with a normalization function, by the method of Lagrange multipliers. The GDP mechanism of [14] corresponds to the exponential mechanism [40]. The conditional probability distribution minimizes the privacy leakage risk given a distortion constraint. Ayed et al. [33] maximized mutual information when the constraint condition is DP and solved the constrained maximization program to obtain the DP mapping under a strongly symmetric channel.
Table 12.
GDP mechanisms under information-theoretic channel model.
Existing Work | Privacy Type | Model | Objective Function | Constraint Condition | Mechanism | Solution | Description | |||
---|---|---|---|---|---|---|---|---|---|---|
[7] | GDP | Maximal utility | Expected distortion | Min-entropy leakage | , where , , and c the same as Table 8 above. | Graph symmetry induced by the adjacent relationship of adjacent datasets. | Optimal randomization mechanism provides the better utility while guaranteeing -DP. | |||
[14] | GDP | Risk-distortion | Mutual information | Expected distortion | , where is a normalization function. | Lagrangian multipliers. | Conditional probability distribution is DP mapping, which minimizes the privacy risk given a distortion constraint. | |||
[33] | GDP | Constrained maximization program | Mutual information | GDP | | Definition of GDP. | The conditional transition probability of transforming x into z is obtained under a strongly symmetric channel.
In addition, Mironov [11] analyzed the RDP of three basic mechanisms and their self-composition, namely randomized response, the Laplace mechanism, and the Gaussian mechanism, and gave the RDP parameters of these mechanisms. Considering linear and unrestricted adversaries, Chaudhuri et al. [25] also discussed the capacity bounded DP properties of the Laplace, Gaussian, and matrix mechanisms and presented bounds on the privacy budget of the Laplace and Gaussian mechanisms under the KL-divergence and the Rényi divergence, respectively.
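As a worked example of these RDP parameters (our own illustration based on the standard results in [11]; the numerical values below are arbitrary), consider the Gaussian mechanism with $\ell_2$-sensitivity $\Delta$ and noise standard deviation $\sigma$:

```latex
% RDP of the Gaussian mechanism and its composition (standard results from [11])
\begin{align*}
  \text{one release:}\quad & \mathcal{M}(x) = f(x) + \mathcal{N}(0, \sigma^{2})
      \ \text{satisfies}\ \Big(\alpha,\ \tfrac{\alpha \Delta^{2}}{2\sigma^{2}}\Big)\text{-RDP},\\
  \text{T-fold composition:}\quad & \Big(\alpha,\ \tfrac{T\,\alpha\,\Delta^{2}}{2\sigma^{2}}\Big)\text{-RDP},\\
  \text{conversion to } (\varepsilon, \delta)\text{-DP:}\quad &
      \varepsilon = \tfrac{T\,\alpha\,\Delta^{2}}{2\sigma^{2}} + \tfrac{\log(1/\delta)}{\alpha - 1},
      \quad \text{optimized over } \alpha > 1 .
\end{align*}
```

For instance, with $\Delta = 1$, $\sigma = 10$, $T = 100$, and $\delta = 10^{-5}$, choosing $\alpha = 10$ gives $\varepsilon \approx 5 + 1.28 \approx 6.3$.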
In Table 13, we summarize and compare the LDP mechanisms from the perspective of the information-theoretic channel under a uniform distribution of the source X. According to the rate-distortion function, References [34,35,37] maximized mutual information under an expected Hamming distortion constraint D and obtained the corresponding privacy budget for the binary channel and for discrete alphabets. Kairouz et al. [5] maximized the KL-divergence and mutual information under the LDP constraint and obtained the binary, multivariate, and quaternary randomized response mechanisms by solving the privacy-utility maximization problem, which is equivalent to solving a finite-dimensional linear program. Although Ayed et al. [33] maximized mutual information under a GDP constraint, they also obtained the binary and multivariate randomized response mechanisms under a strongly symmetric channel. Wang et al. [38] maximized mutual information under the LDP constraint and obtained the k-subset mechanism with respect to the uniform distribution on the source X. When k = 1, the 1-subset mechanism is the multivariate randomized response mechanism, and for a binary alphabet the multivariate randomized response mechanism is the binary randomized response mechanism. Xiong et al. [36] minimized the privacy budget under an expected distortion constraint, which is equivalent to solving a standard generalized linear-fractional program via the bisection method. However, Xiong et al. [36] did not give a specific expression for the optimal privacy channel.
Table 13.
LDP mechanisms under information-theoretic channel model.
Existing Work | Privacy Type | Model | Objective Function | Constraint Condition | Mechanism | Solution | Description
---|---|---|---|---|---|---|---
[34,35,37] | LDP | Rate-distortion function | Mutual information | Expected Hamming distortion | Binary channel; discrete alphabet | Memoryless symmetric channel. | LDP is just a function of the channel, and the worst-case Hamming distortion over the source distribution measures the utility of a channel.
[5] | LDP | Constrained maximization problem | KL-divergence; mutual information | LDP | Binary, multivariate, and quaternary randomized response | Solving the privacy-utility maximization problem is equivalent to solving a finite-dimensional linear program. | The binary and multivariate randomized response mechanisms are universally optimal in the low and high privacy regimes and well approximate the intermediate regime. The quaternary randomized response mechanism satisfies ε-LDP.
[38] | LDP | Maximize utility | Mutual information | LDP | k-subset mechanism | This problem maximizes mutual information when x is sampled from the uniform distribution. | The mutual information bound is used as a universal statistical utility measurement, and the k-subset mechanism is the optimal ε-LDP mechanism.
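The multivariate (k-ary) randomized response mechanism listed in Table 13 can be sketched as follows in Python (our own illustration, assuming the standard form that reports the true value with probability $e^{\varepsilon}/(e^{\varepsilon} + k - 1)$):

```python
import numpy as np

def k_randomized_response(value: int, k: int, epsilon: float,
                          rng: np.random.Generator) -> int:
    """Multivariate (k-ary) randomized response satisfying epsilon-LDP.

    Reports the true category with probability e^eps / (e^eps + k - 1),
    otherwise reports one of the remaining k - 1 categories uniformly.
    """
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return value
    others = [v for v in range(k) if v != value]
    return int(rng.choice(others))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reports = [k_randomized_response(2, k=5, epsilon=1.0, rng=rng) for _ in range(10000)]
    print(np.bincount(reports, minlength=5) / len(reports))  # peak at the true value 2
```

The ratio between the probability of reporting the true value and the probability of reporting any other value is exactly $e^{\varepsilon}$, which matches the ε-LDP constraint.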
Furthermore, Duchi et al. [41] showed that randomized response is an optimal way to perform survey sampling while maintaining the privacy of the respondents. Holohan et al. [42] proposed the following optimal randomized response mechanism satisfying (ε, δ)-DP under a uniform distribution of the source X:
$$\Pr[Y = y \mid X = x] = \begin{cases} \dfrac{e^{\varepsilon} + \delta}{1 + e^{\varepsilon}}, & y = x, \\[4pt] \dfrac{1 - \delta}{1 + e^{\varepsilon}}, & y \ne x. \end{cases} \qquad (9)$$
Erlingsson et al. [43] proposed the randomized aggregatable privacy-preserving ordinal response (RAPPOR) by applying randomized response in a novel manner. RAPPOR provides a privacy guarantee using a permanent randomized response and an instantaneous randomized response and ensures high-utility analysis of the collected data. RAPPOR encodes each value into a length-k binary bit vector B. For the permanent randomized response, RAPPOR generates the perturbed vector B′ with the probability
$$B'_i = \begin{cases} 1, & \text{with probability } \tfrac{1}{2} f, \\ 0, & \text{with probability } \tfrac{1}{2} f, \\ B_i, & \text{with probability } 1 - f, \end{cases} \qquad (10)$$
where $f \in [0, 1]$ is a user-tunable parameter controlling the level of privacy. With respect to the instantaneous randomized response, RAPPOR perturbs B′ into the reported bit vector S with the probability
$$\Pr[S_i = 1] = \begin{cases} q, & \text{if } B'_i = 1, \\ p, & \text{if } B'_i = 0. \end{cases} \qquad (11)$$
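A minimal Python sketch of the two RAPPOR randomization stages described above (our own illustration; the parameter names f, p, and q follow Equations (10) and (11)):

```python
import numpy as np

def permanent_randomized_response(bits: np.ndarray, f: float,
                                  rng: np.random.Generator) -> np.ndarray:
    """Equation (10): memoized noise applied once per true value."""
    u = rng.random(bits.shape)
    out = bits.copy()
    out[u < 0.5 * f] = 1                      # set to 1 with probability f/2
    out[(u >= 0.5 * f) & (u < f)] = 0         # set to 0 with probability f/2
    return out                                # otherwise keep the original bit

def instantaneous_randomized_response(bits: np.ndarray, p: float, q: float,
                                      rng: np.random.Generator) -> np.ndarray:
    """Equation (11): fresh noise applied at every report."""
    prob_one = np.where(bits == 1, q, p)
    return (rng.random(bits.shape) < prob_one).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_value, k = 3, 8
    b = np.zeros(k, dtype=int)
    b[true_value] = 1                          # one-hot encoding of the value
    b_perm = permanent_randomized_response(b, f=0.5, rng=rng)
    report = instantaneous_randomized_response(b_perm, p=0.25, q=0.75, rng=rng)
    print(b, b_perm, report)
```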
8. Differential Privacy Synthetic Data Generation
Data sharing facilitates training better models, decision making, and the reproducibility of scientific research. However, if the data are shared directly, there is a risk of privacy leakage as well as the problem of small training sample sizes. Thus, synthetic data are often used to replace the sharing of real data. At present, one of the main methods for synthetic data generation is the generative adversarial network (GAN) [44]. A GAN consists of two neural networks: one is a generator, and the other is a discriminator. The generator produces a realistic sample from input noise drawn from a multivariate Gaussian or uniform distribution. The discriminator is a binary classifier (such as a 0–1 classifier) that judges whether an input sample is real or fake. In other words, the discriminator distinguishes whether each input sample comes from the real sample set or the fake sample set, while the generator makes its samples as realistic as possible so that the discriminator cannot tell whether an input sample is real or fake. Through this process, a GAN can generate synthetic data that approximate the real data. Because the synthetic data accurately reflect the distribution of the training data, they can avert privacy leakage by replacing the sharing of real data, augment small-scale training data, and be generated on demand. Thus, GAN can generate synthetic time series, continuous, and discrete data [45].
However, because the discriminator easily memorizes the training data, it brings a risk of privacy leakage [46]. Therefore, GAN mainly faces the privacy threats of the membership inference attack and the model extraction attack, as summarized in Table 14. Hayes et al. [47] proposed a membership inference attack against generative models, in which, given a data point, the attacker can determine whether it was used to train the model. Liu et al. [48] proposed a new membership inference attack, the co-membership inference attack, which checks whether the given n instances were all used in the training or not used at all. Hilprecht et al. [49] proposed a Monte Carlo membership inference attack against generative models, which yields high membership inference accuracy. Chen et al. [50] systematically analyzed the potential risk of privacy leakage caused by generative models and proposed a taxonomy of membership inference attacks, covering not only the existing attacks but also a proposed generic attack model based on reconstruction. Hu and Pang [51] studied the model extraction attack against GAN, whose purpose is to copy the machine learning model through query access to the target model. In order to mitigate the model extraction attack, Hu and Pang designed defenses based on input and output perturbation, which perturb the latent code and the generated samples, respectively.
Table 14.
Membership inference attack and model extraction attack against GAN.
Existing Work | Attack Target | Attack Type | Attack Method | Characteristic | Attack Effect |
---|---|---|---|---|---|
[47] | Generative models | Membership inference | The discriminator can learn the statistical difference of distribution, detect overfitting and recognize the input as part of the training dataset. | The proposed attack has low running cost, does not need information about the attacked model, and has good generalization. | Defenses are either ineffective or lead to a significant decline in the performance of the generative models in terms of training stability or sample quality. |
[48] | Generative models | Co-membership inference | The membership inference of the target data x is used as the optimization of the attacker’s network to search for potential codes to reproduce x, and the final reconstruction error is used to judge whether x is in the training data. | When the generative models are trained with large datasets, the co-membership inference attack is necessary to achieve success. | The performance of attacker’s network is better than that of previous membership attacks, and the power of co-membership attack is much greater than that of a single attack. |
[49] | Generative models | Membership inference | The membership inference attack based on Monte Carlo integration only considers the small distance samples in the model. | This attack allows membership inference without assuming the type of generative models. | The success rate of this attack is better than that of previous studies on most datasets, and there are only very mild assumptions. |
[50] | Generative models | Membership inference | This work proposed a general attack model based on reconstruction for which the model is suitable for all settings according to the attacker’s knowledge about the victim model. | This work provides a theoretically reliable attack calibration technology, which can continuously improve the attack performance in different attack settings, data modes, and training configurations in all cases. | This attack reveals the information of the training data used for the victim model. |
[51] | GAN | Model extraction | This work studied the model extraction attack based on target and background knowledge from the perspectives of fidelity extraction and accuracy extraction. | Model extraction based on transfer learning can enable adversaries to improve the performance of their GAN model through transfer learning. | Attack model stealing the most advanced target model can be transferred to new fields to expand the application scope of extraction model. |
However, the existing work mainly achieves model protection for neural networks based on differential privacy. By clipping each gradient according to its norm and a clipping threshold and using the Gaussian mechanism to randomly perturb the clipped gradients, Abadi et al. [52] proposed differentially private stochastic gradient descent (DP-SGD) to protect the privacy of the training data during the training process and introduced the moments accountant, which provides a tighter bound on the privacy loss than the generic strong composition theorem of differential privacy [9].
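The core DP-SGD step can be sketched as follows (a minimal NumPy illustration of per-example gradient clipping and Gaussian noising; the gradient values, clipping threshold, and noise multiplier are placeholders, not values from [52]):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=np.random.default_rng(0)):
    """One DP-SGD update: clip each example's gradient to clip_norm, sum,
    add Gaussian noise scaled to the clipping threshold, and average."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # per-example clipping
    grad_sum = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad_sum.shape)
    noisy_mean = (grad_sum + noise) / len(per_example_grads)
    return params - lr * noisy_mean

if __name__ == "__main__":
    params = np.zeros(3)
    grads = [np.array([0.5, -2.0, 1.0]), np.array([3.0, 0.1, -0.2])]  # toy gradients
    print(dp_sgd_step(params, grads))
```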
Next, in Table 15 and Table 16, we review the work on synthetic data generation based on differentially private GANs and on differentially private GANs with federated learning from the following aspects: gradient perturbation, weight perturbation, data perturbation, label perturbation, and objective function perturbation. In this respect, our work differs from the existing surveys [53,54].
Table 15.
Differential privacy synthetic data generation with GAN.
Existing Work | GAN Type | Clipping Strategy | Perturbation Strategy | Privacy Loss Accountant |
---|---|---|---|---|
[55] | GAN | Clipping gradient | Gradient perturbation | Moment accountant |
[56] | WGAN | Clipping weight | Gradient perturbation | Moment accountant |
[45] | GAN | Clipping gradient | Gradient perturbation | Moment accountant |
[58] | CGAN | Clipping gradient | Gradient perturbation | RDP accountant |
[60] | GAN | Clipping gradient | Gradient perturbation | Moment accountant |
[61] | GAN | Clipping gradient | Gradient perturbation | Moment accountant |
[62] | WGAN | Clipping gradient | Gradient perturbation | RDP accountant |
[63] | WGAN-GP | Clipping gradient | Gradient perturbation | Moment accountant |
[65] | AC-GAN | Clipping gradient | Gradient perturbation | Moment accountant |
[67] | GAN | Clipping gradient | Gradient perturbation | Moment accountant |
[68] | NetGAN | Clipping gradient | Gradient perturbation | Privacy budget composition [9] |
[70] | GAN | – | Data perturbation | – |
[71] | GAN | – | Data perturbation | Advanced composition [9] |
[73] | GAN | – | Data perturbation | – |
[74] | GAN | – | Data perturbation | – |
[75] | GAN | – | Label perturbation | Moment accountant |
[76] | GAN | – | Objective function perturbation | Advanced composition |
[77] | GAN | – | Differential privacy identifier | Privacy budget composition |
Table 16.
Differential privacy synthetic data generation with federated learning.
Existing Work | GAN Type | Clipping Strategy | Perturbation Strategy | Privacy Loss Accountant | Training Method |
---|---|---|---|---|---|
[81] | GAN | Clipping weight | Weight perturbation | RDP accountant | FedAvg algorithm |
[62] | WGAN | Clipping gradient | Gradient perturbation | RDP accountant | FedAvg algorithm |
[82] | GAN | Clipping weight | Gradient perturbation | Moment accountant | FedAvg algorithm |
[83] | GAN | – | Gradient perturbation | – | FedAvg algorithm |
[84] | GAN | Clipping gradient | Gradient perturbation | RDP accountant | Serial training |
[85] | GAN | – | Differential average-case privacy | – | FedAvg algorithm |
8.1. Differential Privacy Synthetic Data Generation with Generative Adversarial Network
Because the discriminator of a GAN can easily memorize the training samples, training a GAN on sensitive or private data breaches the privacy of the training data. Thus, gradient perturbation can protect the privacy of the sensitive training data by training GAN models with differential privacy based on DP-SGD. Existing work protects the privacy of the training dataset by adding carefully designed noise to the clipped gradients during the learning procedure of the discriminator and uses the moments accountant or the RDP accountant to better keep track of the privacy cost and thereby improve the quality of the synthetic data. The RDP accountant [11] provides a tighter bound on the privacy loss than the moments accountant. In gradient perturbation, the clipping strategy and the perturbation strategy improve the performance of the model while preserving the privacy of the training dataset.
Using gradient perturbation, Lu and Yu [55] proposed a unified GAN-based framework for differentially private data publishing, covering data such as tabular data and graphs, which releases synthetic data with acceptable utility in a differentially private manner. Xie et al. [56] proposed a differentially private Wasserstein GAN (WGAN) [57] model, which adds carefully designed noise to the clipped gradient in the learning process, generates high-quality data points at a reasonable privacy level, and uses the moments accountant to ensure privacy in the iterative gradient descent process. Frigerio et al. [45] developed a differential privacy framework for privacy-preserving data publishing using GAN, which easily adapts to the generation of continuous, time series, and discrete data and maintains the original distribution of the features and the correlation between them at a good level of privacy. Torkzadehmahani et al. [58] introduced a differentially private conditional GAN (CGAN) [59] training framework based on clipping and perturbation strategies, which generates synthetic data and corresponding labels while preserving the privacy of the training datasets and uses the RDP accountant to track the privacy budget. Liu et al. [60] proposed a privacy-preserving GAN model, which achieves differential privacy by adding carefully designed noise to the clipped gradient during model learning, uses the moments accountant strategy to improve the stability and compatibility of the model by controlling the privacy loss, and generates high-quality synthetic data while retaining the required usable data under a reasonable privacy budget. Ha and Dang [61] proposed a local differential privacy GAN model for noisy data generation, which establishes a generative model by clipping the gradient in the model and adding Gaussian noise to the gradient to ensure differential privacy. Chen et al. [62] proposed the gradient-sanitized WGAN, which allows the publication of sanitized sensitive data under a strict privacy guarantee and can distort the gradient information more precisely so as to train deeper models and generate more informative samples. Yang et al. [63] proposed a differentially private gradient-penalty WGAN (WGAN-GP) [64] to train a generative model with privacy protection, which can provide strong privacy protection for sensitive data and generate high-quality synthetic data. Beaulieu-Jones et al. [65] used the auxiliary classifier GAN (AC-GAN) [66] with differential privacy to generate simulated synthetic participants very similar to the Systolic Blood Pressure Trial participants, which can generate synthetic participants and promote secondary analysis and reproducibility investigations of clinical datasets by strengthening data sharing while protecting participants' privacy. Fan and Pokkunuru [67] proposed a differential privacy solution for generating high-quality synthetic network flow data, which uses a new clipping bound decay and private model selection to improve the quality of the synthetic data and protects the privacy of the sensitive training data by training the GAN model with differential privacy. Zhang et al. [68] proposed a privacy-preserving publishing model based on GAN for graphs (NetGAN) [69], which can maintain high data utility in terms of degree distribution and satisfies ε-differential privacy.
Data perturbation can achieve privacy preservation by adding differential privacy noise to the training data when using GAN to generate synthetic data. Li et al. [70] proposed a graph data privacy protection method that uses GAN to perform an anonymization operation on graph data, which makes it possible to fully learn the characteristics of the graph without specifying particular features and ensures the privacy of the anonymized graph by adding differential privacy noise to the probability adjacency matrix in the process of graph generation. Neunhoeffer et al. [71] proposed differentially private post-GAN boosting, which combines the samples produced by the sequence of generators obtained during GAN training to create a high-quality synthetic dataset and reweights the generated samples using the private multiplicative weights method [72]. Indhumathi and Devi [73] proposed a healthcare Cramér GAN, which adds differential privacy noise only to the identified quasi-identifiers and combines the final result with the sensitive attributes, where the anonymized medical data are used as the real data for training the Cramér GAN, the Cramér distance is used to improve the efficiency of the model, and the synthetic data generated by the healthcare GAN can provide high privacy and withstand various attacks. Imtiaz et al. [74] proposed a GAN combined with a differential privacy mechanism to generate a realistic private smart healthcare dataset by directly adding noise to the aggregated data records, which can generate high-quality synthetic and differentially private datasets and retain the statistical characteristics of the original dataset.
By using label perturbation with differential privacy noise, Papernot et al. [78] constructed the private aggregation of teacher ensembles (PATE), which provides a strong privacy guarantee for the training data. The mechanism combines, in a black-box way, multiple models trained on disjoint datasets. Because these models rely directly on sensitive data, they are not published but are used as "teachers" of a "student" model. Because Laplace noise is only added to the output of the teachers, the student learns to predict the output chosen by noisy voting among all of the teachers and cannot directly access an individual teacher, the underlying data, or the parameters. PATE uses the moments accountant to better track the privacy cost. Building on the GAN and PATE frameworks, Jordon et al. [75] replaced the GAN discriminator with the PATE mechanism. Therefore, the discriminator satisfies differential privacy, and a differentiable student version is needed to allow backpropagation to the generator. However, this mechanism requires the use of public data.
In objective function perturbation, existing work injects Laplace noise into the coefficients to construct a differentially private loss function in GAN training. Zhang et al. [76] proposed a new privacy-preserving GAN, which perturbs the coefficients of the objective function by injecting Laplace noise into the latent space based on the functional mechanism to ensure the differential privacy of the training data; it reliably generates high-quality realistic synthetic data samples without divulging the sensitive information in the training dataset.
In addition, current research mainly focuses on publishing privacy-preserving data in a statistical way rather than considering the dynamics and correlation of the context. Thus, on the basis of triple-GAN [79], Ho et al. [77] proposed a three-player generative adversarial game framework that introduces a new perceptron, namely a differential privacy identifier, to enhance synthetic data in a differentially private way. This deep generative model can generate synthetic data while fulfilling the differential privacy constraint.
8.2. Differential Privacy Synthetic Data Generation with Federated Learning
To achieve distributed collaborative data analysis, collecting large-scale data is an important task. However, due to the sensitivity of private data, it is difficult to collect enough samples. GANs can therefore be used to generate synthetic data that can be shared for data analysis. In the distributed setting, however, training a GAN faces new data privacy challenges. Existing work therefore provides solutions for differentially private synthetic data collection by combining GANs and federated learning in a distributed setting. Following the FedAvg training algorithm of model aggregation and averaging, federated learning coordinates collaborative learning over distributed data that are either independent and identically distributed (IID) or non-IID [80].
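As a reference point, FedAvg [80] aggregates client updates by a weighted average proportional to local dataset sizes. The sketch below shows one aggregation round with client models represented as flat NumPy vectors; the names and sizes are illustrative.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client model weights, with weights proportional to local data size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)                    # shape (num_clients, num_params)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Example: three clients with different amounts of local data.
clients = [np.array([1.0, 2.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
global_model = fedavg_aggregate(clients, client_sizes=[100, 50, 50])
```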
Similar to the idea of gradient perturbation, weight perturbation can achieve differential privacy of a generative model by clipping weights and adding noise to them in GAN training with federated learning. The machine learning modeler’s workflow relies on inspecting the data, which is precluded in the private, decentralized data paradigm where direct inspection is impossible. To overcome this limitation, Augenstein et al. [81] proposed a differentially private algorithm that synthesizes examples representative of the private data by adding Gaussian noise to the weighted average update.
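A minimal sketch of this weight-perturbation idea is given below: each client update is clipped to a norm bound before averaging, and Gaussian noise calibrated to that bound is added to the aggregate. The clipping bound and noise multiplier are illustrative assumptions rather than the settings of [81].

```python
import numpy as np

def dp_fedavg_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip each client's update and add Gaussian noise to the average (DP federated averaging sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Sensitivity of the average w.r.t. one client is clip_norm / num_clients.
    std = noise_multiplier * clip_norm / len(client_updates)
    return avg + rng.normal(0.0, std, size=avg.shape)

# Example: aggregate three toy client updates under weight perturbation.
rng = np.random.default_rng(4)
updates = [rng.normal(size=8) for _ in range(3)]
noisy_global_update = dp_fedavg_aggregate(updates, rng=rng)
```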
Gradient perturbation can also be used to protect the training data in GAN training with federated learning. Chen et al. [62] extended the gradient-sanitized WGAN to train GANs with differential privacy in the federated setting and noted some subtle differences between their method and that of [81]. Different hospitals jointly training a model through data sharing to diagnose COVID-19 pneumonia can also lead to privacy disclosure. To solve this problem, Zhang et al. [82] proposed a federated differentially private GAN for detecting COVID-19 pneumonia, which effectively diagnoses COVID-19 without compromising privacy under both IID and non-IID settings. The distributed storage of data and the fact that data cannot be shared for privacy reasons in the federated learning environment bring new challenges to training GANs. Thus, Nguyen et al. [83] proposed a new federated learning scheme to generate realistic COVID-19 images for facilitating enhanced COVID-19 detection with GANs in edge cloud computing, and this scheme integrates a differential privacy solution at each hospital institution to enhance the privacy of federated COVID-19 data analytics. By adding Gaussian noise to the gradient updates of the discriminator, Xin et al. [84] proposed a differentially private GAN based on federated learning that strategically combines the Lipschitz condition with differential privacy sensitivity and uses a serialized model-training paradigm to significantly reduce the communication cost. Considering that distributed data are often non-IID in reality, which makes modeling difficult, Xin et al. further proposed a universal private FL-GAN to solve this problem. These algorithms provide strict privacy guarantees using differential privacy while still generating satisfactory data and protecting the privacy of the training data, even when the data are non-IID.
Furthermore, considering that differential average-case privacy [18] enhances the privacy protection of federated learning, Triastcyn and Faltings [85] proposed a privacy-preserving data publishing framework using a GAN in the federated learning environment, in which the generator component is trained by the FedAvg algorithm to draw private artificial data samples and the risk of information disclosure is evaluated empirically. It can generate high-quality labeled data to successfully train and validate supervised models, significantly reducing the vulnerability of such models to model inversion attacks.
9. Open Problems
Our survey shows that current work focuses on the definitions, privacy-utility metrics, properties, and achieving mechanisms of GDP and LDP based on the information-theoretic channel model. Mir [14] obtained the exponential mechanism achieving GDP by minimizing mutual information under an expected distortion constraint. According to Equation (5) of the LDP definition, we can intuitively obtain the binary randomized response mechanism, the quaternary randomized response mechanism, and the multivariate randomized response mechanism under the binary symmetric channel, the quasi-symmetric channel, and the strongly symmetric channel, respectively. Wang et al. [38] obtained the k-subset mechanism by maximizing mutual information under the LDP constraint. Although GDP and LDP have been studied based on the information-theoretic channel model, open problems remain for different application scenarios and data types from the perspective of different types of information-theoretic channels, as summarized in Table 17.
Table 17.
Open problems of GDP and LDP from the perspective of different types of information-theoretic channel.
Scenario | Data Type | Privacy Type | Open Problem | Method | Information-Theoretic Foundation |
---|---|---|---|---|---|
Data collection | Categorical data | LDP | Personalized privacy demands; poor data utility; information-theoretic analysis of existing LDP mechanisms | Rate-distortion framework | Discrete single-symbol information-theoretic channel |
High-dimensional (correlated) data collection | Categorical data | LDP | Poor data utility | Rate-distortion framework; joint probability; Markov chain | Discrete sequence information-theoretic channel |
Continuous (correlated) data releasing | Numerical data | GDP | Information-theoretic analysis of existing GDP mechanisms; RDP mechanisms; personalized privacy demands; poor data utility | Rate-distortion framework; joint probability; Markov chain | Continuous information-theoretic channel |
Multiuser (correlated) data collection; multi-party data releasing | Numerical data; categorical data | GDP; LDP | Privacy leakage risk | Rate-distortion framework | Multiple access channel; multiuser channel with correlated sources; broadcast channel |
Synthetic data generation | Numerical data; categorical data | GDP; LDP | Poor data utility | GAN; GAN with federated learning | Information-theoretic metrics |
(1) New LDP from the perspective of the information-theoretic channel. Because local users have different privacy preferences, Yang et al. [86] proposed personalized LDP. However, personalized LDP still needs to be studied from the perspective of the information-theoretic channel, together with a corresponding achieving mechanism. Although LDP does not require a trusted third party, it regards all local data as equally sensitive, which causes excessive protection and results in a utility disaster [87]. Thus, it is necessary to study a utility-optimized mechanism for the setting where all users use the same random perturbation mechanism. In addition, since the distinction between sensitive and non-sensitive data varies from user to user, a personalized utility-optimized mechanism for individual data is needed that achieves high utility while preserving the privacy of sensitive data. Holohan et al. [42] proposed an optimal mechanism satisfying ε-LDP for randomized response; this optimal randomized response mechanism still needs to be analyzed and derived from the perspective of the information-theoretic channel (a minimal code sketch of the basic binary randomized response is given below). Moreover, new LDP mechanisms need to be analyzed using the average error probability [20] as the utility metric under the rate-distortion framework of LDP.
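For concreteness, the binary randomized response induced by the binary symmetric channel, together with its standard unbiased frequency estimator, can be sketched as follows; the parameter values and population are illustrative.

```python
import numpy as np

def binary_randomized_response(x, epsilon, rng=None):
    """Report the true bit with probability e^eps / (e^eps + 1), flip it otherwise;
    the ratio of the two probabilities is e^eps, so the channel satisfies eps-LDP."""
    rng = np.random.default_rng() if rng is None else rng
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return x if rng.random() < p_keep else 1 - x

def unbiased_frequency(reports, epsilon):
    """Debias the observed frequency of 1s collected via randomized response."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

# Example: 10,000 users, true proportion of 1s is 0.3, epsilon = 1.
rng = np.random.default_rng(5)
true_bits = (rng.random(10_000) < 0.3).astype(int)
reports = np.array([binary_randomized_response(b, 1.0, rng) for b in true_bits])
estimate = unbiased_frequency(reports, 1.0)
```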
(2) LDP from the perspective of the discrete sequence information-theoretic channel. Collecting multiuser high-dimensional data can produce rich knowledge, but it brings unprecedented privacy concerns to the participants [88,89]. In view of the privacy leakage risk of high-dimensional data aggregation, applying existing LDP mechanisms yields poor data utility. Thus, it is necessary to study optimal LDP mechanisms for aggregating high-dimensional data from the perspective of the discrete sequence information-theoretic channel. Furthermore, correlations exist between the attributes of high-dimensional data. If the correlation is not modeled, then applying LDP to high-dimensional correlated data also leads to poor data utility [90,91]. By constructing the discrete sequence information-theoretic channel model of high-dimensional correlated data aggregation under LDP using joint probability or Markov chains, an LDP mechanism suitable for high-dimensional correlated data aggregation needs to be provided.
(3) GDP from the perspective of the continuous information-theoretic channel. For GDP, no work shows the direct relationship between GDP mechanisms, such as the Laplace mechanism, the discrete Laplace mechanism, and the Gaussian mechanism, and the single-symbol continuous information-theoretic channel model (a code sketch of the Laplace and Gaussian mechanisms is given below). RDP is a general privacy definition, but existing work has not provided RDP mechanisms under the continuous information-theoretic channel model; RDP mechanisms therefore need to be studied from the perspective of the continuous information-theoretic channel. The continuous releasing of correlated data and their statistics has the potential for significant social benefits, but privacy concerns hinder the wider use of such continuous correlated data [92,93]. Therefore, the corresponding GDP mechanism needs to be studied from the perspective of the continuous multi-symbol information-theoretic channel by combining joint probability or Markov chains for continuous correlated data releasing with DP. Moreover, it is common that data curators have different privacy preferences for their data, so personalized DP [94] needs to be studied based on the continuous information-theoretic channel model. Existing GDP mechanisms ignore the characteristics of the data and directly perturb the data or query results, which inevitably leads to poor data utility. Therefore, it is necessary to study adaptive GDP depending on the characteristics of the data [95] from the perspective of the continuous information-theoretic channel. Since users have different privacy demands, aggregate data analysis with DP also suffers poor data utility. Thus, adaptive personalized DP [96] also needs to be studied, based on the type of query function, the data distribution, and the privacy settings, from the perspective of the continuous information-theoretic channel.
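For reference, the single-symbol GDP mechanisms mentioned above admit very short implementations. The sketch below shows the Laplace mechanism and the classical Gaussian mechanism; the noise calibration shown for the Gaussian mechanism is the standard (ε, δ) choice for ε < 1 and is used here only for illustration.

```python
import numpy as np

def laplace_mechanism(query_value, sensitivity, epsilon, rng=None):
    """Add Laplace(sensitivity/epsilon) noise: epsilon-DP for an L1-sensitivity-bounded query."""
    rng = np.random.default_rng() if rng is None else rng
    return query_value + rng.laplace(0.0, sensitivity / epsilon)

def gaussian_mechanism(query_value, sensitivity, epsilon, delta, rng=None):
    """Classical Gaussian mechanism: (epsilon, delta)-DP for an L2-sensitivity-bounded query."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return query_value + rng.normal(0.0, sigma)

# Example: privately release a count (sensitivity 1) and a bounded mean (sensitivity 0.1).
rng = np.random.default_rng(6)
noisy_count = laplace_mechanism(1234, sensitivity=1.0, epsilon=0.5, rng=rng)
noisy_mean = gaussian_mechanism(3.7, sensitivity=0.1, epsilon=1.0, delta=1e-5, rng=rng)
```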
(4) GDP and LDP from the perspective of the multiuser information-theoretic channel. Large amounts of individual data are aggregated for computing various statistics, query responses, classifiers, and other functions. However, these processes can release sensitive information and compromise individual privacy [97,98,99,100]. Thus, when considering the aggregation of multiuser data, GDP and LDP mechanisms need to be studied from the perspective of the multiple access channel. Data collection under GDP and LDP has mostly been studied for homogeneous and independently distributed data. In real-world applications, data have inherent correlations which, if not harnessed, lead to poor data utility [101,102]. Thus, when the multiuser data are correlated, GDP and LDP mechanisms need to be studied from the perspective of the multiuser channel with correlated sources. With the acceleration of digitization, more and more high-dimensional data are collected and used for different purposes. When these distributed data are aggregated, they can become valuable resources to support better decision making or provide high-quality services. However, because the data held by each party may contain highly sensitive information, simply integrating local data and sharing the aggregation results poses a serious threat to individual privacy [103,104]. Therefore, GDP and LDP mechanisms need to be studied from the perspective of the broadcast channel for the releasing and sharing of multi-party data.
(5) Adaptive differential privacy with GAN. Existing work can generate differentially private synthetic data using GANs. However, because of the differential privacy noise introduced during training, the convergence of the GAN becomes even more difficult, leading to poor utility of the output generator at the end of training. Therefore, it is necessary to explore adaptive differentially private synthetic data generation with GANs that produces high-quality synthetic data according to the real data distribution. Combining the differential privacy definition with information-theoretic metrics, a new differentially private loss function model for GANs needs to be studied that guarantees convergence and reaches the optimal solution. Based on this differentially private loss function model, an adaptive differential privacy model should be constructed, and GANs and their variants can then generate synthetic data under the adaptive differential privacy model. To improve the quality of the synthetic data under the adaptive differential privacy model, GAN modeling can employ more layers, more complex structures, or transfer learning. Moreover, the speed of GAN training can be increased by reducing the privacy budget. To resolve mode collapse and non-convergence issues, it is necessary to fine-tune hyperparameters such as the learning rate and the number of discriminator epochs. Furthermore, the proposed adaptive differential privacy model with GAN should be extended to a distributed setting using federated learning and should explore data augmentation methods that mitigate the non-IID problem.
10. Conclusions
This survey has compared and analyzed GDP and LDP from the perspective of the information-theoretic channel. We concluded that the one-try attack with prior knowledge raises privacy concerns under the information-theoretic channel. We described and compared the information-theoretic channel models of GDP and LDP for different data types. We summarized and compared the information-theoretic definitions of GDP and LDP under their information-theoretic channel models and presented unified information-theoretic definitions of GDP and LDP, respectively. We also made a comparative analysis between GDP (LDP) and other information-theoretic privacy definitions. We surveyed and compared the privacy-utility metrics, properties, and achieving mechanisms of GDP and LDP from the perspective of the information-theoretic channel. Moreover, we reviewed differentially private synthetic data generation using GAN and GAN with federated learning, respectively. Considering the privacy threats in different real-world applications with different data types, we discussed open problems from the perspective of different types of information-theoretic channels. We hope that this survey can serve as a tutorial for readers to grasp GDP and LDP based on the information-theoretic channel model and provide a reference for conducting in-depth research on GDP and LDP based on different types of information-theoretic channel models.
Author Contributions
H.L. wrote the paper and contributed to the review and analysis; H.L., C.P., Y.T., S.L., F.T. and Z.W. collaboratively discussed the results; H.L., C.P., Y.T., S.L., F.T. and Z.W. collaboratively checked the English writing and organization of the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62002081, Grant 62062020, Grant U1836205, and Grant 61602290, in part by the Project Funded by China Postdoctoral Science Foundation under Grant 2019M663907XB, in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2022JM-329, in part by the Fundamental Research Funds for the Central Universities under Grant GK202103090, in part by the Major Scientific and Technological Special Project of Guizhou Province under Grant 20183001, and in part by the Foundation of Guizhou Provincial Key Laboratory of Public Big Data under Grant 2018BDKFJJ004.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Footnotes
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Dwork C., McSherry F., Nissim K., Smith A. Calibrating noise to sensitivity in private data analysis; Proceedings of the 3rd Theory of Cryptography Conference; New York, NY, USA. 4–7 March 2006; pp. 265–284. [Google Scholar]
- 2.Kasiviswanathan S.P., Lee H.K., Nissim K., Raskhodnikova S., Smith A. What can we learn privately? SIAM J. Comput. 2011;40:793–826. doi: 10.1137/090756090. [DOI] [Google Scholar]
- 3.Liu H., Wu Z., Peng C., Tian F., Lu L. Bounded privacy-utility monotonicity indicating bounded tradeoff of differential privacy mechanisms. Theor. Comput. Sci. 2020;816:195–220. doi: 10.1016/j.tcs.2020.02.004. [DOI] [Google Scholar]
- 4.Dobrota B. Master Thesis. Utrecht University; Utrecht, The Netherlands: 2021. Measuring the Quantity of Data Privacy and Utility Tradeoff for Users’ Data: A Visualization Approach. [Google Scholar]
- 5.Kairouz P., Oh S., Viswanath P. Extremal mechanisms for local differential privacy. J. Mach. Learn. Res. 2016;4:492–542. [Google Scholar]
- 6.Cover T.M., Thomas J.A. Elements of Information Theory. 2nd ed. John Wiley & Sons Inc.; Hoboken, NJ, USA: 2006. [Google Scholar]
- 7.Alvim M.S., Andrés M.E., Chatzikokolakis K., Degano P., Palamidessi C. Differential privacy: On the trade-off between utility and information leakage; Proceedings of the 8th International Workshop on Formal Aspects of Security and Trust; Leuven, Belgium. 12–14 September 2011; pp. 39–54. [Google Scholar]
- 8.Barthe G., Olmedo F. Beyond differential privacy: Composition theorems and relational logic for f-divergences between probabilistic programs; Proceedings of the 40th International Colloquium Automata, Languages, and Programming; Riga, Latvia. 8–12 July 2013; pp. 49–60. [Google Scholar]
- 9.Dwork C., Roth A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 2014;9:211–407. doi: 10.1561/0400000042. [DOI] [Google Scholar]
- 10.Fehr S., Berens S. On the conditional Rényi entropy. IEEE Trans. Inf. Theory. 2014;60:6801–6810. doi: 10.1109/TIT.2014.2357799. [DOI] [Google Scholar]
- 11.Mironov I. Rényi differential privacy; Proceedings of the 30th IEEE Computer Security Foundations Symposium; Santa Barbara, CA, USA. 21–25 August 2017; pp. 263–275. [Google Scholar]
- 12.Wang W., Ying L., Zhang J. On the relation between identifiability, differential privacy, and mutual-information privacy. IEEE Trans. Inf. Theory. 2016;62:5018–5029. doi: 10.1109/TIT.2016.2584610. [DOI] [Google Scholar]
- 13.Cuff P., Yu L. Differential privacy as a mutual information constraint; Proceedings of the 23rd ACM SIGSAC Conference on Computer and Communications Security; Vienna, Austria. 24–28 October 2016; pp. 43–54. [Google Scholar]
- 14.Mir D.J. Information-theoretic foundations of differential privacy; Proceedings of the 5th International Symposium on Foundations and Practice of Security; Montreal, QC, Canada. 25–26 October 2012; pp. 374–381. [Google Scholar]
- 15.Smith G. On the foundations of quantitative information flow; Proceedings of the 12th International Conference on Foundations of Software Science and Computational Structures; York, UK. 22–29 March 2009; pp. 288–302. [Google Scholar]
- 16.Barthe G., Köpf B. Information-theoretic bounds for differentially private mechanisms; Proceedings of the 24th IEEE Computer Security Foundations Symposium; Cernay-la-Ville, France. 27–29 June 2011; pp. 191–204. [Google Scholar]
- 17.Duchi J.C., Jordan M.I., Wainwright M.J. Local privacy and statistical minimax rates; Proceedings of the 54th Annual IEEE Symposium on Foundations of Computer Science; Berkeley, CA, USA. 26–29 October 2013; pp. 429–438. [Google Scholar]
- 18.Duchi J.C., Jordan M.I., Wainwright M.J. Privacy aware learning. J. ACM. 2014;61:38:1–38:57. doi: 10.1145/2666468. [DOI] [Google Scholar]
- 19.Jiang B., Li M., Tandon R. Context-aware data aggregation with localized information privacy; Proceedings of the 6th IEEE Conference on Communications and Network Security; Beijing, China. 30 May–1 June 2018; pp. 1–9. [Google Scholar]
- 20.Song H., Luo T., Li J. Common criterion of privacy metrics and parameters analysis based on error probability for randomized response. IEEE Access. 2019;7:16964–16978. doi: 10.1109/ACCESS.2019.2893407. [DOI] [Google Scholar]
- 21.Lopuhaä-Zwakenberg M., Tong H., Skoric B. Data sanitisation protocols for the privacy funnel with differential privacy guarantees. arXiv. 2020. arXiv:2008.13151. [Google Scholar]
- 22.Lopuhaä-Zwakenberg M., Goseling J. The privacy-utility tradeoff of robust local differential privacy. arXiv. 2021. arXiv:2101.09139. doi: 10.3390/e26030233. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Bun M., Steinke T. Concentrated differential privacy: Simplifications, extensions, and lower bounds; Proceedings of the 14th Theory of Cryptography Conference; Beijing, China. 31 October–3 November 2016; pp. 635–658. [Google Scholar]
- 24.Asoodeh S., Liao J., Calmon F.P., Kosut O., Sankar L. Three variants of differential privacy: Lossless conversion and applications. IEEE J. Sel. Areas Inf. Theory. 2021;2:208–222. doi: 10.1109/JSAIT.2021.3054692. [DOI] [Google Scholar]
- 25.Chaudhuri K., Imola J., Machanavajjhala A. Capacity bounded differential privacy; Proceedings of the 32nd Annual Conference on Neural Information Processing Systems; Vancouver, BC, Canada. 8–14 December 2019; pp. 3469–3478. [Google Scholar]
- 26.Calmon F.P., Fawaz N. Privacy against statistical inference; Proceedings of the 50th Annual Allerton Conference on Communication, Control, and Computing; Allerton Park & Retreat Center, Monticello, IL, USA. 1–5 October 2012; pp. 1401–1408. [Google Scholar]
- 27.Makhdoumi A., Fawaz N. Privacy-utility tradeoff under statistical uncertainty; Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing; Allerton Park & Retreat Center, Monticello, IL, USA. 2–4 October 2013; pp. 1627–1634. [Google Scholar]
- 28.Holohan N., Leith D.J., Mason O. Extreme points of the local differential privacy polytope. Linear Algebra Appl. 2017;534:78–96. doi: 10.1016/j.laa.2017.08.011. [DOI] [Google Scholar]
- 29.De A. Lower bounds in differential privacy; Proceedings of the 9th Theory of Cryptography Conference; Taormina, Sicily, Italy. 19–21 March 2012; pp. 321–338. [Google Scholar]
- 30.Edwards T., Rubinstein B.I.P., Zhang Z., Zhou S. A graph symmetrization bound on channel information leakage under Blowfish privacy. IEEE Trans. Inf. Theory. 2022;68:538–548. doi: 10.1109/TIT.2021.3120371. [DOI] [Google Scholar]
- 31.Rogers R.M., Roth A., Smith A.D., Thakkar O. Max-information, differential privacy, and post-selection hypothesis testing; Proceedings of the 57th Annual Symposium on Foundations of Computer Science; Hyatt Regency, New Brunswick, NJ, USA. 9–11 October 2016; pp. 487–494. [Google Scholar]
- 32.Padakandla A., Kumar P.R., Szpankowski W. The trade-off between privacy and fidelity via Ehrhart theory. IEEE Trans. Inf. Theory. 2020;66:2549–2569. doi: 10.1109/TIT.2019.2959976. [DOI] [Google Scholar]
- 33.Ayed F., Battiston M., Camerlenghi F. An information theoretic approach to post randomization methods under differential privacy. Stat. Comput. 2020;30:1347–1361. doi: 10.1007/s11222-020-09949-3. [DOI] [Google Scholar]
- 34.Sarwate A.D., Sankar L. A rate-distortion perspective on local differential privacy; Proceedings of the 52nd Annual Allerton Conference on Communication, Control, and Computing; Allerton Park & Retreat Center, Monticello, IL, USA. 30 September–3 October 2014; pp. 903–908. [Google Scholar]
- 35.Kalantari K., Sankar L., Sarwate A.D. Robust privacy-utility tradeoffs under differential privacy and Hamming distortion. IEEE Trans. Inf. Forensics Secur. 2018;13:2816–2830. doi: 10.1109/TIFS.2018.2831619. [DOI] [Google Scholar]
- 36.Xiong S., Sarwate A.D., Mandayam N.B. Randomized requantization with local differential privacy; Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing; Shanghai, China. 20–25 March 2016; pp. 2189–2193. [Google Scholar]
- 37.Kalantari K., Sankar L., Sarwate A.D. Optimal differential privacy mechanisms under Hamming distortion for structured source classes; Proceedings of the IEEE International Symposium on Information Theory; Barcelona, Spain. 10–15 July 2016; pp. 2069–2073. [Google Scholar]
- 38.Wang S., Huang L., Nie Y., Zhang X., Wang P., Xu H., Yang W. Local differential private data aggregation for discrete distribution estimation. IEEE Trans. Parallel Distrib. Syst. 2019;30:2046–2059. doi: 10.1109/TPDS.2019.2899097. [DOI] [Google Scholar]
- 39.Lopuhaä-Zwakenberg M., Skoric B., Li N. Information-theoretic metrics for local differential privacy protocols. arXiv. 2019. arXiv:1910.07826. [Google Scholar]
- 40.McSherry F., Talwar K. Mechanism design via differential privacy; Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science; Providence, RI, USA. 20–23 October 2007; pp. 94–103. [Google Scholar]
- 41.Duchi J.C., Wainwright M.J., Jordan M.I. Local privacy and minimax bounds: Sharp rates for probability estimation; Proceedings of the 27th Annual Conference on Neural Information Processing Systems; Lake Tahoe, NV, USA. 5–8 December 2013; pp. 1529–1537. [Google Scholar]
- 42.Holohan N., Leith D.J., Mason O. Optimal differentially private mechanisms for randomised response. IEEE Trans. Inf. Forensics Secur. 2017;12:2726–2735. doi: 10.1109/TIFS.2017.2718487. [DOI] [Google Scholar]
- 43.Erlingsson Ú., Pihur V., Korolova A. RAPPOR: Randomized aggregatable privacy-preserving ordinal response; Proceedings of the 21st ACM SIGSAC Conference on Computer and Communications Security; Scottsdale, AZ, USA. 3–7 November 2014; pp. 1054–1067. [Google Scholar]
- 44.Goodfellow I.J., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A.C., Bengio Y. Generative adversarial nets; Proceedings of the 28th Annual Conference on Neural Information Processing Systems; Montreal, QC, Canada. 8–13 December 2014; pp. 2672–2680. [Google Scholar]
- 45.Frigerio L., de Oliveira A.S., Gomez L., Duverger P. Differentially private generative adversarial networks for time series, continuous, and discrete open data; Proceedings of the 34th IFIP TC 11 International Conference on ICT Systems Security and Privacy Protection; Lisbon, Portugal. 25–27 June 2019; pp. 151–164. [Google Scholar]
- 46.Mukherjee S., Xu Y., Trivedi A., Patowary N., Ferres J.L. privGAN: Protecting GANs from membership inference attacks at low cost to utility. Proc. Priv. Enhancing Technol. 2021;3:142–163. doi: 10.2478/popets-2021-0041. [DOI] [Google Scholar]
- 47.Hayes J., Melis L., Danezis G., De Cristofaro E. LOGAN: Membership inference attacks against generative models. Proc. Priv. Enhancing Technol. 2019;1:133–152. doi: 10.2478/popets-2019-0008. [DOI] [Google Scholar]
- 48.Liu K.S., Xiao C., Li B., Gao J. Performing co-membership attacks against deep generative models; Proceedings of the 19th IEEE International Conference on Data Mining; Beijing, China. 8–11 November 2019; pp. 459–467. [Google Scholar]
- 49.Hilprecht B., Härterich M., Bernau D. Monte Carlo and reconstruction membership inference attacks against generative models. Proc. Priv. Enhancing Technol. 2019;4:232–249. doi: 10.2478/popets-2019-0067. [DOI] [Google Scholar]
- 50.Chen D., Yu N., Zhang Y., Fritz M. GAN-leaks: A taxonomy of membership inference attacks against generative models; Proceedings of the 27th ACM SIGSAC Conference on Computer and Communications Security; Virtual Event USA. 9–13 November 2020; pp. 343–362. [Google Scholar]
- 51.Hu H., Pang J. Stealing machine learning models: Attacks and countermeasures for generative adversarial networks; Proceedings of the 37th Annual Computer Security Applications Conference; Virtual Event USA. 6–10 December 2021; pp. 1–16. [Google Scholar]
- 52.Abadi M., Chu A., Goodfellow I.J., McMahan H.B., Mironov I., Talwar K., Zhang L. Deep learning with differential privacy; Proceedings of the 23rd ACM SIGSAC Conference on Computer and Communications Security; Vienna, Austria. 24–28 October 2016; pp. 308–318. [Google Scholar]
- 53.Fan L. A survey of differentially private generative adversarial networks; Proceedings of the 1st AAAI Workshop on Privacy-Preserving Artificial Intelligence; New York, NY, USA. 7 February 2020. [Google Scholar]
- 54.Cai Z., Xiong Z., Xu H., Wang P., Li W., Pan Y. Generative adversarial networks: A survey toward private and secure applications. ACM Comput. Surv. 2021;54:132:1–132:38. doi: 10.1145/3459992. [DOI] [Google Scholar]
- 55.Lu P.-H., Yu C.-M. POSTER: A unified framework of differentially private synthetic data release with generative adversarial network; Proceedings of the 24th ACM SIGSAC Conference on Computer and Communications Security; Dallas, TX, USA. 30 October–3 November 2017; pp. 2547–2549. [Google Scholar]
- 56.Xie L., Lin K., Wang S., Wang F., Zhou J. Differentially private generative adversarial network. arXiv. 2018. arXiv:1802.06739. [Google Scholar]
- 57.Arjovsky M., Chintala S., Bottou L. Wasserstein GAN. arXiv. 2017. arXiv:1701.07875. [Google Scholar]
- 58.Torkzadehmahani R., Kairouz P., Paten B. DP-CGAN: Differentially private synthetic data and label generation; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; Long Beach, CA, USA. 16–20 June 2019; pp. 98–104. [Google Scholar]
- 59.Mirza M., Osindero S. Conditional generative adversarial nets. arXiv. 2014. arXiv:1411.1784. [Google Scholar]
- 60.Liu Y., Peng J., Yu J.J.Q., Wu Y. PPGAN: Privacy-preserving generative adversarial network; Proceedings of the 25th IEEE International Conference on Parallel and Distributed Systems; Tianjin, China. 4–6 December 2019; pp. 985–989. [Google Scholar]
- 61.Ha T., Dang T.K. Investigating local differential privacy and generative adversarial network in collecting data; Proceedings of the 14th International Conference on Advanced Computing and Applications; Quy Nhon, Vietnam. 25–27 November 2020; pp. 140–145. [Google Scholar]
- 62.Chen D., Orekondy T., Fritz M. GS-WGAN: A gradient-sanitized approach for learning differentially private generators; Proceedings of the 34th Annual Conference on Neural Information Processing Systems; virtual. 6–12 December 2020; pp. 12673–12684. [Google Scholar]
- 63.Yang R., Ma X., Bai X., Su X. Differential privacy images protection based on generative adversarial network; Proceedings of the 19th IEEE International Conference on Trust, Security and Privacy in Computing and Communications; Guangzhou, China. 29 December 2020–1 January 2021; pp. 1688–1695. [Google Scholar]
- 64.Gulrajani I., Ahmed F., Arjovsky M., Dumoulin V., Courville A.C. Improved training of Wasserstein GANs; Proceedings of the 31st Annual Conference on Neural Information Processing Systems; Long Beach, CA, USA. 4–9 December 2017; pp. 5767–5777. [Google Scholar]
- 65.Beaulieu-Jones B.K., Wu Z.S., Williams C., Lee R., Bhavnani S.P., Byrd J.B., Greene C.S. Privacy-preserving generative deep neural networks support clinical data sharing. Circ. Cardiovasc. Qual. 2019;12:e005122:1–e005122:10. doi: 10.1161/CIRCOUTCOMES.118.005122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Odena A., Olah C., Shlens J. Conditional image synthesis with auxiliary classifier GANs; Proceedings of the 34th International Conference on Machine Learning; Sydney, NSW, Australia. 6–11 August 2017; pp. 2642–2651. [Google Scholar]
- 67.Fan L., Pokkunuru A. DPNeT: Differentially private network traffic synthesis with generative adversarial networks; Proceedings of the 35th Annual IFIP WG 11.3 Conference on Data and Applications Security and Privacy; Calgary, AB, Canada. 19–20 July 2021; pp. 3–21. [Google Scholar]
- 68.Zhang S., Ni W., Fu N. Differentially private graph publishing with degree distribution preservation. Comput. Secur. 2021;106:102285:1–102285:17. doi: 10.1016/j.cose.2021.102285. [DOI] [Google Scholar]
- 69.Bojchevski A., Shchur O., Zügner D., Günnemann S. NetGAN: Generating graphs via random walks; Proceedings of the 35th International Conference on Machine Learning; Stockholmsmässan, Stockholm, Sweden. 10–15 July 2018; pp. 609–618. [Google Scholar]
- 70.Li A., Fang J., Jiang Q., Zhou B., Jia Y. A graph data privacy-preserving method based on generative adversarial networks; Proceedings of the 21st International Conference on Web Information Systems Engineering; Amsterdam, The Netherlands. 20–24 October 2020; pp. 227–239. [Google Scholar]
- 71.Neunhoeffer M., Wu S., Dwork C. Private post-GAN boosting; Proceedings of the 9th International Conference on Learning Representations; Virtual Event Austria. 3–7 May 2021. [Google Scholar]
- 72.Hardt M., Rothblum G.N. A multiplicative weights mechanism for privacy-preserving data analysis; Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science; Las Vegas, NV, USA. 23–26 October 2010; pp. 61–70. [Google Scholar]
- 73.Indhumathi R., Devi S.S. Healthcare Cramér generative adversarial network (HCGAN) Distrib. Parallel. Dat. 2021;39:1–17. doi: 10.1007/s10619-021-07346-x. [DOI] [Google Scholar]
- 74.Imtiaz S., Arsalan M., Vlassov V., Sadre R. Synthetic and private smart health care data generation using GANs; Proceedings of the 30th International Conference on Computer Communications and Networks; Athens, Greece. 19–22 July 2021; pp. 1–7. [Google Scholar]
- 75.Jordon J., Yoon J., van der Schaar M. PATE-GAN: Generating synthetic data with differential privacy guarantees; Proceedings of the 7th International Conference on Learning Representations; New Orleans, LA, USA. 6–9 May 2019. [Google Scholar]
- 76.Zhang X., Ding J., Errapotu S.M., Huang X., Li P., Pan M. Differentially private functional mechanism for generative adversarial networks; Proceedings of the IEEE Global Communications Conference, GLOBECOM 2019; Waikoloa, HI, USA. 9–13 December 2019; pp. 1–6. [Google Scholar]
- 77.Ho S., Qu Y., Gu B., Gao L., Li J., Xiangy Y. DP-GAN: Differentially private consecutive data publishing using generative adversarial nets. J. Netw. Comput. Appl. 2021;185:103066:1–103066:11. doi: 10.1016/j.jnca.2021.103066. [DOI] [Google Scholar]
- 78.Papernot N., Abadi M., Erlingsson Ú., Goodfellow I.J., Talwar K. Semi-supervised knowledge transfer for deep learning from private training data; Proceedings of the 5th International Conference on Learning Representations; Toulon, France. 24–26 April 2017. [Google Scholar]
- 79.Li C., Xu T., Zhu J., Zhang B. Triple generative adversarial nets; Proceedings of the 31st Annual Conference on Neural Information Processing Systems; Long Beach, CA, USA. 4–9 December 2017; pp. 4088–4098. [Google Scholar]
- 80.McMahan B., Moore E., Ramage D., Hampson S., Arcas B.A. Communication-efficient learning of deep networks from decentralized data; Proceedings of the 20th International Conference on Artificial Intelligence and Statistics; Fort Lauderdale, FL, USA. 20–22 April 2017; pp. 1273–1282. [Google Scholar]
- 81.Augenstein S., McMahan H.B., Ramage D., Ramaswamy S., Kairouz P., Chen M., Mathews R., Arcas B.A. Generative models for effective ml on private, decentralized datasets; Proceedings of the 8th International Conference on Learning Representations; Addis Ababa, Ethiopia. 26–30 April 2020. [Google Scholar]
- 82.Zhang L., Shen B., Barnawi A., Xi S., Kumar N., Wu Y. FedDPGAN: Federated differentially private generative adversarial networks framework for the detection of COVID-19 pneumonia. Inf. Syst. Front. 2021;23:1403–1415. doi: 10.1007/s10796-021-10144-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Nguyen D.C., Ding M., Pathirana P.N., Seneviratne A., Zomaya A.Y. Federated learning for COVID-19 detection with generative adversarial networks in edge cloud computing. IEEE Internet Things J. 2021 doi: 10.1109/JIOT.2021.3120998. [DOI] [Google Scholar]
- 84.Xin B., Geng Y., Hu T., Chen S., Yang W., Wang S., Huang L. Federated synthetic data generation with differential privacy. Neurocomputing. 2022;468:1–10. doi: 10.1016/j.neucom.2021.10.027. [DOI] [Google Scholar]
- 85.Triastcyn A., Faltings B. Federated generative privacy. IEEE Intell. Syst. 2020;35:50–57. doi: 10.1109/MIS.2020.2993966. [DOI] [Google Scholar]
- 86.Yang G., Wang S., Wang H. Federated learning with personalized local differential privacy; Proceedings of the 6th IEEE International Conference on Computer and Communication Systems; Chengdu, China. 23–26 April 2021; pp. 484–489. [Google Scholar]
- 87.Murakami T., Kawamoto Y. Utility-optimized local differential privacy mechanisms for distribution estimation; Proceedings of the 28th USENIX Security Symposium; Santa Clara, CA, USA. 14–16 August 2019; pp. 1877–1894. [Google Scholar]
- 88.Ren X., Yu C.-M., Yu W., Yang S., Yang X., McCann J.A., Yu P.S. LoPub: High-dimensional crowdsourced data publication with local differential privacy. IEEE Trans. Inf. Forensics Secur. 2018;13:2151–2166. doi: 10.1109/TIFS.2018.2812146. [DOI] [Google Scholar]
- 89.Wang N., Xiao X., Yang Y., Zhao J., Hui S.C., Shin H., Shin J., Yu G. Collecting and analyzing multidimensional data with local differential privacy; Proceedings of the 35th IEEE International Conference on Data Engineering; Macao, China. 8–11 April 2019; pp. 638–649. [Google Scholar]
- 90.Ren X., Xu J., Yang X., Yang S. Bayesian network-based high-dimensional crowdsourced data publication with local differential privacy. Sci. Sin. Inform. 2019;49:1586–1605. [Google Scholar]
- 91.Du R., Ye Q., Fu Y., Hu H. Collecting high-dimensional and correlation-constrained data with local differential privacy; Proceedings of the 18th Annual IEEE International Conference on Sensing, Communication, and Networking; Rome, Italy. 6–9 July 2021; pp. 1–9. [Google Scholar]
- 92.Hemkumar D., Ravichandra S., Somayajulu D.V.L.N. Impact of data correlation on privacy budget allocation in continuous publication of location statistics. Peer-to-Peer Netw. Appl. 2021;14:1650–1665. doi: 10.1007/s12083-021-01078-6. [DOI] [Google Scholar]
- 93.Cao Y., Yoshikawa M., Xiao Y., Xiong L. Quantifying differential privacy in continuous data release under temporal correlations. IEEE Trans. Knowl. Data Eng. 2019;31:1281–1295. doi: 10.1109/TKDE.2018.2824328. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Jorgensen Z., Yu T., Cormode G. Conservative or liberal? Personalized differential privacy; Proceedings of the 31st IEEE International Conference on Data Engineering; Seoul, Korea. 13–17 April 2015; pp. 1023–1034. [Google Scholar]
- 95.Chen Y., Machanavajjhala A., Hay M., Miklau G. PeGaSus: Data-adaptive differentially private stream processing; Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security; Dallas, TX, USA. 30 October–3 November 2017; pp. 1375–1388. [Google Scholar]
- 96.Niu B., Chen Y., Wang B., Wang Z., Li F., Cao J. AdaPDP: Adaptive personalized differential privacy; Proceedings of the 40th IEEE Conference on Computer Communications; Vancouver, BC, Canada. 10–13 May 2021; pp. 1–10. [Google Scholar]
- 97.Pathak M.A., Rane S., Raj B. Multiparty differential privacy via aggregation of locally trained classifiers; Proceedings of the 24th Annual Conference on Neural Information Processing Systems; Vancouver, BC, Canada. 6–9 December 2010; pp. 1876–1884. [Google Scholar]
- 98.Wei K., Li J., Ding M., Ma C., Yang H.H., Farokhi F., Jin S., Quek T.Q.S., Poor H.V. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur. 2020;15:3454–3469. doi: 10.1109/TIFS.2020.2988575. [DOI] [Google Scholar]
- 99.Sun L., Qian J., Chen X. LDP-FL: Practical private aggregation in federated learning with local differential privacy; Proceedings of the 30th International Joint Conference on Artificial Intelligence; Montréal, QC, Canada. 19–20 August 2021; pp. 1571–1578. [Google Scholar]
- 100.Wu S., Yu M., Ahmed M.A.M., Qian Y., Tao Y. FL-MAC-RDP: Federated learning over multiple access channels with Rényi differential privacy. Int. J. Theor. Phys. 2021;60:2668–2682. doi: 10.1007/s10773-021-04867-0. [DOI] [Google Scholar]
- 101.Niu C., Zheng Z., Wu F., Tang S., Gao X., Chen G. Unlocking the value of privacy: Trading aggregate statistics over private correlated data; Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; London, UK. 19–23 August 2018; pp. 2031–2040. [Google Scholar]
- 102.Gu X., Li M., Cheng Y., Xiong L., Cao Y. PCKV: Locally differentially private correlated key-value data collection with optimized utility; Proceedings of the 29th USENIX Security Symposium; Boston, MA, USA. 12–14 August 2020; pp. 967–984. [Google Scholar]
- 103.Cheng X., Tang P., Su S., Chen R., Wu Z., Zhu B. Multi-party high-dimensional data publishing under differential privacy. IEEE Trans. Knowl. Data Eng. 2020;32:1557–1571. doi: 10.1109/TKDE.2019.2906610. [DOI] [Google Scholar]
- 104.Zheng X., Cai Z. Privacy-preserved data sharing towards multiple parties in industrial IoTs. IEEE J. Sel. Areas Commun. 2020;38:968–979. doi: 10.1109/JSAC.2020.2980802. [DOI] [Google Scholar]