Abstract
Conducting secure computations that protect against malicious adversaries is an emerging field of research. Current models designed for malicious security typically require the involvement of two or more servers in an honest-majority setting. Among privacy-preserving data mining techniques, significant attention has been focused on the classification problem. Logistic regression is a well-established classification model, renowned for its strong performance. We introduce a novel matrix encryption method to build a maliciously secure logistic model. Our scheme involves only a single semi-honest server and is resilient to malicious data providers that may deviate arbitrarily from the scheme. The proposed transformation ensures that our scheme achieves indistinguishability (i.e., no adversary can determine, in polynomial time, which of the plaintexts corresponds to a given ciphertext in a chosen-plaintext attack). Malicious activities of data providers can be detected in the verification stage. A lossy compression method is implemented to minimize communication costs while preserving negligible degradation in accuracy. Experiments illustrate that our scheme is highly efficient in analyzing large-scale datasets and achieves accuracy similar to that of non-private models. The proposed scheme outperforms other maliciously secure frameworks in terms of computation and communication costs.
Index Terms—Privacy-preserving, logistic model, malicious adversary, indistinguishability
I. Introduction
The Internet of Things (IoT) is gradually entering our lives, with wireless communication systems increasingly employed as technology drivers for smart monitoring and applications. An IoT system can be depicted as smart devices that interact on a collaborative basis toward a common goal. Smart cities incorporate a wide range of advanced IoT infrastructures, resulting in a large amount of data gathered from devices deployed in many domains, such as health care, energy transmission, and transportation [1]. Smart things provide efficient tools for ubiquitous data collection and tracking, but they also face privacy threats.
To address the challenges arising from IoT data processing and analysis, an increasing number of innovations has emerged recently. For instance, collaborative learning is a desirable and empowering paradigm for smart IoT systems. Collaborative learning enables multiple data providers to learn models utilizing all their data jointly [2], [3]. Typical collaborative systems are distributed computing systems such as secure multi-party computation (SMC) frameworks [2], [4]. SMC enables parties to jointly compute on private inputs without revealing anything but the result.
Collaborative learning has benefited society in areas including medical research [5]. Healthcare data are usually collected in medical centers such as hospitals. Generally, a study center does not share data with other institutes out of concern for the confidentiality of participants. To study disease mechanisms, especially for rare diseases for which each center has limited cases, it is important to analyze data combined from multiple institutes. Collaborative learning holds great promise for connecting healthcare data sources. Since sharing individual-level data is not permitted by law or regulation in many domains, various privacy-preserving techniques have been developed to perform collaborative learning.
Many privacy-preserving techniques assume semi-honest models, in which the server and clients follow the protocol specification. Because clients could be arbitrary entities, it is unlikely that all the clients (i.e., data providers) would be semi-honest. Recently, maliciously secure models [6], [7] have been proposed to achieve privacy in the presence of malicious adversaries that may deviate arbitrarily from the protocol specification. Based on the assumed number of servers that can be malicious in the protocol, maliciously secure frameworks operate in either an honest-majority setting [6]–[14] or a malicious-majority setting [15]–[18]. These frameworks typically rely on multiple servers (e.g., three-server models [6], [7], [9], [10] and four-server models [8], [12]–[14]), with the most common assumption being an honest-majority setting (i.e., a majority of the servers are semi-honest). In contrast, the malicious-majority setting anticipates a scenario where a majority of the servers may behave maliciously. This setting enhances security in environments where a significant portion of the servers may be untrustworthy, providing a more realistic and robust solution in adversarial conditions. Since the efficiency of SMC protocols is highly dependent on the number of honest servers [13], maliciously secure frameworks in the malicious-majority setting are less efficient than those in the honest-majority setting.
In this paper, we propose a privacy-preserving logistic model scheme, assuming a dishonest majority in a maliciously secure setting. We assume that data are horizontally distributed among data providers (i.e., each data provider is a client that collects information of the same features for different samples). Our contributions are summarized as follows:
We propose a novel matrix encryption technique to build a maliciously secure logistic model. Unlike state-of-the-art frameworks that necessitate the involvement of two or more servers in an honest-majority setting, our scheme involves only a single semi-honest server and is resilient to malicious attacks conducted by data providers. Malicious behaviors conducted by data providers are detectable during the verification stage.
The proposed matrix encryption method combines Gaussian matrix encryption with -transformation and commutative matrix encryption. The implementation of -transformation ensures that random records within any energy range are indistinguishable. The commutative matrix encryption is applied to preserve data utility.
We utilize a lossy compression method to reduce communication costs while ensuring negligible degradation in accuracy. Compared with other maliciously secure frameworks, our scheme is more efficient to analyze large-scale datasets in terms of computation and communication costs.
II. Related work
Secure multi-party computation (SMC)
SMC frameworks with a small number of parties have proven particularly attractive recently. Among these frameworks, homomorphic encryption (HE) [22], [23] has been widely used to protect data privacy [10], [11], [18]–[20]. Recent advances in garbled circuits [24], [25] have led to a set of privacy-preserving protocols [15]–[17] for SMC that tolerate an arbitrary number of malicious corruptions. Garbled circuit and HE techniques require large volumes of ciphertexts to be transferred or have high computational complexity. In terms of efficient constructions, various secure frameworks in an honest-majority setting [6]–[14] have drawn considerable attention. The details of these frameworks are summarized in Table I.
Table I:
Recent secure multi-party computation (SMC) frameworks
Framework | No. of parties/servers | Encryption method | Threat model | Collusion assumption |
---|---|---|---|---|
[15] | 2 | Garbled circuit | Malicious | Malicious-majority |
[16] | ≥2 | Garbled circuit | Malicious | Malicious-majority |
[17] | ≥2 | Garbled circuit | Malicious | Malicious-majority |
[8] | 4 | Garbled circuit | Malicious | Honest-majority |
[9] | 3 | Garbled circuit | Malicious | Honest-majority |
[18] | – | Homomorphic encryption | Malicious | Malicious-majority |
[19] | 1 | Homomorphic encryption | Semi-honest | Passive adversary |
[10] | 3 | Mixed | Malicious | Honest-majority |
[11] | 2 | Mixed | Malicious | Honest-majority |
[20] | 2 | Mixed | Semi-honest | Passive adversary |
[12] | 3, 4 | Joint message passing | Malicious | Honest-majority |
[13] | 3, 4 | SPDZ | Malicious | Honest-majority |
[21] | 2 | Secret sharing | Semi-honest | Passive adversary |
[14] | 4 | Secret sharing | Malicious | Honest-majority |
[6] | 3 | Secret sharing | Malicious | Honest-majority |
[7] | 3 | Secret sharing | Malicious | Honest-majority |
Ours | 1 | Matrix encryption | Malicious | Malicious-majority |
Malicious model: the entity deviates arbitrarily from the protocol specification. Semi-honest model: the entity follows the prescribed protocol but attempts to gain unauthorized information by covertly observing the communication or computations of other entities involved. Mixed encryption method: the framework applies both garbled circuit and homomorphic encryption.
Differential privacy
Differential privacy (DP) [26] has been widely incorporated into distributed deep learning [27]–[29] by adding noise to input data, loss functions, gradients, weights, or output classes. Moreover, DP has been applied to enable secure exchanges of intermediate data and obtain models resilient to adversarial inferences in the federated learning [30]–[32]. There are still some challenging issues to implement DP in practice since it requires high privacy budgets to train robust and accurate models and the level of privacy achieved in practice remains unclear [33].
Matrix encryption
Matrix encryption has been extensively utilized in the development of compressed sensing (CS)-based cryptosystems [34]–[37]. This approach is well-suited for ensuring the security of practical applications, such as the Internet of Things and multimedia. The Gaussian one-time sensing CS-based cryptosystem, which employs a random Gaussian matrix and renews the matrix at each encryption, has been proven to be asymptotically secure for the plaintext with constant energy [34]–[36]. It is challenging to practically implement these CS-based cryptosystems because the indistinguishability of Gaussian matrix encryption is highly sensitive to variations in the energy of plaintexts.
To summarize, a majority of maliciously secure models necessitate the involvement of two or more servers in an honest-majority setting. Moreover, existing secure models are relatively inefficient for large-scale data analysis. Previous studies of Gaussian matrix encryption ensure indistinguishability only among records with constant energy (i.e., Euclidean norm), which poses a practical challenge when the data have arbitrary energy ranges. This paper introduces a maliciously secure logistic model that ensures indistinguishability among random records within any energy range. Our model assumes the malicious-majority setting and is highly efficient in analyzing datasets of substantial size.
III. Preliminaries
A. Logistic model
Consider a set of data {(x_i, y_i)}, i = 1, …, n, where x_i ∈ R^d and y_i ∈ {0, 1} denotes the binary outcome, such as the case/control status of sample i. Without loss of generality, a constant 1 is typically added as the first element of the record x_i to account for the intercept. The logistic model [38], [39] has the form

Pr(y_i = 1 | x_i) = exp(x_i^T β) / (1 + exp(x_i^T β)),   (1)

where β is a d-dimensional coefficient vector and Pr(⋅) is the probability function. The logistic model is typically fitted through maximum likelihood, using the conditional likelihood. The log-likelihood is

ℓ(β) = Σ_{i=1}^{n} { y_i x_i^T β − log(1 + exp(x_i^T β)) },   (2)

where p_i = Pr(y_i = 1 | x_i) denotes the fitted probability for sample i.

For the ridge regularized logistic model, we maximize the log-likelihood subject to a size constraint on the ℓ2-norm (i.e., Euclidean norm) of the coefficients. The ridge estimate is

β̂ = argmax_β { ℓ(β) − λ ‖β‖² / 2 },   (3)

where λ is the ridge parameter.

Let y = (y_1, …, y_n)^T denote the outcome, X denote the n × d feature matrix, and W be the diagonal matrix of weights (Equation 4):

W_ii = p_i (1 − p_i).   (4)

We use Newton’s method to fit the logistic model. Given the current estimate β^old, a single Newton update is

β^new = β^old − (∂²ℓ(β)/∂β ∂β^T)^{−1} ∂ℓ(β)/∂β,   (5)

where the derivatives are evaluated at β^old. The update can be expressed in matrix notation as follows:

β^new = (X^T W X + Λ)^{−1} X^T W z,   (6)

where z = X β^old + W^{−1}(y − p), p = (p_1, …, p_n)^T, and W_ii = p_i(1 − p_i) is the i-th diagonal element of the diagonal matrix W. Λ is a matrix of zeros for the non-regularized logistic model. For the ridge regularized logistic model, Λ is a diagonal matrix with diagonal elements λ.
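For reference, the Newton (IRLS) update above can be written out in plain, non-secure form. The sketch below assumes numpy; the function names (`newton_step`, `fit_logistic`) and the probability-clipping safeguard are illustrative choices of ours, not taken from the paper's algorithms.

```python
import numpy as np

def newton_step(X, y, beta, ridge=0.0):
    """One Newton (IRLS) update for (optionally ridge-regularized) logistic regression.

    X: (n, d) feature matrix (first column may be the intercept),
    y: (n,) binary outcomes, beta: (d,) current estimate,
    ridge: the ridge parameter (0 gives the non-regularized model).
    """
    p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities (Equation 1)
    p = np.clip(p, 1e-10, 1.0 - 1e-10)           # keep W invertible numerically
    W = np.diag(p * (1.0 - p))                   # diagonal weight matrix (Equation 4)
    z = X @ beta + np.linalg.solve(W, y - p)     # working response
    Lam = ridge * np.eye(X.shape[1])             # zero matrix when ridge == 0
    return np.linalg.solve(X.T @ W @ X + Lam, X.T @ W @ z)

def fit_logistic(X, y, ridge=0.0, tol=1e-6, max_iter=50):
    """Iterate Newton updates until successive estimates differ by less than tol."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        beta_new = newton_step(X, y, beta, ridge)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

A small ridge term keeps the update well conditioned on separable data, which is why the usage below sets one.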
B. Indistinguishability
Indistinguishability has been widely used as the security measure in recent cryptosystems. Using different notations (e.g., Definitions 3.9 and 3.10 in [40], Definition 2.1 in [41], Definition 2 in [42], Definition “PrvInd” in [43], Definition 2 in [44], Definition 1 in [45], Definition 1 in [35], Section III in [36]), all these definitions express the same security level: a cryptosystem has indistinguishability if no adversary can determine in polynomial time which of two plaintexts corresponds to a given ciphertext with probability significantly better than that of a random guess. In other words, in a cryptosystem with indistinguishability, an adversary cannot learn any partial information about the plaintext in polynomial time given a ciphertext. Comprehensive comparisons of indistinguishability with other security measures (e.g., differential privacy) are given in [41], [42]. In line with other cryptosystems utilizing matrix encryption methods [35], [36], [44], we provide the formal definition of indistinguishability, denoted as Definition 1, following Definition 1 in [35], Section III in [36], and Definition 2 in [44].
Definition 1. Let Pr_succ be the probability that an adversary, using any algorithm that runs in polynomial time, successfully discerns which of two plaintexts corresponds to a given ciphertext. A cryptosystem is indistinguishable if there is a negligible function negl(·) such that, for all plaintext lengths ℓ,

Pr_succ ≤ 1/2 + negl(ℓ).   (7)

A function negl(ℓ) is negligible if, for every positive constant c, there exists an integer N_c such that negl(ℓ) < ℓ^{−c} for all ℓ > N_c.
Let δ_TV be the total variation (TV) distance [46] between P_0 and P_1, where P_b is the probability distribution of the ciphertext conditioned on the plaintext x_b. Based on [47], the probability of successfully distinguishing the plaintexts is bounded by

Pr_succ ≤ (1 + δ_TV)/2,   (8)

where δ_TV = δ_TV(P_0, P_1). If δ_TV = 0, the probability of success is at most that of a random guess, leading to indistinguishability [40].

Computing δ_TV directly is difficult [48], so we employ the Hellinger distance [46] to bound the TV distance. Let δ_H be the Hellinger distance; it gives both lower and upper bounds on the TV distance [49], i.e.,

δ_H² ≤ δ_TV ≤ √2 δ_H,   (9)

where δ_H = δ_H(P_0, P_1). Moreover, if P_0 and P_1 are multivariate Gaussian distributions (i.e., the ciphertext conditioned on x_b follows a Gaussian distribution with zero mean and covariance matrix Σ_b), the Hellinger distance between P_0 and P_1 is given by [50] and [51]:

δ_H²(P_0, P_1) = 1 − det(Σ_0)^{1/4} det(Σ_1)^{1/4} / det(Σ̄)^{1/2},   (10)

where Σ̄ is defined as the average of Σ_0 and Σ_1 (i.e., Σ̄ = (Σ_0 + Σ_1)/2). Formal definitions and properties of the total variation and Hellinger distances are given in [46]–[48].
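The Gaussian Hellinger distance of Equation 10 and the TV bounds of Equation 9 are straightforward to evaluate numerically. The sketch below assumes numpy; the function names are ours, and the closed form used is the standard zero-mean Gaussian case.

```python
import numpy as np

def hellinger_gaussians(S0, S1):
    """Hellinger distance between zero-mean Gaussians N(0, S0) and N(0, S1)."""
    Sbar = 0.5 * (S0 + S1)  # average covariance
    # Bhattacharyya coefficient: det(S0)^{1/4} det(S1)^{1/4} / det(Sbar)^{1/2}
    bc = (np.linalg.det(S0) ** 0.25 * np.linalg.det(S1) ** 0.25
          / np.linalg.det(Sbar) ** 0.5)
    return np.sqrt(max(0.0, 1.0 - bc))  # clamp guards tiny negative round-off

def tv_bounds(h):
    """Lower/upper bounds on the TV distance from the Hellinger distance h."""
    return h ** 2, np.sqrt(2.0) * h
```

Identical covariances give distance 0 (perfect indistinguishability in this bound); scaling one covariance pushes both TV bounds away from 0.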
C. Adversarial attacks on matrix encryption methods
In previous privacy-preserving frameworks using matrix encryption techniques [37], [52]–[54], adversarial attack models are classified into four levels: ciphertext-only attack (COA), known-plaintext attack (KPA), chosen-plaintext attack (CPA), and chosen-ciphertext attack (CCA).
Ciphertext-only attack (level 1):
The adversary is assumed to have access to ciphertexts and no other information. Within the COA, the adversary attempts to retrieve sensitive information using the ciphertexts.
Known-plaintext attack (level 2):
The adversary has access to the ciphertexts and corresponding plaintexts. Within the KPA, the attacker attempts to recover sensitive information by analyzing ciphertexts and their corresponding plaintexts.
Chosen-plaintext attack (level 3):
Given any plaintext, the adversary can get its corresponding ciphertext within the CPA. The adversary attempts to recover the encryption key or algorithm by examining associations between plaintexts and ciphertexts.
Chosen-ciphertext attack (level 4):
Within the CCA, the adversary has the capability to obtain the decryption of any ciphertexts of its choice. The adversary attempts to determine the plaintext that was encrypted to give some other ciphertexts.
In our scheme, no ciphertext is ever decrypted, so it is impossible for adversaries to obtain the decryption of any ciphertext. Therefore, adversaries cannot perform the CCA, and we only consider the first three attacks.
D. Matrix encryption
Assume x is a random row in the dataset X, which contains n rows and d columns. To encrypt x, a random Gaussian matrix A with dimensions d × d is generated, where each element follows a Gaussian distribution N(μ, σ²), and μ and σ are parameters of the distribution. The encryption functions for the row x and the data X can be summarized as x′ = xA and X′ = XA. Each column in the dataset can be encrypted similarly. Specifically, let v be a random column in X. A random Gaussian matrix R with dimensions n × n is generated for encryption. The encryption functions for the column v and the data X can be summarized as v′ = Rv and X′ = RX.
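A minimal sketch of row- and column-wise Gaussian matrix encryption, assuming numpy; the key names `A` and `R` are illustrative placeholders for the row-space and column-space matrices described above, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_key(size, mu=0.0, sigma=1.0):
    """Random Gaussian encryption matrix; invertible with probability 1."""
    return rng.normal(mu, sigma, size=(size, size))

X = rng.normal(size=(5, 3))  # toy dataset: 5 rows, 3 columns

# Encrypting rows: each row x maps to x @ A, i.e. X -> X @ A.
A = gaussian_key(3)          # acts on the feature (column) space
# Encrypting columns: each column v maps to R @ v, i.e. X -> R @ X.
R = gaussian_key(5)          # acts on the sample (row) space

C = R @ X @ A                                       # ciphertext (both keys applied)
X_rec = np.linalg.solve(R, C) @ np.linalg.inv(A)    # decryption with both keys
```

Decryption simply inverts the keys; a square Gaussian matrix is singular only on a measure-zero set.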
IV. System Overview
A. System model
We investigate collaborative learning in which data are collected and owned by different data providers, referred to as clients. The goal is to build an efficient logistic model using data from all the clients while ensuring privacy protection. We consider that data are horizontally distributed, i.e., clients have different sets of samples and the same set of features (Figure 1A).
Fig. 1:
A: an example showing the horizontal partitioning scenario with three data providers (referred to as clients); B: workflow of the proposed privacy-preserving logistic model.
The proposed scheme involves multiple clients and one server who is responsible for the secure computation. The privacy-preserving scheme contains four stages: encryption, modeling, decryption and verification (Figure 1B). Clients perform data encryption as the initial step. After the encryption process, the data are sent to the server for the secure computation. The server then sends encrypted model results back to clients. Subsequently, clients decrypt the model results. Finally, the server and clients initiate the verification stage to identify any malicious activity conducted by the clients.
B. Threat model
Maliciously secure frameworks with multiple servers have been studied extensively [6], [7], [12], [13]. These frameworks assume that at least one server is semi-honest and does not collude with malicious adversaries. Following these frameworks, the sole server in our scheme is assumed to be semi-honest, while clients are allowed to be malicious. Precisely, we assume that the server faithfully executes the delegated computations but may be curious about the intermediate data and try to learn or infer sensitive information. In contrast, clients may act maliciously (i.e., arbitrarily deviate from the predefined scheme to cheat others). Clients may collude with each other, while the server is not allowed to collude with malicious clients. The possible adversarial behaviors of malicious clients and the semi-honest server are summarized as follows.
To perform the chosen-plaintext attack (CPA), clients insert fake plaintexts into the privacy-preserving scheme and collude with each other to share both the plaintexts and their corresponding ciphertexts. A detailed description of the CPA is given in Section VI.
Malicious clients do not follow the proposed encryption method to encrypt data (i.e., each client’s data should be encrypted sequentially by all the clients using commutative matrices). After getting sufficient data for CPA, malicious clients may choose to skip subsequent computations in order to reduce the computation cost.
Malicious clients do not follow the decryption procedure (i.e., the encrypted model result derived by the server should be decrypted sequentially by all the clients using commutative matrices which have been used for data encryption). Once malicious clients have gathered enough information for CPA, they may choose to skip the computations and send fake data to other clients in the decryption procedure.
The semi-honest server attempts to retrieve sensitive information from received ciphertexts.
C. Design goals
Our design goals contain four aspects. Privacy: The private data must remain confidential at all times. Learning verifiability: There should be a verification stage to check whether all the clients behave honestly. Correctness: The scheme derives the correct model result if all the clients and the server behave honestly. Efficiency: The scheme is computationally efficient and achieves high accuracy.
V. Proposed scheme
A. Data encryption, modeling and decryption
Suppose there are K clients and client k owns the data X_k (i.e., the plaintext) with n_k samples and d features. These clients collect the same features for different samples. The analytical model will be built using the aggregated data, i.e., X = [X_1; X_2; …; X_K]. To ensure data confidentiality, we introduce a novel privacy-preserving logistic model that employs random matrices for encryption. Specifically, client k encrypts X_k using random encryption matrices A_k and B_k. Since A_k and B_k are generated randomly by client k, a naive aggregation of these encrypted datasets does not preserve data utility. In order to maintain the utility of the data, we require that 1) the matrices B_k are designed to be commutative (i.e., B_i B_j = B_j B_i for i ≠ j, Appendix A), 2) the ciphertext of client k is subsequently encrypted by every other client j using B_j, and 3) the encryption matrix A_k is decrypted by the server prior to the secure computation. The commutative nature of the random encryption matrices guarantees that the resulting encrypted dataset remains independent of the order in which clients perform encryption. Clients send the encrypted data (i.e., ciphertexts) to the server after the encryption. The server then decrypts the A_k and obtains the aggregated data XB, where B = B_1 B_2 ⋯ B_K, as the B_k are designed to be commutative (i.e., B_i B_j = B_j B_i for i ≠ j). Table II summarizes the symbols used in the proposed scheme.
Table II:
Notations
Notation | Description |
---|---|
Number of clients (data providers) | |
Data (plaintext) collected by client | |
Transformed outcome | |
Number of samples in and | |
Number of features in | |
Aggregated data (plaintext) | |
Number of samples in and | |
The -th record (row) in | |
The -th column in | |
A diagonal matrix (Equation 4) | |
Random Gaussian matrix generated by client | |
Inverse of matrix | |
Random Gaussian matrix shared among clients | |
Random coefficient generated by client | |
Commutative matrix generated by client | |
Encryption matrix | |
Inverse of matrix | |
Encrypted (ciphertext) | |
Encrypted outcome (ciphertext) | |
Transpose of | |
A constant to ensure indistinguishability (Algorithm 1) | |
Model estimate (a -dimensional vector) of non-secure model | |
Model estimate (a -dimensional vector) derived by the server | |
Euclidean norm of | |
Hellinger distance | |
Total variation (TV) distance | |
Lower bound of the TV distance | |
Upper bound of the TV distance | |
Success probability in the indistinguishability experiment | |
A negligible function | |
Pseudo outcome | |
Encrypted pseudo outcome | |
Encryption function | |
Encryption function | |
Encryption function |
Pre-processing
Before data encryption, a pre-processing procedure is conducted by each client. Specifically, client k generates a pseudo record with all values equal to 1 and adds it to its data as the first row. The added row is used for malicious behavior detection. To encrypt the outcome information collected by client k, the client transforms the outcome and concatenates the result into the data as the last row. Additionally, client k calculates the Euclidean norm of each column of its data and generates a vector whose j-th element is chosen such that, after the vector is appended as the last row, the Euclidean norm of every column equals a common constant. This procedure guarantees that the Euclidean norm of each column is identical, which is essential to ensure the indistinguishability of our encryption approach.
Without loss of generality, we include an intercept in the logistic model. Specifically, a vector of ones is added to the data as the first column. To achieve indistinguishability, each client multiplies the elements in the first column by a constant selected by Algorithm 1 (i.e., the transformation of Algorithm 1). This transformation is performed before data encryption.
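The column-norm equalization step of the pre-processing can be sketched as follows. This assumes the appended entries are chosen as sqrt(target² − ‖column‖²) with the target at least the largest column norm; the paper's actual constant is selected by its Algorithm 1, so the default target used here is only illustrative.

```python
import numpy as np

def equalize_column_norms(X, target=None):
    """Append one row so every column of the result has the same Euclidean norm.

    The appended entry for column j is sqrt(target**2 - ||col_j||**2); `target`
    must be at least the largest column norm (used as the default here).
    """
    norms = np.linalg.norm(X, axis=0)
    if target is None:
        target = norms.max()
    pad = np.sqrt(target ** 2 - norms ** 2)  # zero for the largest column
    return np.vstack([X, pad])
```

After this step, all columns carry identical energy, which is the precondition for the indistinguishability argument of the Gaussian encryption.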
The proposed encryption procedures can be categorized into two layers: internal and external. The data are first encrypted by its owner internally and then subsequently encrypted by other clients, referred to as the external encryption.
Internal encryption
Client k first generates a random Gaussian matrix A_k to encrypt X_k. To improve the computational efficiency of the encryption for clients with large sample sizes, we partition the encryption matrix into a block-diagonal matrix; the detailed description is given in Appendix B. Client k shares A_k with the server and encrypts the data as A_k X_k. Client k further encrypts A_k X_k using a specifically designed matrix B_k. To generate this B_k, a random Gaussian matrix M is generated and shared among the clients. Client k subsequently generates a random coefficient vector and derives its client-specific matrix B_k from M. This ensures that B_i (generated by client i) and B_j (generated by client j) are commutative, i.e., B_i B_j = B_j B_i (Appendix A). Client k computes A_k X_k B_k and sends it to the other clients for external encryption.
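The paper derives each client's commutative matrix from a shared Gaussian matrix and a private coefficient vector (Appendix A gives the exact construction). One standard way to realize this, sketched here as an assumption rather than the paper's method, is to take matrix polynomials in the shared matrix: any two polynomials in the same matrix commute.

```python
import numpy as np

rng = np.random.default_rng(1)

def commutative_matrix(M, coeffs):
    """Build sum_j coeffs[j] * M**j; polynomials in the same M commute."""
    B = np.zeros_like(M)
    P = np.eye(M.shape[0])  # current power of M, starting at M**0
    for c in coeffs:
        B += c * P
        P = P @ M
    return B

M = rng.normal(size=(4, 4))                        # shared among clients
B1 = commutative_matrix(M, rng.normal(size=3))     # client 1's private coefficients
B2 = commutative_matrix(M, rng.normal(size=3))     # client 2's private coefficients
```

Each client keeps its coefficient vector private, yet the resulting matrices commute by construction, so the final ciphertext does not depend on the encryption order.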
External encryption
Upon receiving A_k X_k B_k from client k, client k+1 further encrypts it using B_{k+1} (i.e., A_k X_k B_k B_{k+1}) and sends the updated ciphertext to the next client. After all the clients complete the external encryption, the ciphertext is in the form A_k X_k B, where B = B_1 B_2 ⋯ B_K; it is then sent to the server.
To build the ridge regression model, client computes and sends to client . Client calculates and sends it to client . Each of the clients conducts the encryption sequentially. Once all clients complete encryption, is sent to the server for ridge model computation.
Algorithm 2 describes the detailed encryption procedures and Figure 2 gives an example of the encryption procedures with three clients. Table III summarizes the internal and external encryption procedures. The primary goal of the internal encryption layer is to protect against malicious adversaries. To preserve data utility, the data are further encrypted by the other clients using commutative matrices in the external encryption layer.
Fig. 2:
An example showing the proposed encryption procedures, including data transformation details, with three clients. The outcome information collected by each client is integrated into that client’s data prior to encryption. The data are encrypted by all the clients sequentially. The arrows connect the origin and endpoint of each data transmission.
Table III:
Encryption details for the plaintext (owned by client )
Encryption layer | Client | Rationale | Affected by malicious adversaries? | Encryption matrix |
---|---|---|---|---|
Internal | Withstand malicious adversaries | No | ||
External | Preserve data utility | Yes |
Modeling
Upon receiving the ciphertexts (and the additional ciphertext for ridge regression), the server decrypts the internal encryption matrices A_k to obtain the aggregated ciphertext XB. For the subsequent analysis, the server eliminates the pseudo records; the detailed procedures are described in Algorithm 3. The server retrieves the encrypted outcome from the decrypted data and denotes it as ỹ. The server then derives a d-dimensional vector as the initial estimate. Given the encrypted data, the Newton update becomes

β̃^new = ((XB)^T W (XB) + Λ)^{−1} (XB)^T W z̃,   (11)

where

z̃ = (XB) β̃^old + W^{−1}(ỹ − p),   (12)

p = (p_1, …, p_n)^T with p_i computed from (XB) β̃^old as in Equation 1, and W is a diagonal matrix with the i-th diagonal element being p_i(1 − p_i). Λ is a matrix of zeros for the non-regularized logistic model, while Λ is a diagonal matrix with diagonal elements λ for ridge regression. The server computes model estimates using Equation 11 until convergence (e.g., until the absolute difference between β̃^new and β̃^old is smaller than 10^{−6}).
Theorem 1. The privacy-preserving logistic model converges as long as the non-secure logistic model converges.
Proof. Let denote the initial point in Newton’s method within the privacy-preserving logistic model. According to Theorem 9 (Appendix C), it is equivalent to setting as the initial point within the non-secure model. Given the initial point , suppose the non-secure model converges after iterations with the model estimate being . Based on Theorem 9 (Appendix C), our privacy-preserving model also converges after iterations with the model estimate being . □
Decryption
The server sends the converged model estimate β̃ to the clients. As shown in Theorem 9 (Appendix C), β̃ = B^{−1} β̂, where β̂ is the estimate of the non-secure model. To obtain the true model estimate β̂, each client k uses its encryption matrix B_k to decrypt β̃. Figure 3 shows the detailed decryption procedure; β̂ is the result once all the clients complete decryption.
Fig. 3:
The decryption procedure. The clients decrypt the model estimate sequentially. The data above the arrows are those transferred among clients.
B. Multiclass classification
Our scheme can be modified to solve the multiclass classification problem. Suppose the outcome contains a total of C classes. Client k defines C sub-outcomes using the indicator function as follows.

y_i^{(c)} = 1 if y_i = c, and 0 otherwise, for c = 1, …, C.   (13)

Following the scheme described above, client k transforms each sub-outcome. Before data encryption, client k adds the transformed sub-outcomes to its feature matrix as the last C rows. After data encryption, the server performs the secure logistic model computation (Algorithm 3) for each of the C sub-outcomes, using each pair of feature matrix and sub-outcome.
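The sub-outcome construction in Equation 13 is a one-vs-rest reduction: one binary problem per class. A minimal, non-secure sketch (numpy; the function name is ours):

```python
import numpy as np

def one_vs_rest_outcomes(y, n_classes):
    """Binary sub-outcome vectors y_c with y_c[i] = 1 if y[i] == c, else 0."""
    return [(y == c).astype(float) for c in range(n_classes)]
```

Each sub-outcome is then fitted as an ordinary binary logistic model; at prediction time, the class with the largest fitted probability is chosen.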
C. Lossy compression with SZ
To reduce the communication cost, we employ the SZ lossy compression technique [55] for data compression. SZ is an error-bounded lossy compression scheme [55]–[57]. In our scheme, the data are compressed before being transferred between the clients and the server.
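For intuition only: SZ is far more sophisticated (prediction plus quantization plus entropy coding), but the error-bounded property it guarantees can be illustrated with a simple uniform quantizer whose reconstruction error never exceeds an absolute bound eps. This sketch is not the SZ algorithm.

```python
import numpy as np

def compress(x, eps):
    """Quantize to integer multiples of 2*eps; absolute error is at most eps."""
    return np.round(x / (2.0 * eps)).astype(np.int32)

def decompress(q, eps):
    """Reconstruct the values from their quantization indices."""
    return q * (2.0 * eps)

x = np.linspace(-1.0, 1.0, 101)
q = compress(x, eps=1e-3)        # small integers: cheap to transmit
x_hat = decompress(q, eps=1e-3)  # reconstruction within the error bound
```

The integer codes are much cheaper to transmit than the raw doubles, and the bound on the per-element error is what keeps the accuracy degradation negligible.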
D. Verification: malicious behavior detection
To identify if any client has conducted malicious behavior, a designated pseudo outcome is subjected to the same procedures as the original outcome . Assuming all clients adhere to our scheme for both and , predetermined outputs are anticipated after the verification stage.
Specifically, client defines a constant where and shares with the server. For the verification, client generates a pseudo outcome is the sum of , denoted as . can be expressed as . Client further encrypts using (i.e., ) and sends the ciphertext to the other clients. is subsequently encrypted by client using . After all the clients have completed the encryption process, is shared with the server. Upon receiving , the server verifies if where . If any client exhibits malicious behavior during the encryption process, the equation will not hold (Theorem 2). Since , we design the following process to verify if the clients follow the proposed decryption procedure. The server first calculates the sum of encrypted pseudo outcomes (i.e., ) and the estimate can be simplified as follows.
(14) |
The server shares with the client who performs the verification (e.g., client 1). To confirm that no malicious behavior was conducted within the decryption process, client 1 combines with . Specifically, client 1 generates two random constants, and , and defines a new estimate as . is decrypted by all the clients following the procedure in Figure 3. Let be the decrypted estimate. Upon obtaining , client 1 calculates where is the decrypted model estimate. is expected to be a vector of ones if all the clients correctly decrypt both and following the proposed decryption procedure (Theorem 3). Algorithm 4 summarizes the verification process.
In order to preserve data utility, all the clients need to follow three encryption criteria. We utilize a case study involving two clients to better illustrate these criteria. The encryption procedures are shown in Figure 4. Initially, it is necessary for clients to employ uniform encryption matrices to encrypt datasets owned by other clients. This condition implies that and in the example. Additionally, these encryption matrices must be commutative with each other, which implies that the multiplication of and equals the multiplication of and , denoted as . Thirdly, clients use uniform inputs across the entire encryption process. More precisely, each client does not alter the data owned by other clients during data encryption. To violate the third criterion, client 2 may selectively encrypt specific rows within the dataset or substitute with fake data prior to transmitting it to the server.
Fig. 4:
An example of the proposed encryption procedures with two clients. The outcome information collected by each client is integrated into that client’s data prior to encryption. The internal encryption is performed by each client, which encrypts its own data before transmitting it to the other client. The external encryption is then conducted by the other client. Finally, the server decrypts the internal encryption from the data it has received and retrieves the encrypted outcome information.
Let and denote the ciphertexts transmitted to the server from client 2 and client 1, respectively. The server then decrypts and extracts the encrypted outcome . If all the clients adhere to these three criteria, the server should get the ciphertexts in Equations 15 and 16 (C-1, C-2, and C-3 refer to criteria 1 through 3).
(15)
(16)
To preserve data utility, 1) and need to be encrypted by the same encryption matrix , and 2) and need to be encrypted by the same encryption matrix . The malicious behavior of any client during data encryption results in a breach of one or both of these two requirements, thereby affecting the utility of the data.
Theorem 2. The proposed verification algorithm can identify the malicious behavior during the encryption process.
Proof. Firstly, the server verifies if and are encrypted by the same encryption matrix by checking whether the first rows in and are identical. To meet this requirement, all clients must adhere to the three encryption criteria indicated in Equations 15 and 16. Secondly, the server checks whether equals to ascertain if and have been encrypted using the same encryption matrix. Upon receiving and , the server verifies if by calculating
(17)
for . The server affirms the absence of any malicious activities by validating the fulfillment of the two conditions mentioned above. □
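The row-identity check in this proof can be sketched with a minimal numpy simulation. All names and shapes here are illustrative assumptions not fixed by the text: encryption is modeled as right-multiplication by a shared secret matrix `K`, and `R` is a public block of rows that both clients prepend before encrypting, so honest ciphertexts must agree on those rows.

```python
import numpy as np

rng = np.random.default_rng(0)
r, n, d = 3, 8, 5

# Hypothetical shared verification block: both clients prepend the same
# public rows R before encrypting with the (secret) matrix K.
R = rng.normal(size=(r, d))
X1 = rng.normal(size=(n, d))   # client 1's plaintext (stand-in)
X2 = rng.normal(size=(n, d))   # client 2's plaintext (stand-in)
K = rng.normal(size=(d, d))    # shared secret encryption matrix

C1 = np.vstack([R, X1]) @ K
C2 = np.vstack([R, X2]) @ K

# Server-side check: encryption with the same K makes the first r rows identical.
same_key = np.allclose(C1[:r], C2[:r])

# A malicious client encrypting with a different matrix K_bad fails the check.
K_bad = rng.normal(size=(d, d))
C2_bad = np.vstack([R, X2]) @ K_bad
detected = not np.allclose(C1[:r], C2_bad[:r])
```

Because the shared rows are mixed through the same secret matrix, agreement on them is evidence that the same key was used, without revealing the key itself.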
In the decryption procedure, client should use the encryption matrix to decrypt (the output in Algorithm 3); we refer to this as the decryption criterion.
Theorem 3. The proposed verification algorithm can identify if the client violates the decryption criterion.
Proof. In our scheme, both and need to be decrypted following the decryption procedures. Let and be the decrypted data of and , respectively. Since
(18)
we have
(19)
Suppose client follows the proposed decryption procedures to decrypt both and ; then and should be identical (i.e., where ). So
(20)
According to Equation 14,
(21)
Based on Equations 20 and 21, is expected to be a vector of ones if no malicious activities are involved in the decryption procedures. Therefore our verification stage can identify whether the client breaks the decryption criterion. □
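The decryption check above relies on the linearity of decryption: a random combination of encrypted estimates must decrypt to the same combination of the decrypted estimates, so the elementwise ratio is a vector of ones. The toy simulation below illustrates this; the linear decryption `decrypt`, the constants `a` and `b`, and all shapes are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6

# Hypothetical linear decryption: decrypt(x) = K^{-1} x for a secret matrix K.
K = rng.normal(size=(d, d))
K_inv = np.linalg.inv(K)

def decrypt(x, M=K_inv):
    return M @ x

beta_enc = rng.normal(size=d)      # encrypted model estimate (stand-in)
a, b = rng.uniform(1, 2, size=2)   # random verification constants

# Client 1 builds a blinded combination and has it decrypted as well.
combo_enc = a * beta_enc + b * beta_enc
beta_dec = decrypt(beta_enc)
combo_dec = decrypt(combo_enc)

# If decryption was honest, the elementwise ratio is a vector of ones.
ratio = combo_dec / ((a + b) * beta_dec)
honest = np.allclose(ratio, 1.0)

# Decrypting the combination with a wrong matrix is caught.
bad = decrypt(combo_enc, rng.normal(size=(d, d)))
caught = not np.allclose(bad / ((a + b) * beta_dec), 1.0)
```

Since the constants are random and unknown in advance, a deviating client cannot tailor a wrong decryption that still yields a ratio of ones.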
VI. Security analysis
The encryption matrices and may be recovered if ciphertexts of different plaintexts are distinguishable. Once the encryption matrix is recovered, the client can recover other clients’ data. Potential attacks to achieve such goals include CPA, KPA, COA and CCA [34]–[36], [40] as described in Section III-C. It is impossible to perform CCA for our scheme because adversaries cannot obtain the decryption of any ciphertext. Since CPA is more threatening than KPA and COA, a secure scheme is resilient to KPA and COA if it protects against CPA.
A CPA [37], [52]–[54] is feasible against our scheme. Consider a strong threat model in which all clients except one can be compromised in a collusion attack (Figure 5). Suppose client 1 is the only honest client. In the external encryption layer, client generates fake data and sends it to client 1 for encryption. Client 1 uses to encrypt and returns the ciphertext to the other clients for further encryption. During this process, the colluding clients share the ciphertexts received from client 1 and are able to match each plaintext with its ciphertext . In a collusion attack, colluding clients cooperate as a group, sharing plaintexts and ciphertexts with each other. The colluding group can therefore insert arbitrary plaintexts and obtain the corresponding ciphertexts, which is exactly the setting of a CPA. The group would first try to recover the encryption matrix and then retrieve the plaintext owned by client 1.
Fig. 5:
An example of the strong threat model in which all clients except one can be compromised. Suppose client 1 is the only honest client and denotes the commutative matrix for data encryption. denotes the plaintext from client .
To be resilient to CPA, the encrypted data in our privacy-preserving model should have indistinguishability for any random plaintexts. In this section, we demonstrate that the ciphertexts of two arbitrary plaintexts are indistinguishable in our privacy-preserving scheme.
Define the encryption functions and where is a random Gaussian matrix, i.e, each element of follows Gaussian distribution. Since , the internal encryption function can be split into 3 sub-functions. Specifically, where , and .
The clients have access to the ciphertext . In contrast, the server receives the encryption matrices from client and derives the ciphertext . The function is only used in the internal encryption layer and is employed to ensure that clients cannot conduct an effective CPA. We first prove that records in are indistinguishable to the clients (Section VI-A) and then demonstrate that records in are indistinguishable to the server (Section VI-B).
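To build intuition for the analysis that follows, one can check empirically that a random Gaussian matrix encryption leaks only the Euclidean norm of the plaintext: each entry of `G @ x` is a sum of independent Gaussians and is distributed as N(0, ||x||^2), independent of the direction of `x`. A small numpy experiment (names and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, trials = 10, 4, 20000

x = rng.normal(size=d)
x /= np.linalg.norm(x)   # normalize so that ||x|| = 1

# Encrypt the same plaintext with many freshly drawn Gaussian matrices G.
# Each entry of G @ x is g_i . x ~ N(0, ||x||^2), so the ciphertext
# distribution depends on x only through its Euclidean norm.
samples = np.stack([rng.normal(size=(m, d)) @ x for _ in range(trials)])
emp_var = samples.var(axis=0).mean()
# emp_var should be close to ||x||^2 = 1
```

This is why the bounds in Theorems 4 and 5 are driven entirely by the norms of the two plaintexts being compared.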
A. Indistinguishability of : security against clients
Theorem 4. Given the Gaussian matrix encryption function , where is a random Gaussian matrix, the worst-case lower and upper bounds on are
(22)
(23)
where and are two arbitrary columns in .
Proof. Based on the proof of [[35], Lemma 1], the covariance matrix of conditioned on the plaintext is where denotes the Euclidean norm of and is the identity matrix. Therefore, . Because and are diagonal matrices, we can get their determinants as
(24)
and
(25)
So
(26)
According to the inequality relation between the Hellinger distance and the TV distance (Equation 9), we can get the lower and upper bounds of the TV distance as follows.
(27)
□
Theorem 5. Given the Gaussian matrix encryption function , where is a random Gaussian matrix, the worst-case lower and upper bounds on are
(28)
(29)
where and are two arbitrary rows in .
Proof. The proof is identical to that provided in Theorem 4. □
Corollary 1. The success probability of an adversary in the indistinguishability experiment is bounded by
(30)
If each plaintext has constant Euclidean norm (i.e., for two random records and ), the cryptosystem has indistinguishability since .
Corollary 1 ensures that no adversary can learn any partial information about the plaintext from a given ciphertext, as long as each plaintext has constant Euclidean norm. Because the Euclidean norms of all the columns in the plaintext are designed to be the same (Section V-A), any two arbitrary columns in are indistinguishable.
Theorem 6. The TV distance is not increased by the encryption functions and . In other words, where and denote the probability distributions of a random row in and , respectively.
Proof. Hellinger distance can be expressed as a function of Rényi divergence [58], i.e.,
(31)
where denotes the Rényi divergence of from . Based on the data processing inequality [[58], Theorem 1], . So . Because and function is monotonically increasing on ,
(32)
In other words, . □
Corollary 2. Clients are unable to learn any partial information of in polynomial time within our privacy-preserving scheme, thereby ensuring that our scheme is resilient to a CPA conducted by colluding clients.
Proof. According to Theorems 4, 6 and Corollary 1, the internal encryption function is indistinguishable. Theorem 6 indicates that the TV distance is not increased by the external encryption layer, and thus is indistinguishable, where . Therefore, clients cannot perform an effective CPA to learn the sensitive information of the other clients. □
B. Indistinguishability of : security against the server
We further illustrate that any two arbitrary rows in are indistinguishable. In the initial model training process (Algorithm 3), the server decrypts from and thus has access to . With the indistinguishability of the encryption function , the server cannot learn any partial information about in polynomial time (Corollary 3).
Theorem 7. Given and ( and are two arbitrary rows in ) and a negligible function , and are indistinguishable if satisfies , where is the number of features.
Proof. The encryption function can be split into 2 sub-functions, i.e., and . According to Theorem 5 and the data processing inequality [[58], Theorem 1], the upper bound of the TV distance between and is . Indistinguishability requires that . To achieve this, we require that . So for a given . This leads to . □
We propose a data transformation method, i.e., “ -transformation”, to ensure the indistinguishability for arbitrary records, irrespective of whether they possess a consistent Euclidean norm or not. Assume is monotonically increasing on . The decrease in leads to the increase in . To maintain within an acceptable range, the Euclidean norms of any two arbitrary records need to be close to each other (i.e., ). To achieve this, we perform the -transformation for each record. More precisely, we use (i.e., a vector of constant ) instead of a vector of ones as the intercept in . The first element of and becomes instead of 1. As presented in Figure 6A, a large ensures that is close to 0 for a fixed . For fixed increases as rises (Figure 6B).
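The effect of this transformation can be sketched directly; the elided transformation constant is written as `xi` here purely as a placeholder name. Prepending a large constant entry to every record drives the ratio of any two record norms toward 1, which is what the indistinguishability bound requires.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(100, 5))   # records with differing norms

def xi_transform(X, xi):
    # Replace the usual intercept column of ones with a column of xi.
    return np.hstack([np.full((X.shape[0], 1), xi), X])

def worst_ratio(Z):
    n = np.linalg.norm(Z, axis=1)
    return n.max() / n.min()

# A larger xi pushes all record norms toward xi, so the worst-case ratio
# of any two record norms approaches 1.
r_small = worst_ratio(xi_transform(X, 1.0))
r_large = worst_ratio(xi_transform(X, 100.0))
# r_large is much closer to 1 than r_small
```

The choice of `xi` trades off the tightness of the norm ratio against the numerical range of the transformed data, matching the upper bound discussed below.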
Fig. 6:
Fig. A: given and different values of . Fig. B: given and different values of and .
Theorem 8. The -transformation ensures indistinguishability for any and .
Proof. As described in Theorem 7, our scheme achieves indistinguishability depending on . Since is monotonically decreasing on , there exists a minimum threshold such that for any . Let where and are two random records in the plaintext. Define where and are the -transformed and , respectively. So we have . For increases as goes up. So there exists a constant such that , implying that given any and . To conclude, the -transformation guarantees indistinguishability for any and . □
Considering that (e.g., ) can be large given small , we set an upper bound for (e.g., ). As shown in Figure 6B, increases when goes up. Given is sufficient to ensure indistinguishability when (Figure 6B). For a large , clients follow Algorithm 1 to select a constant such that the encryption function in our scheme has indistinguishability.
Corollary 3. With the -transformation (Algorithm 1), the encryption function ensures the indistinguishability among any arbitrary records in . This demonstrates that the server cannot learn any partial information about in polynomial time.
Proof. Given any negligible function and data , Algorithm 1 selects a constant such that where ( and are two arbitrary records in ). With the -transformation, any two arbitrary records in are indistinguishable (Theorem 8). □
The -transformation (i.e., multiplication of the intercept with a constant ) does not alter logistic model estimates, except for the intercept estimate. After multiplying the intercept by , the dataset becomes , where represents a diagonal matrix with its diagonal elements being . Based on Equation 6, the model estimate for becomes where refers to a diagonal matrix with the diagonal elements being . So only the intercept estimate is altered by the multiplication of constant , while the estimates for the features remain unchanged.
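This invariance is easy to verify numerically. The sketch below fits an unpenalized logistic model by plain Newton/IRLS (a stand-in for the paper's solver, not its exact algorithm) on synthetic data, once with an intercept column of ones and once with that column scaled by a constant `c`: the feature estimates coincide and only the intercept estimate is rescaled by 1/c.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    # Plain Newton/IRLS for unpenalized logistic regression.
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)                      # IRLS weights
        H = X.T @ (X * W[:, None])           # Hessian
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

rng = np.random.default_rng(4)
n, c = 400, 50.0
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 3))])
true_beta = np.array([0.5, 1.0, -1.0, 0.3])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

Xc = X.copy()
Xc[:, 0] = c        # intercept column of constant c instead of ones

b1 = fit_logistic(X, y)
b2 = fit_logistic(Xc, y)
# b2[1:] equals b1[1:]; b2[0] equals b1[0] / c
```

This mirrors the diagonal-matrix argument: scaling one column of the design matrix rescales only the corresponding coefficient of the maximum-likelihood estimate.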
VII. Performance evaluation
We perform experiments using the MNIST dataset from the UCI repository [59]. The MNIST dataset consists of hand-written digit images, each comprising a 28 by 28 pixel grid. Each image is associated with an integer label ranging from 0 to 9. The dataset consists of 60,000 training images and 10,000 testing images. In our privacy-preserving learning, we assume samples in each dataset are evenly distributed among clients, with each subset encompassing all the features. All experiments are performed in Matlab on the University of Florida HiPerGator 3.0 high-performance computing cluster with 1 CPU and 40 GB of RAM.
Our privacy-preserving logistic model is applied to differentiate label 9 from the remaining labels (0 to 8). To evaluate model performance on large-scale datasets, we apply the bootstrap method [60] to create three datasets with sample sizes of , and , respectively. To minimize communication cost, we employ SZ for lossy compression with a relative error bound below 0.01 (i.e., the pointwise difference between the raw and decompressed data, divided by the data range (maximum value minus minimum value), is below 0.01). As shown in Table IV, our scheme with SZ compression after 20 iterations achieves an accuracy level equivalent to that of the non-secure model. The non-secure model is constructed using the aggregated data from all clients without incorporating privacy protection. The utilization of SZ compression leads to a notable decrease in communication costs. When fewer than 10 clients participate in the privacy-preserving learning framework, our scheme with SZ compression incurs lower communication costs than the non-secure model (Figure 7A). Figure 7B shows that our scheme has high computational efficiency when analyzing large-scale datasets.
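The range-relative error guarantee can be illustrated with a simple uniform quantizer. This is not the actual SZ algorithm (SZ uses prediction and entropy coding); it is only a stand-in that provides the same pointwise error bound relative to the data range.

```python
import numpy as np

def compress(data, rel_err=0.01):
    # Uniform quantization with bin width 2 * rel_err * range guarantees
    # a pointwise error of at most rel_err * (max - min) after rounding.
    lo, hi = data.min(), data.max()
    step = 2.0 * rel_err * (hi - lo)
    codes = np.round((data - lo) / step).astype(np.int32)
    return codes, lo, step

def decompress(codes, lo, step):
    return codes * step + lo

rng = np.random.default_rng(5)
x = rng.normal(size=10000)
codes, lo, step = compress(x, rel_err=0.01)
x_hat = decompress(codes, lo, step)

# Maximum error, normalized by the data range: bounded by rel_err.
max_rel = np.abs(x - x_hat).max() / (x.max() - x.min())
```

With a 0.01 bound only about 50 distinct codes are needed per block, which is why such compressors cut communication cost sharply while leaving model accuracy essentially unchanged.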
Table IV:
Model accuracy of the proposed privacy-preserving logistic model (relative error bound < 0.01 in the SZ compression)
Dataset | Non-secure model | Our scheme w/o SZ | Our scheme with SZ | |||
---|---|---|---|---|---|---|
97.27% | 97.27% | 96.73% | 97.22% | 97.27% | 97.27% | |
97.0% | 97.0% | 96.46% | 96.97% | 97.0% | 97.0% | |
97.25% | 97.25% | 96.69% | 97.18% | 97.24% | 97.25% | |
97.25% | 97.25% | 96.70% | 97.20% | 97.23% | 97.25% |
Non-secure model: the model built on the aggregated data from all the clients without considering privacy protection. w/o SZ: without SZ compression. Iter: iteration times. : No. of samples in the aggregated data.
Fig. 7:
A: communication cost of the proposed scheme. w/o SZ: without SZ compression. : No. of samples in the aggregated data. : No. of the clients. B: computation time of 20 iterations in the proposed scheme. w/o SZ: without SZ compression. : No. of samples in the aggregated data. : No. of the clients.
We further compare the performance of our scheme with four state-of-the-art frameworks that provide malicious security. First, we evaluate the performance of our scheme for the binary classification problem and compare with two maliciously secure frameworks, SWIFT [12] and Fantastic4 [13]. Specifically, we construct the secure logistic model to distinguish between the digits 4 and 9 in the MNIST dataset (binary classification). A total of 11,791 samples are included in the training set, while the testing set comprises 1,991 samples. Moreover, we apply our scheme to solve the multiclass classification problem (Section V-B) and compare our model with state-of-the-art privacy-preserving neural networks, SecureNN [6] and Falcon [7]. The multiclass classification utilizes 60,000 images from the MNIST dataset for training and the remaining 10,000 images for testing.
Table V summarizes results of our scheme and other privacy-preserving frameworks. The computation and communication cost of our scheme grows with an increasing number of clients. For the comparison, we consider three scenarios, with the number of clients being 10, 20, or 50. Compared with SWIFT [12], our scheme is computationally faster for up to 50 clients and has competitive communication cost for up to 20 clients. In contrast, our scheme has improved communication efficiency but higher computation cost compared with Fantastic4 [13]. Our model achieves higher accuracy than these two maliciously secure frameworks. Compared with their respective 3-server variants, the 4-server frameworks in [12], [13] perform better in both computation and communication. Given that the secure frameworks in [12], [13] rely on the honest-majority setting (where a malicious adversary can corrupt at most one server), the inclusion of an extra server imposes a more stringent security prerequisite for the successful execution of the secure frameworks. In contrast, our scheme trains a secure logistic model that is resilient to malicious clients while utilizing only a single semi-honest server. For multiclass classification, our scheme has better computation and communication performance when compared to maliciously secure neural networks under the semi-honest assumption [6], [7]. The accuracy of our secure scheme is also comparable to these two neural network models with malicious security.
Table V:
Comparison between our model and other maliciously secure frameworks using MNIST dataset
Data | Framework | Comp. | Comm. | Accuracy |
---|---|---|---|---|
MNIST (4 vs. 9) | SWIFT [12] (3PC) | 12 mins | 96.5 Mb | – |
SWIFT [12] (4PC) | 8.6 mins | 44 Mb | – | |
Fantastic4 [13] (3PC) | 8.5 s | 2.8 Gb | 96.5% | |
Fantastic4 [13] (4PC) | 3 s | 167 Mb | 96.5% | |
Our (w/o SZ, ) | 17.5 s | 204 Mb | 98.9% | |
Our (SZ, ) | 17.7 s | 29.2 Mb | 98.9% | |
Our (SZ, ) | 29.6 s | 58.4 Mb | 98.9% | |
Our (SZ, ) | 65 s | 146 Mb | 98.9% | |
MNIST | SecureNN [6] (3PC) | 1.03 hrs | 110 Gb | 93.4% |
Falcon [7] (3PC) | 33.6 mins | 88 Gb | 97.4% | |
Our (w/o SZ, ) | 17.6 mins | 1.1 Gb | 98.0% | |
Our (SZ, ) | 17.8 mins | 164 Mb | 98.0% | |
Our (SZ, ) | 18.5 mins | 328 Mb | 98.0% | |
Our (SZ, ) | 20.7 mins | 821 Mb | 98.0% |
The performance of binary classification is compared with that of SWIFT [12] and Fantastic4 [13], while the performance of multiclass classification is compared with that of SecureNN [6] and Falcon [7].
The performance statistics of SWIFT [12] are sourced from [13], whereas the statistics for the other three SMC frameworks are obtained from their respective publications.
Comp.: computation cost; Comm.: communication cost.
3PC: 3-party computation (i.e., 3 servers); 4PC: 4-party computation (i.e., 4 servers).
: No. of clients participated in our privacy-preserving scheme.
w/o SZ: without SZ compression.
VIII. Conclusion
In this paper, we introduce a maliciously secure logistic model for horizontally distributed data, utilizing a novel matrix encryption technique. Unlike state-of-the-art secure frameworks that require the participation of two or more servers in an honest-majority setting, our scheme utilizes only a single semi-honest server. Our scheme ensures that any two arbitrary records are indistinguishable through the -transformation. A verification stage can detect any deviations from the proposed scheme among malicious data providers. Lossy compression is employed to minimize the communication cost while ensuring negligible degradation in accuracy. Compared with other maliciously secure models, our scheme has higher computational and communication efficiency. One prospective avenue for future research involves expanding our secure scheme to other nonlinear models, such as support vector machines and neural networks.
Acknowledgments
This work was supported by the National Institutes of Health [R01 LM014027, U24 AA029959-01]. The authors would like to thank anonymous reviewers for many helpful comments.
Appendix A. Commutative matrix
Matrix and are commutative if . To ensure negligible degradation in accuracy, the proposed privacy-preserving scheme generates commutative matrices to encrypt the plaintexts. The commutative encryption matrix is constructed based on a matrix polynomial (i.e., a polynomial with matrices as variables) [61]. For instance, assume there are 2 clients in the collaborative learning. Each client first generates a common encryption key (a random nonsingular matrix). Client 1 then generates a vector of random coefficients and an encryption matrix . Similarly, client 2 generates a vector of random coefficients and an encryption matrix . and are commutative (i.e., ) because and are both matrix polynomials of the common matrix .
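The construction can be sketched directly in numpy; the matrix size, polynomial degree, and coefficient distribution below are arbitrary illustrative choices. Because both keys are polynomials in the same base matrix, they commute.

```python
import numpy as np

rng = np.random.default_rng(6)
d, deg = 5, 3

# Shared secret base matrix A (assumed nonsingular with probability 1).
A = rng.normal(size=(d, d))

def poly_matrix(A, coeffs):
    # E = c0*I + c1*A + c2*A^2 + ... : a matrix polynomial in A.
    E = np.zeros_like(A)
    P = np.eye(A.shape[0])
    for ck in coeffs:
        E += ck * P
        P = P @ A
    return E

E1 = poly_matrix(A, rng.normal(size=deg + 1))  # client 1's encryption matrix
E2 = poly_matrix(A, rng.normal(size=deg + 1))  # client 2's encryption matrix

# Any two polynomials in the same matrix commute: E1 E2 = E2 E1.
commute = np.allclose(E1 @ E2, E2 @ E1)
```

Each client keeps its coefficient vector private, so knowing that the keys commute does not reveal either key.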
Appendix B. Pre-processing and internal encryption for data with large sample size
To enhance the computational efficiency of encryption for clients with large sample sizes, we construct the encryption matrix as a block-diagonal matrix. Specifically, client generates as
(33)
where is a random Gaussian matrix. is also partitioned into sub-matrices. Following the pre-processing procedure, a pseudo record is added to each sub-matrix to ensure indistinguishability. Specifically, (assuming that a vector of 1 is already included as the first row) is partitioned into sub-matrices , with each sub-matrix containing 99 samples and all the features. Before the sub-matrices are encrypted by , client computes the Euclidean norm of each column in . Let be the Euclidean norm of the -th column in and . Client generates a vector, , with the -th element of being . is added to as the last row. Subsequently, the -th sub-matrix consists of 100 samples, and the Euclidean norm of every column equals . As the total number of samples in may not be divisible by 99, client generates a random set of pseudo records to be vertically integrated into the original matrix such that each sub-matrix contains 99 samples. For example, consider a dataset containing 1,070 samples. Client generates 19 pseudo records, which are then vertically concatenated with . Following this concatenation, can be split into 11 sub-matrices, i.e., . Client first concatenates with the pseudo record and then encrypts each with . After data encryption, client sends and the row indices of the pseudo records to the server.
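The norm-equalizing pseudo record can be sketched as follows. The target norm `M` is taken here to be the largest column norm of the sub-matrix, a stand-in for the paper's elided target; appending the entry sqrt(M^2 - s_j^2) to column j lifts every column norm to exactly M.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(0, 1, size=(99, 6))   # one 99-sample sub-matrix

# Assumed target norm: the largest column norm of the sub-matrix.
s = np.linalg.norm(X, axis=0)         # per-column Euclidean norms s_j
M = s.max()

# Pseudo record appended as the last row of the sub-matrix.
pseudo = np.sqrt(M**2 - s**2)
X_pad = np.vstack([X, pseudo])        # 100 rows, equalized column norms

norms = np.linalg.norm(X_pad, axis=0)
# every entry of norms equals M
```

Equalizing the column norms is what makes the indistinguishability bound of Section VI apply to each sub-matrix after encryption.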
Appendix C. Logistic model estimate using encrypted and original data
Theorem 9. Data matrices and model estimates of the secure and non-secure computation have the following properties.
;
;
where .
Proof. Let
(34)
and
(35)
and are diagonal matrices. Let
(36)
be the corresponding matrices in the privacy-preserving model.
In iteration (initial setup), set the starting point as for the privacy-preserving model and for the non-secure model. Because , we have
(37)
According to Equations 34, 35 and 37, we have and (Equation 4) can be expressed as . Thus we have
(38)
Therefore properties hold in the initial step.
Next we prove that properties hold in the -th iteration assuming that properties hold in the -th iteration. Let notations with or denote parameters derived during the -th iteration. Given is updated as
(39)
Assuming that properties hold in the -th iteration, we have . Since , we have
(40)
Similar to the proof for iteration , we have Enc . So properties hold in the -th iteration. Moreover, based on Equation 11, we have
(41)
To conclude, properties hold for all iterations. □
References
- [1].Zhang Y, Yu R, Nekovee M, Liu Y, Xie S, and Gjessing S, “Cognitive machine-to-machine communications: visions and potentials for the smart grid,” IEEE Network, vol. 26, no. 3, pp. 6–13, 2012. [Google Scholar]
- [2].Zhao C, Zhao S, Zhao M, Chen Z, Gao C-Z, Li H, and an Tan Y, “Secure multi-party computation: Theory, practice and applications,” Information Sciences, vol. 476, pp. 357–372, 2019. [Google Scholar]
- [3].Li Q, Wen Z, Wu Z, Hu S, Wang N, Li Y, Liu X, and He B, “A survey on federated learning systems: Vision, hype and reality for data privacy and protection,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 4, pp. 3347–3366, 2023. [Google Scholar]
- [4].Lindell Y, “Secure multiparty computation,” Commun. ACM, vol. 64, no. 1, pp. 86–96, 2020. [Google Scholar]
- [5].Thapa C and Camtepe S, “Precision health data: Requirements, challenges and existing techniques for data security and privacy,” Computers in Biology and Medicine, vol. 129, p. 104130, 2021. [DOI] [PubMed] [Google Scholar]
- [6].Wagh S, Gupta D, and Chandran N, “SecureNN: 3-party secure computation for neural network training.” Proc. Priv. Enhancing Technol, vol. 2019, no. 3, pp. 26–49, 2019 [Google Scholar]
- [7].Wagh S, Tople S, Benhamouda F, Kushilevitz E, Mittal P, and Rabin T, “Falcon: Honest-majority maliciously secure framework for private deep learning,” Proceedings on Privacy Enhancing Technologies, vol. 2021, pp. 188–208, January 2021. [Google Scholar]
- [8].Gordon SD, Ranellucci S, and Wang X, “Secure computation with low communication from cross-checking,” in Advances in Cryptology - ASIACRYPT 2018: 24th International Conference on the Theory and Application of Cryptology and Information Security, Brisbane, QLD, Australia, December 2–6, 2018, Proceedings, Part III. Springer-Verlag, 2018, pp. 59–85. [Google Scholar]
- [9].Patra A and Suresh A, “BLAZE: blazing fast privacy-preserving machine learning,” in 27th Annual Network and Distributed System Security Symposium, NDSS 2020, San Diego, California, USA, February 23–26, 2020. The Internet Society, 2020. [Google Scholar]
- [10].Mohassel P and Rindal P, “ABY3: A mixed protocol framework for machine learning,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. New York, NY, USA: Association for Computing Machinery, 2018, p. 35–52. [Google Scholar]
- [11].Lehmkuhl R, Mishra P, Srinivasan A, and Popa RA, “Muse: Secure inference resilient to malicious clients,” in 30th USENIX Security Symposium (USENIX Security 21). USENIX Association, Aug. 2021, pp. 2201–2218. [Google Scholar]
- [12].Koti N, Pancholi M, Patra A, and Suresh A, “SWIFT: Super-fast and robust privacy-preserving machine learning,” in 30th USENIX Security Symposium (USENIX Security 21). USENIX Association, 2021, pp. 2651–2668. [Google Scholar]
- [13].Dalskov A, Escudero D, and Keller M, “Fantastic four: Honest-majority four-party secure computation with malicious security,” in 30th USENIX Security Symposium (USENIX Security 21). USENIX Association, 2021, pp. 2183–2200. [Google Scholar]
- [14].Byali M, Chaudhari H, Patra A, and Suresh A, “FLASH: Fast and robust framework for privacy-preserving machine learning,” Proceedings on Privacy Enhancing Technologies, vol. 2020, pp. 459–480, April 2020. [Google Scholar]
- [15].Wang X, Ranellucci S, and Katz J, “Authenticated garbling and efficient maliciously secure two-party computation,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ‘17. Association for Computing Machinery, 2017, pp. 21–37. [Google Scholar]
- [16].Wang X, Ranellucci S, and Katz J, “Global-scale secure multiparty computation,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ‘17. Association for Computing Machinery, 2017, pp. 39–56. [Google Scholar]
- [17].Zhu R, Cassel D, Sabry A, and Huang Y, “NANOPI: Extreme-scale actively-secure multi-party computation,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ‘18. Association for Computing Machinery, 2018, pp. 862–879. [Google Scholar]
- [18].Zheng W, Popa RA, Gonzalez JE, and Stoica I, “Helen: Maliciously secure coopetitive learning for linear models,” in 2019 IEEE Symposium on Security and Privacy (SP), 2019, pp. 724–738. [Google Scholar]
- [19].Fan Y, Bai J, Lei X, Zhang Y, Zhang B, Li K-C, and Tan G, “Privacy preserving based logistic regression on big data,” Journal of Network and Computer Applications, vol. 171, p. 102769, 2020. [Google Scholar]
- [20].Mohassel P and Zhang Y, “Secureml: A system for scalable privacy-preserving machine learning,” in 2017 IEEE Symposium on Security and Privacy (SP). Los Alamitos, CA, USA: IEEE Computer Society, 2017, pp. 19–38. [Google Scholar]
- [21].Patra A, Schneider T, Suresh A, and Yalame H, “ABY2.0: Improved mixed-protocol secure two-party computation,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 2165–2182. [Google Scholar]
- [22].Rivest RL, Adleman L, Dertouzos ML et al. , “On data banks and privacy homomorphisms,” Foundations of secure computation, vol. 4, no. 11, pp. 169–180, 1978. [Google Scholar]
- [23].Acar A, Aksu H, Uluagac AS, and Conti M, “A survey on homomorphic encryption schemes: Theory and implementation,” ACM Comput. Surv, vol. 51, no. 4, pp. 1–35, 2018 [Google Scholar]
- [24].Yao AC-C, “How to generate and exchange secrets,” in 27th annual symposium on foundations of computer science (Sfcs 1986). IEEE; 1986, pp. 162–167. [Google Scholar]
- [25].Goldreich O, Micali S, and Wigderson A, “How to play ANY mental game,” in Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing. Association for Computing Machinery, 1987, pp. 218–229. [Google Scholar]
- [26].Dwork C, “Differential privacy,” in Proceedings of the 33rd International Conference on Automata, Languages and Programming - Volume Part II. Springer-Verlag, 2006, pp. 1–12. [Google Scholar]
- [27].Zhu T, Ye D, Wang W, Zhou W, and Yu PS, “More than privacy: Applying differential privacy in key areas of artificial intelligence,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 6, pp. 2824–2843, 2022. [Google Scholar]
- [28].Zhao L, Wang Q, Zou Q, Zhang Y, and Chen Y, “Privacy-preserving collaborative deep learning with unreliable participants,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1486–1500, 2020. [Google Scholar]
- [29].Phan N, Vu MN, Liu Y, Jin R, Dou D, Wu X, and Thai MT, “Heterogeneous gaussian mechanism: preserving differential privacy in deep learning with provable robustness,” in Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 2019, pp. 4753–4759. [Google Scholar]
- [30].Li W, Milletarì F, Xu D, Rieke N, Hancox J, Zhu W, Baust M, Cheng Y, Ourselin S, Cardoso MJ et al. , “Privacy-preserving federated brain tumour segmentation,” in Machine Learning in Medical Imaging: 10th International Workshop, MLMI 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 13, 2019, Proceedings 10. Springer, 2019, pp. 133–141. [Google Scholar]
- [31].Wei K, Li J, Ding M, Ma C, Yang HH, Farokhi F, Jin S, Quek TQS, and Poor HV, “Federated learning with differential privacy: Algorithms and performance analysis,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3454–3469, 2020. [Google Scholar]
- [32].Truex S, Baracaldo N, Anwar A, Steinke T, Ludwig H, Zhang R, and Zhou Y, “A hybrid approach to privacy-preserving federated learning,” in Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, ser. AISec’19. New York, NY, USA: Association for Computing Machinery, 2019, p. 1–11. [Google Scholar]
- [33]. Jayaraman B and Evans D, "Evaluating differentially private machine learning in practice," in 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, 2019, pp. 1895–1912.
- [34]. Bianchi T, Bioglio V, and Magli E, "Analysis of one-time random projections for privacy preserving compressed sensing," IEEE Transactions on Information Forensics and Security, vol. 11, no. 2, pp. 313–327, 2016.
- [35]. Yu NY, "Indistinguishability and energy sensitivity of Gaussian and Bernoulli compressed encryption," IEEE Transactions on Information Forensics and Security, vol. 13, no. 7, pp. 1722–1735, 2018.
- [36]. Cho W and Yu NY, "Secure and efficient compressed sensing-based encryption with sparse matrices," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1999–2011, 2020.
- [37]. Kuldeep G and Zhang Q, "Design prototype and security analysis of a lightweight joint compression and encryption scheme for resource-constrained IoT devices," IEEE Internet of Things Journal, vol. 9, no. 1, pp. 165–181, 2022.
- [38]. Hastie T, Tibshirani R, and Friedman JH, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009, vol. 2.
- [39]. Maalouf M, "Logistic regression in data analysis: An overview," Int. J. Data Anal. Tech. Strateg., vol. 3, no. 3, pp. 281–299, 2011.
- [40]. Katz J and Lindell Y, Introduction to Modern Cryptography, 2nd ed. Chapman & Hall/CRC, 2014.
- [41]. He X, Machanavajjhala A, Flynn C, and Srivastava D, "Composing differential privacy and secure computation: A case study on scaling private record linkage," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS '17. Association for Computing Machinery, 2017, pp. 1389–1406.
- [42]. Wang W, Ying L, and Zhang J, "On the relation between identifiability, differential privacy, and mutual-information privacy," IEEE Transactions on Information Theory, vol. 62, no. 9, pp. 5018–5029, 2016.
- [43]. Bellare M, Hoang VT, and Rogaway P, "Foundations of garbled circuits," in Proceedings of the 2012 ACM Conference on Computer and Communications Security. New York, NY, USA: Association for Computing Machinery, 2012, pp. 784–796.
- [44]. Liu C, Hu X, Chen X, Wei J, and Liu W, "SDIM: A subtly designed invertible matrix for enhanced privacy-preserving outsourcing matrix multiplication and related tasks," IEEE Transactions on Dependable and Secure Computing, pp. 1–18, 2023.
- [45]. Canetti R, "Universally composable security," J. ACM, vol. 67, no. 5, 2020.
- [46]. Gibbs AL and Su FE, "On choosing and bounding probability metrics," International Statistical Review / Revue Internationale de Statistique, vol. 70, no. 3, pp. 419–435, 2002.
- [47]. Le Cam L, Asymptotic Methods in Statistical Decision Theory. Springer, New York, NY, 1986.
- [48]. DasGupta A, Asymptotic Theory of Statistics and Probability. Springer, New York, NY, 2008.
- [49]. Guntuboyina A, Saha S, and Schiebinger G, "Sharp inequalities for f-divergences," IEEE Transactions on Information Theory, vol. 60, no. 1, pp. 104–121, 2014.
- [50]. Kailath T, "The divergence and Bhattacharyya distance measures in signal selection," IEEE Transactions on Communication Technology, vol. 15, no. 1, pp. 52–60, 1967.
- [51]. Abou-Moustafa KT and Ferrie FP, "A note on metric properties for some divergence measures: The Gaussian case," in Proceedings of the Asian Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 25. Singapore Management University, Singapore: PMLR, 2012, pp. 1–15.
- [52]. Sun X, Tian C, Hu C, Tian W, Zhang H, and Yu J, "Privacy-preserving and verifiable SRC-based face recognition with cloud/edge server assistance," Computers & Security, vol. 118, p. 102740, 2022.
- [53]. Liu C, Hu X, Zhang Q, Wei J, and Liu W, "An efficient biometric identification in cloud computing with enhanced privacy security," IEEE Access, vol. 7, pp. 105363–105375, 2019.
- [54]. Jasmine RM and Jasper J, "A privacy preserving based multi-biometric system for secure identification in cloud environment," Neural Processing Letters, vol. 54, no. 1, pp. 303–325, 2022.
- [55]. Di S and Cappello F, "Fast error-bounded lossy HPC data compression with SZ," in 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016, pp. 730–739.
- [56]. Cappello F, Di S, Li S, Liang X, Gok AM, Tao D, Yoon CH, Wu X-C, Alexeev Y, and Chong FT, "Use cases of lossy compression for floating-point data in scientific data sets," The International Journal of High Performance Computing Applications, vol. 33, no. 6, pp. 1201–1220, 2019.
- [57]. Zhao K, Di S, Lian X, Li S, Tao D, Bessac J, Chen Z, and Cappello F, "SDRBench: Scientific data reduction benchmark for lossy compressors," in 2020 IEEE International Conference on Big Data (Big Data), 2020, pp. 2716–2724.
- [58]. van Erven T and Harremos P, "Rényi divergence and Kullback-Leibler divergence," IEEE Transactions on Information Theory, vol. 60, no. 7, pp. 3797–3820, 2014.
- [59]. Dua D, Graff C et al., "UCI Machine Learning Repository," 2017. [Online]. Available: http://archive.ics.uci.edu/ml
- [60]. Kulesa A, Krzywinski M, Blainey P, and Altman N, "Sampling distributions and the bootstrap," Nature Methods, vol. 12, pp. 477–478, 2015.
- [61]. Hou S, Uehara T, Yiu S, Hui LC, and Chow K, "Privacy preserving confidential forensic investigation for shared or remote servers," in 2011 Seventh International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2011, pp. 378–383.