Abstract
Machine learning and artificial neural networks (ANNs) have been at the forefront of medical research in the last few years. It is well known that ANNs benefit from big data and the collection of the data is often decentralized, meaning that it is stored in different computer systems. There is a practical need to bring the distributed data together with the purpose of training a more accurate ANN. However, the privacy concern prevents medical institutes from sharing patient data freely. Federated learning and multi-party computation have been proposed to address this concern. However, they require the medical data collectors to participate in the deep-learning computations of the data users, which is inconvenient or even infeasible in practice. In this paper, we propose to use matrix masking for privacy protection of patient data. It allows the data collectors to outsource privacy-sensitive medical data to the cloud in a masked form, and allows the data users to outsource deep learning to the cloud as well, where the ANN models can be trained directly from the masked data. Our experimental results on deep-learning models for diagnosis of Alzheimer’s disease and Parkinson’s disease show that the diagnosis accuracy of the models trained from the masked data is similar to that of the models from the original patient data.
Keywords: Neural network privacy, Orthogonal transformation, Matrix masking, Medical data privacy
I. Introduction
Artificial intelligence is transforming biomedical research and medical practice [1], [2]. Machine learning has been used for prediction, classification, and statistical inference, often outperforming human experts in diagnosis [3]. Deep-learning artificial neural networks (ANN) have become increasingly popular in medical research [4]–[10]. Studies have shown that ANNs provide better performance than conventional machine learning algorithms and linear regression methods [11]–[14].
While big medical data present unprecedented opportunities for building fine-grained ANN models for medical research and practice, the tasks of training and continuously refining ANN models with data from tens of thousands of patients, each with hundreds or thousands of attributes including possibly images and genetic data, are notoriously computation-intensive and time-consuming [15], [16]. Outsourcing such computation and the data to the cloud is an obvious solution. The problem is that exposing sensitive patient data to the cloud admins and others with access to the cloud storage (possibly through illegal means such as cyber-attacks) may lead to noncompliance with the expansive laws and regulations that govern medical data privacy. Encryption could be the answer. Most prior efforts focused on using homomorphic encryption to perform inference (such as classification) based on ANN models that have already been trained [17]–[22]. The key problem of outsourcing the expensive training operations to the cloud, without leaking any patient data from their distributed sources, remains open, not only because the computation over homomorphically encrypted data is expensive and the ANN models are non-linear, but also because it is difficult to compute over medical data that are encrypted from multiple sources with different keys.
One solution to privacy-preserving deep learning over distributed data is federated learning [23]–[27] or, more broadly, multi-party secure computation [28]–[31]. However, both require all data collectors (called data clients in this paper) to perform synchronized computations together. Take federated learning as an example. It requires all clients to synchronously train their models locally based on their raw data and send the gradients of the model parameters after each local training iteration to a centralized server (or cloud), where the gradients from all clients are combined to update the model parameters, which are then distributed to the clients in order to update their local models for the next training iteration [32]. This model requires all data clients (i.e., data-collecting doctors and medical researchers) to acquire the computing resources and the technical expertise for local deep-learning computation, which is contrary to the goal of outsourcing such computation to the cloud, as discussed earlier. Moreover, it is highly inconvenient that whenever a biostatistician wants to perform a deep-learning study, all data clients need to be summoned to participate and carry out their respective local computation.
This paper sets forth the following requirements for cloud-based ANN learning of privacy-sensitive medical data:
Outsourcing Requirement: The data clients outsource the job of ANN training to the cloud, and they are relieved from local model training. After contributing their data, they are not required to participate in the computation of future data analyses.
Data Privacy Requirement: Data should be stored at the cloud in a privacy-preserving form. If the cloud is compromised, the medical information of individuals should not be leaked.
Efficiency Requirement: The efficiency of training ANN models over the privacy-protected data should be comparable to that of training ANN models over the raw data.
Homomorphically encrypted data cannot meet the efficiency requirement and the outsourcing requirement because of the difficulty of computing over data encrypted by different clients with different keys. This paper proposes a new method of cloud-based ANN training with masked data. Matrix masking, formulated by [33]–[39] with provable privacy, alters the components of a data matrix by performing orthogonal transformations. It has been used in privacy-preserving statistical analyses including linear regression, contingency table analysis, Cox proportional hazard regression, and logistic regression. To the best of our knowledge, this paper is the first to investigate the use of matrix masking as an effective privacy-preserving technique for cloud-based ANN training. We show theoretically that the difference between the cost function of the ANN model trained over the masked data and the cost function of the model over the raw data is bounded by a constant. We perform experiments based on two real data sets, one for Alzheimer’s disease [40] and one for Parkinson’s disease [41], to train diagnosis ANN models over masked data on the cloud and compare them with the benchmark models trained from the raw data. The diagnosis performance of the models trained with the masked data was similar to the performance of those trained with the raw data. Moreover, the runtime of training ANN models over the masked data is similar to doing so over the raw data, while data masking incurs small computational overhead.
II. Related Work
Various methods have been proposed for multiple parties to pool their data together for deep learning while preserving data privacy. They can generally be grouped into three categories.
Privacy-preserving inference: While the ANN model is trained with raw data, the inference uses encrypted data, such that a user can use the model for inference without leaking its input data.
Federated learning: The ANN is trained jointly from multiple clients by sending their local parameters, such as gradients, to the cloud, which synthesizes the local values and sends the results back.
ANN training on encrypted data with a trusted authority: The ANN is trained on data encrypted with credentials from a trusted central server.
The privacy-preserving inference is typically performed over homomorphically encrypted user input, and the result (in the encrypted form) is sent back to the user, which decrypts for the actual model output [17]–[22]. These studies do not consider training the ANN models on encrypted data. Nor do they meet our efficiency requirement.
In federated learning [23]–[27], the data clients perform local ANN training themselves, which does not meet our outsourcing requirement. Moreover, all clients have to perform training together, which may be inconvenient in practical settings where the members of a large global consortium of medical researchers who share data work on different schedules, on different research projects, and with different expertise. The same issues exist for multi-party computation and garbled circuits [28]–[31], where the data clients do not outsource their local computations, which often incur high communication and computation overhead [42].
CryptoNN by [43] relies on a centralized trusted authority (TA) to distribute its public key to all clients for encrypting their data. Establishing and maintaining a TA is not always feasible in practice. If the TA is compromised, all data may be leaked. In addition, training over encrypted data is far more expensive than doing so over the raw data.
Outsourcing the training operations to the cloud, without leaking any raw data, remains an open problem under the three requirements listed in the introduction.
III. Preliminaries
A. System Model
Consider an example of collecting the Alzheimer’s disease (AD) patient data from a consortium of medical centers, hospitals and clinics in order to build deep-learning diagnosis models that are trained based on the collected patient data and are used by taking new patient information as input and classifying new patients as having AD or not. There is extensive work on building such a model based on a centralized data set [44]. However, if the data is scattered at different medical centers/hospitals/clinics and cannot be directly shared due to patient privacy restrictions, the prior work based on a centralized data set cannot be applied.
We assume that not all doctors and medical researchers in the consortium have access to adequate computing resources or deep-learning expertise to carry out local ANN model training or other multi-party computation for federated learning. It is desired to outsource data and computation and to leverage software and hardware from an MLaaS (Machine Learning as a Service) cloud provider, such as [45], [46] or [47]. We assume that the data collection process spans across long time (e.g., multiple years) and globally, which makes it difficult to coordinate synchronized computations or establish a centralized trusted authority.
The data contributors such as the doctors/researchers at the medical centers/hospitals/clinics in the consortium of the above example are called the data clients. The computing/storage platform that the clients outsource their data to is referred to as the cloud in this paper. The biostatisticians that use all or part of the outsourced data to learn ANN models on the cloud are called the data users. Different users may choose to use different subsets of data (e.g., from different countries) or choose different attributes of the data (e.g., with or without MRI images) to train different models. The doctors that use the models to diagnose new patients are called the model users. The sets of data clients, data users and model users may overlap. For example, a team in a hospital may have members that collect data, members that build models, and other members that use the models to diagnose.
B. Problem Statement
The problem we address in this paper is to outsource distributed patient data from multiple data clients to the cloud, where deep-learning models are trained from the outsourced data for disease diagnosis (for example, AD diagnosis), with three requirements: (1) Data is outsourced to the cloud in a privacy-preserving form such that raw data about individual patients will not be leaked even if access to the cloud storage is compromised. (2) All model training operations are performed by the cloud. The inference operations (actual diagnosis) may be performed by the model users if they download the models from the cloud. The inference accuracy of the models should be close to the accuracy of the benchmark models trained from the raw data directly. (3) The privacy-preserving model training should be efficient (or even comparable to training with raw data).
C. Threat Model
Compromised cloud.
We assume that the cloud cannot be fully trusted: its admins have access to the data, other internal threats to the data cannot be ignored, and its storage could be compromised by an outside adversary. The patient data are outsourced from the data clients to the cloud in a privacy-preserving form. Even if an adversary obtains the outsourced data, it should not be able to learn the raw data of individual patients.
Curious clients.
Other clients may be curious about learning the raw data from a target client. If clients must cooperate to transform their data together into a privacy-preserving form before outsourcing to the cloud, we must make sure that the intermediate results of this multi-party computation do not give any clue to the raw data. If each client independently produces its outsourced data based entirely on local secrets, then the threat from curious clients is minimized.
Model poisoning.
A malicious client could potentially poison a deep-learning model with incorrect input. That could result in the model parameters being erroneously altered and consequently the model being less accurate.
IV. Training Artificial Neural Network on Masked Data
We propose to outsource matrix-masked data from the clients to the cloud. To motivate our solution, we give a toy example. Consider a patient record, which is a vector with a certain number of attribute values:

$$x = [x_1, x_2, x_3],$$

where the first attribute is the patient’s age in years, the second attribute is the height in meters, and the third attribute is the weight in kilograms. If we scale this vector by a random factor (e.g. a = 5), then the input vector becomes

$$ax = [a x_1, a x_2, a x_3].$$
On the one hand, the masked patient record does not have a direct meaning to an adversary that might observe it (e.g. no human reaches the age of 350 years). On the other hand, an ANN is able to extract the same features from x or ax as scaling the vector does not change the relationships between its attributes.
However, for simple scaling, it is often possible for the adversary to guess the scaling factor a and recover approximate values of the elements in vector x, particularly when some attributes are restricted in their value ranges.
We need a more secure way of preserving the relationship between the attributes and a more robust way against guessing. Matrix masking provides a solution to this problem. It operates on many patient records (denoted as X) together, where X is a matrix whose rows are patient records and columns are attributes. The multiplication of X by an orthogonal matrix A, i.e., AX, preserves the angles and distances between the column vectors (attributes) of X. When A also keeps the vector of ones invariant, AX keeps the first and second moment statistics of X, providing adequate information for ANN model training. A random orthogonal matrix alters the values in each patient record, while preserving the relationship between the attributes. A large orthogonal matrix is practically impossible to guess, as has been proven in [48] and [49], which showed that large random matrix masking AX is computationally secure in protecting the privacy of data X.
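The preservation property can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical 6 × 3 data matrix; the random orthogonal matrix here is simply the Q factor of a Gaussian matrix, so it illustrates the second-moment property only. Preserving the column means additionally requires the mean-invariant construction described in this section.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 6 patient records (rows) with 3 attributes (columns).
X = rng.normal(size=(6, 3))

# A random orthogonal matrix, taken as the Q factor of a Gaussian matrix.
A, _ = np.linalg.qr(rng.normal(size=(6, 6)))

M = A @ X  # masked data

# Inner products between attribute columns are unchanged, and therefore so
# are the column norms, the angles and the distances between columns.
gram_raw = X.T @ X
gram_masked = M.T @ M

# The individual records themselves are altered.
records_changed = not np.allclose(X, M)
```

Since (AX)^T(AX) = X^T A^T A X = X^T X for any orthogonal A, the two Gram matrices agree up to rounding error.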
We now present how each data client i applies random orthogonal matrix masking to its patient data [Xi, yi] before sending the masked data to the cloud, where Xi is the matrix of patient records and yi is the response vector. Using Alzheimer’s disease (AD) as an example, for each patient in Xi, the corresponding element in yi is zero for non-AD and one for AD. We stress that data masking is done once by the client before the masked data is outsourced to the cloud. After that, the data client is not involved when any data user uses the data to train an ANN model. Let ni be the number of patient records, i.e., the number of rows in Xi. The process for generating a random orthogonal matrix Ai is as follows. First, using the Gram-Schmidt process, we find an orthogonal basis Bi of the vector space generated by the vectors we want to keep invariant. Specifically, we want to keep invariant the vector yi and the vector of ones. Hence, Bi has two column vectors. We then randomly generate ni − 2 column vectors, concatenate them to Bi to form an ni × ni matrix Ci, and perform QR factorization on Ci, which gives us two orthogonal matrices, Q1 and Q2 [36], from which Ai is constructed. The resulting Ai performs an orthogonal transformation on the patient data [Xi, yi] and also has the properties Aiyi = yi and Ai1n = 1n, where 1n is a vector of n = ni ones. Note that each client generates its own data masking matrix Ai, independently of the other clients.
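The mask generation can be sketched in a few lines of NumPy. The sketch rests on an assumption that is not stated in the text: the two orthogonal factors are combined as Ai = Q1 Q2^T, where Q1 and Q2 come from QR factorizations of two matrices that share the leading columns [1, yi] and differ only in their random columns. This is one construction consistent with the stated invariance properties, not necessarily the exact procedure of [36]. The function name `make_mask` and the toy sizes are illustrative.

```python
import numpy as np

def make_mask(y, rng):
    """Random orthogonal A with A @ y = y and A @ ones = ones (up to rounding).

    ASSUMPTION: A = Q1 @ Q2.T, with Q1 and Q2 from two QR factorizations
    that share the leading columns [ones, y]. The QR step itself
    orthogonalizes those two columns, playing the role of Gram-Schmidt.
    """
    n = y.shape[0]
    fixed = np.column_stack([np.ones(n), y.astype(float)])
    # Same two leading columns, independent random remaining columns.
    C1 = np.hstack([fixed, rng.normal(size=(n, n - 2))])
    C2 = np.hstack([fixed, rng.normal(size=(n, n - 2))])
    Q1, _ = np.linalg.qr(C1)
    Q2, _ = np.linalg.qr(C2)
    # The first two columns of Q1 and Q2 coincide, so Q1 @ Q2.T acts as
    # the identity on span{ones, y} and as a random rotation elsewhere.
    return Q1 @ Q2.T

rng = np.random.default_rng(0)
n = 40                                   # hypothetical number of records
y = (rng.random(n) < 0.5).astype(float)  # hypothetical binary response
X = rng.normal(size=(n, 4))              # hypothetical raw attributes

A = make_mask(y, rng)
masked = A @ X                           # what the client would outsource
```

Because the leading two Householder reflections depend only on the shared columns, both factorizations produce the same orthonormal basis for span{1, yi}, which is what makes the product leave those vectors unchanged.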
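Each client sends their masked data set [AiXi, yi] to the cloud, which aggregates them together by stacking them vertically. For m data clients, the cloud gathers

$$[AX, y] = \begin{bmatrix} A_1X_1 & y_1 \\ A_2X_2 & y_2 \\ \vdots & \vdots \\ A_mX_m & y_m \end{bmatrix}, \tag{1}$$

where

$$A = \begin{bmatrix} A_1 & & \\ & \ddots & \\ & & A_m \end{bmatrix}, \quad X = \begin{bmatrix} X_1 \\ \vdots \\ X_m \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}. \tag{2}$$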
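The aggregation step can be illustrated with a small NumPy sketch. It assumes three hypothetical clients with 5, 7 and 6 records, and it reuses an assumed mask construction (a product of two QR factors sharing the columns [1, yi]) that is not spelled out in the text. The point is only that vertically stacking the clients' masked data is the same as applying one block-diagonal orthogonal matrix to the stacked raw data.

```python
import numpy as np

def make_mask(y, rng):
    # ASSUMED construction: A = Q1 @ Q2.T, both QR inputs start with [ones, y].
    n = y.shape[0]
    fixed = np.column_stack([np.ones(n), y])
    Q1, _ = np.linalg.qr(np.hstack([fixed, rng.normal(size=(n, n - 2))]))
    Q2, _ = np.linalg.qr(np.hstack([fixed, rng.normal(size=(n, n - 2))]))
    return Q1 @ Q2.T

rng = np.random.default_rng(1)
sizes = [5, 7, 6]   # hypothetical n_i for m = 3 clients
p = 4               # hypothetical number of attributes

Xs = [rng.normal(size=(n, p)) for n in sizes]
ys = [rng.integers(0, 2, size=n).astype(float) for n in sizes]
As = [make_mask(y_i, rng) for y_i in ys]   # generated independently per client

# What the cloud receives and stacks vertically.
masked_X = np.vstack([A_i @ X_i for A_i, X_i in zip(As, Xs)])
y = np.concatenate(ys)

# Equivalent global view: block-diagonal A applied to the stacked X.
N = sum(sizes)
A = np.zeros((N, N))
start = 0
for A_i in As:
    n_i = A_i.shape[0]
    A[start:start + n_i, start:start + n_i] = A_i
    start += n_i
X = np.vstack(Xs)
```

The cloud never needs to form A; it is shown here only to make the equivalence explicit.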
Any data user can now use all or a portion of [AX, y] to train an ANN model (such as the feed-forward neural networks in our experiments) in the cloud. For example, one may use subsets of attributes to build and compare models in order to learn each attribute’s relevance to inference accuracy. After a model is trained, the user downloads the model and applies it to new patient records (which are not masked). The user may also perform privacy-preserving inference in the cloud [17]–[22].
Will the model trained from the masked data be similar to the model trained from the raw patient data? We provide our intuition below, with theoretical and experimental results later. Consider the raw data matrix Xi at the ith data client with ni patient records and the random mean-invariant orthogonal masking matrix Ai. Each row in the masked data Mi = AiXi, called a masked record, is a weighted linear sum of all patient records in Xi, where Ai specifies the weights: its jth row specifies the weights for the jth masked record in Mi. Every patient record contributes a share (fraction) of itself to every masked record. The kth column in Ai specifies the shares (fractions) of the kth patient record in Xi that are contributed to the masked records in Mi. Because Ai is orthogonal and keeps the vector of ones invariant, each of its rows and each of its columns sums to one, so all the fractions of a patient record contributed to Mi add up to its whole. In that sense, Mi carries the same amount of patient information as Xi does, albeit in a randomly mixed form. When we feed a row vector from Mi to a neural network, we are actually feeding ni fractional raw patient records simultaneously. If the neural network were entirely linear and treated each input sample independently, we would see the same model parameters trained either from Xi or from Mi. A feed-forward neural network treats input samples independently, but its activation functions are not linear. Due to this non-linearity, the model parameters trained from Mi may not be identical to those from Xi, depending on the choice of activation function. But they could very well be similar, as our experiments suggest.
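The linear case of this intuition can be checked directly. The sketch below uses hypothetical data and an assumed mask construction (a product of two QR factors sharing the columns [1, y]). It verifies that a masked record is a weighted sum of raw records, that the rows and columns of the mask sum to one, and that an ordinary least-squares model with an intercept, used here as a stand-in for an entirely linear network, yields the same parameters from raw and masked data.

```python
import numpy as np

def make_mask(y, rng):
    # ASSUMED construction: A = Q1 @ Q2.T, both QR inputs start with [ones, y].
    n = y.shape[0]
    fixed = np.column_stack([np.ones(n), y])
    Q1, _ = np.linalg.qr(np.hstack([fixed, rng.normal(size=(n, n - 2))]))
    Q2, _ = np.linalg.qr(np.hstack([fixed, rng.normal(size=(n, n - 2))]))
    return Q1 @ Q2.T

def ols(features, target):
    """Least-squares fit with an intercept column."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

rng = np.random.default_rng(2)
n, p = 30, 3
X = rng.normal(size=(n, p))              # hypothetical raw records
y = (rng.random(n) < 0.5).astype(float)  # hypothetical binary response
A = make_mask(y, rng)
M = A @ X                                # masked records

# Masked record j is the sum over k of A[j, k] * (raw record k).
j = 0
mixed = sum(A[j, k] * X[k] for k in range(n))

row_sums = A.sum(axis=1)   # weights used to build each masked record
col_sums = A.sum(axis=0)   # shares contributed by each raw record

beta_raw = ols(X, y)
beta_masked = ols(M, y)
```

The equality of the two fits follows from [1, AX] = A[1, X] and Ay = y: multiplying both the design matrix and the response by an orthogonal A leaves the least-squares objective unchanged.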
To guard against model poisoning, where a malicious client attempts to poison a deep-learning model with incorrect patient data, we introduce a data verification mechanism for each client’s masked data. Consider the masked data Mi from an arbitrary client. First, we build a model with the masked data from all other clients, and test the model’s accuracy with a test data set (which could be de-identified raw data from a source that allows the outsourcing). Second, we build another model with the masked data from all clients (including Mi), and test the model’s accuracy. If the model with Mi is significantly worse than the model without Mi, we reject Mi.
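The verification procedure amounts to a leave-one-client-out comparison, sketched below. Several pieces are assumptions made for illustration: a least-squares classifier stands in for the ANN, the synthetic arrays stand in for clients' masked submissions, and the rejection margin of 0.1 is a hypothetical threshold for "significantly worse". None of these choices comes from the paper.

```python
import numpy as np

def fit_linear(X, y):
    """Stand-in for ANN training: least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    return w

def accuracy(w, X, y):
    scores = np.column_stack([np.ones(len(y)), X]) @ w
    return float(np.mean((scores > 0.5) == (y > 0.5)))

def accept_client(candidate, others, test_set, margin=0.1):
    """Accept the candidate's data unless it lowers test accuracy by more than margin."""
    X_wo = np.vstack([X for X, _ in others])
    y_wo = np.concatenate([y for _, y in others])
    X_w = np.vstack([X_wo, candidate[0]])
    y_w = np.concatenate([y_wo, candidate[1]])
    acc_without = accuracy(fit_linear(X_wo, y_wo), *test_set)
    acc_with = accuracy(fit_linear(X_w, y_w), *test_set)
    return acc_with >= acc_without - margin

rng = np.random.default_rng(3)

def make_client(n, flip=False):
    # Synthetic records: the label depends on the sign of the first attribute.
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(float)
    return X, (1.0 - y if flip else y)

honest = [make_client(200) for _ in range(3)]
poisoned = make_client(1000, flip=True)   # large submission with flipped labels
test_set = make_client(500)

ok_honest = accept_client(honest[2], honest[:2], test_set)
ok_poisoned = accept_client(poisoned, honest, test_set)
```

In this toy setting the poisoned submission is large enough to invert the fitted decision rule, so it is rejected. A small poisoned submission may not change accuracy measurably and would pass this check.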
V. Model from Masked Data vs. Model from Raw Data
It has been proved in [48], [49] that the masked data can achieve strong privacy without leaking information about individual patients in the raw data. More specifically, suppose an adversary has obtained the masked data M and let X be a random variable for any raw data that can produce the masked data. Under certain easy-to-satisfy conditions, the restricted support of X is practically intractable, and the posterior probability density of X remains the same as its prior probability density over the restricted support. This means that the information learned from the masked data does not improve the knowledge of the adversary about which possible values of the patient records are more likely.
Below we show that masking does not significantly change the utility of the data in ANN model training. More specifically, we want to show that the ANN model trained with the raw data is similar to the ANN model trained with the masked data. This property comes from the fact that the difference between the cost functions of the two models is bounded by a constant, while our experiments suggest that the difference is small in practice, as the two models perform similarly. The proof of the theorem below can be found in [50].
Theorem V.1.
Let X be a full-rank matrix and y be a response vector in the data set [X, y]. Let $A$ be an orthogonal matrix that satisfies $Ay = y$ and $\mathbf{1}_n^T A = \mathbf{1}_n^T$, where $\mathbf{1}_n^T$ is the transpose of a vector of n ones. Let $C$ and $\tilde{C}$ be the cost of the ANN after feeding the matrix [X, y] and A[X, y], respectively. Then, the difference of $C$ and $\tilde{C}$ is bounded, i.e., $|C - \tilde{C}| \le K$, where K is a constant depending on the raw data and the masking matrix, and | · | is the element-wise absolute value function.
VI. Experimental Evaluation
We conduct extensive experiments to compare the accuracy of a model that has been trained on raw data to the accuracy of a model that has been trained on masked data. We use the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data set [40] and the Parkinson’s Outcome Project (POP) data set [41]. In this section, we first present the data sets and the experimental setup, including the experimental method and the ANN hyper-parameters. Afterwards, we demonstrate the experimental results for binary classification.
A. Data Sets
We use two separate data sets for the experiments. The first one is the Alzheimer’s Disease (AD) data set [40], which has 4 attributes in matrix form and MRI images of the brain of the patients in image form. The 4 attributes in the matrix form are the following: age in years, gender with 0 for males and 1 for females, MMSE score (Mini-Mental State Examination) in the range [18, 30] and APOE (Apolipoprotein E gene) with 0 for non-existence of the gene and 1 for existence. The distributions of the attributes are shown in Figures 1 – 4, and the distribution of the response variable is shown in Figure 5, where AD denotes Alzheimer’s disease and NL denotes normal cognition. The MRI images are Axial PD/T2 FSE images of the brain of the patients as shown in Figures 6 and 7. We use the min-max normalization method for this data set as well.
Fig. 1: The age attribute distribution of the ADNI data set.
Fig. 4: The sex attribute distribution of the ADNI data set.
Fig. 5: The status distribution of the ADNI data set.
Fig. 6: MRI images of two patients without AD.
Fig. 7: MRI images of two patients clinically diagnosed with AD.
Additionally, we perform experiments on the combination of matrix and image data of the AD data set. The purpose of this experiment is to show that our method can work with image data as well as matrix data and that their combination can produce a more accurate ANN than each data set separately. We transform the image data into matrix form before combining them with the non-image data.
The second data set is based on the Parkinson’s Outcome Project (POP) [41]. We have 4908 patients with 13 attributes extracted from their POP Data Collection Form and we generate a data matrix, where each patient’s record forms a row with 13 columns. We use the Hoehn and Yahr stage (HYStage) as the response, which is a categorical variable between 1 and 5, showing how much the disease has progressed. Since we perform binary classification, we dichotomize this variable by transforming it into a binary attribute where the response is 0 when the HYStage is 1 or 2 (i.e. mild cases of PD) and the response is 1 when the HYStage is 3, 4 or 5 (i.e. severe cases of PD).
After extracting the response vector, there are 12 attributes, which are either categorical or numerical. The attributes include age, sex, ethnicity, disease duration and cognitive score, among others. For each categorical attribute, we transform it into one or more attributes with binary values (0 or 1) with one-hot encoding. More specifically, for a categorical attribute with only 2 categories, we transform it into a binary attribute where 0 stands for the first category and 1 stands for the second category; for a categorical attribute with p > 2 categories, we transform it into p binary attributes where the ith attribute is 1 if and only if the record belongs to the ith category. With this transformation, the 12 attributes are extended to 36 attributes. It is well known that one-hot encoding can provide better linearity and improve the network’s accuracy. We observed higher accuracy by using one-hot encoding as well. Finally, we use the min-max method to normalize each attribute into the range [0, 1], which also improves linearity.
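The encoding and normalization steps can be illustrated as follows. This is a minimal NumPy sketch with hypothetical attribute values and category codes; the helper names `one_hot` and `min_max` are illustrative and not taken from the paper.

```python
import numpy as np

def one_hot(codes, num_categories):
    """Map integer category codes 0..p-1 to p binary columns."""
    codes = np.asarray(codes)
    out = np.zeros((codes.shape[0], num_categories))
    out[np.arange(codes.shape[0]), codes] = 1.0
    return out

def min_max(column):
    """Rescale a numerical attribute into the range [0, 1]."""
    column = np.asarray(column, dtype=float)
    lo, hi = column.min(), column.max()
    if hi == lo:                      # constant column: avoid division by zero
        return np.zeros_like(column)
    return (column - lo) / (hi - lo)

# Hypothetical records for four patients.
age = [52, 67, 80, 71]        # numerical attribute
sex = [0, 1, 1, 0]            # two categories: kept as one binary attribute
ethnicity = [2, 0, 1, 2]      # three categories: expanded to three attributes

encoded = one_hot(ethnicity, 3)
features = np.column_stack([min_max(age), sex, encoded])
```

Here three original attributes become five columns, in the same way that the 12 POP attributes expand to 36.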
We split the data set into two subsets, where the first contains approximately 80% of the data for training and the second contains approximately 20% of the data for testing. The training data set contains 3908 rows, of which 1954 have a response of 0 and 1954 have a response of 1. The test data set contains 1000 rows, of which 500 have a response of 0 and 500 have a response of 1.
B. Experimental Method
We use a server equipped with an 8-core CPU at 3.7GHz, 16GB of RAM and a 2080 Ti GPU to simulate the cloud; the GPU allows it to train an ANN model in a timely manner. We also simulate several common desktops to act as the clients. Each client has a training data set and a testing data set, which are part of the data sets in Section VI–A.
In our experiments, we compare the accuracy of the trained ANN using raw and masked data. We first let the client send the raw training data to the server and, in turn, the server returns the raw-trained model to the client, which is denoted as M. Then, we let the client mask the training data using the proposed method and send it to the server, and the server returns the masked-trained model to the client. Finally, we test and compare the accuracy of the two trained models using the same raw testing data at the client end. The results are presented in Figures 8 and 9 for the ADNI and PD data sets, respectively.
Fig. 8: Model accuracy w.r.t. number of clients in the ADNI data set.
Fig. 9: Model accuracy w.r.t. number of clients in the PD data set.
C. Artificial Neural Network Architecture
We use a basic feed-forward neural network (FFNN) in our experiments with the binary cross-entropy loss function. Additionally, we use the Adam optimizer in the Keras library of Python as the stochastic gradient descent optimizer for training, ReLU as the activation function for all hidden layers and sigmoid as the activation function for the output layer. We run the experiments for 400 epochs and use a batch size of 50. The number of neurons in each layer depends on the input vector size. More specifically, the ANN for the AD data set has three hidden layers. The number of neurons in each hidden layer is 4 for the non-image data, 117 for the MRI image data and 121 for their combination. The ANN for the PD data set has 36 neurons in each of the three hidden layers. We vary the number of clients for accuracy comparison under these settings.
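The architecture can be made concrete with a small forward-pass sketch. To stay self-contained it is written in plain NumPy rather than Keras, so it shows only the layer structure and the loss; the Adam training loop is omitted. The weight initialization (He-style scaling) and the random batch are assumptions for illustration.

```python
import numpy as np

def init_params(input_dim, hidden_units, rng, num_hidden=3):
    """Weights and biases for num_hidden hidden layers plus one output unit."""
    sizes = [input_dim] + [hidden_units] * num_hidden + [1]
    return [
        (rng.normal(scale=np.sqrt(2.0 / m), size=(m, k)), np.zeros(k))  # assumed init
        for m, k in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, X):
    """ReLU on every hidden layer, sigmoid on the output layer."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    z = (h @ W + b)[:, 0]
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

rng = np.random.default_rng(4)

# PD configuration from the text: 36 inputs, three hidden layers of 36 neurons.
params = init_params(input_dim=36, hidden_units=36, rng=rng)

# One hypothetical mini-batch of 50 normalized records with binary responses.
X_batch = rng.random(size=(50, 36))
y_batch = rng.integers(0, 2, size=50).astype(float)

probs = forward(params, X_batch)
loss = binary_cross_entropy(y_batch, probs)
```

For the AD configurations, the same functions apply with `input_dim` and `hidden_units` set to 4, 117 or 121.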
D. Results for Binary Classification
The response vector is binary, that is, it only has values 0 or 1 as explained in Section VI–A. We vary the number of client machines between 1 and 10 for the AD data set, each having 35 patients. We also vary the number of clients between 1 and 78 for the PD data set, each having 50 patients. From Figures 8 and 9 we can make two significant observations. First, the accuracy increases monotonically with the increase of the number of clients. Second, the accuracy of the masked-trained model is comparable to the accuracy of the raw-trained model M.
E. Results for Attribute Combination
In order to test whether our method works with image data and with combinations of attributes, we transform the brain MRI images of the patients into matrices and combine them with the existing four non-image attributes.
Figure 10 shows the performance of the ANN models with the ADNI data set on raw and masked data, without combining image and non-image data. In contrast, Figure 11 shows the performance of the ANN with the same data set when we combine the image and matrix data in various configurations.
Fig. 10: Raw and masked data ANN models without attribute combination.
Fig. 11: Raw and masked data ANN models with attribute combination.
F. Run Time of Masking
The masking process requires the generation of a masking matrix A in each client, which is explained in Section IV. This process has the following two elements: 1) perform 2 QR decompositions with complexity $O(n^3)$, where n is the number of records at the client, and 2) perform a matrix multiplication between the generated random orthogonal matrix A and the data set [Xr, yr], which has complexity $O(n^3)$ as well. Therefore, the overall complexity of matrix masking is $O(n^3)$.
Although the runtime is cubic in the number of records, Figure 12 shows that for data sets with hundreds or a few thousand records at each client, the masking time is negligible compared to the training time (e.g. a data set with 500 records requires only 48 milliseconds for mask creation and multiplication without the use of a GPU). In this experiment, we use mini-batches of size 50 and 400 epochs.
Fig. 12: Comparison between the runtime of the training process and the masking process.
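For larger data sets, the runtime complexity can be reduced with the use of block-diagonal masks. For example, instead of creating a 5000 × 5000 masking matrix, we can create 10 matrices of size 500 × 500, place them on the diagonal of the 5000 × 5000 matrix and fill the rest of the elements with zeros, i.e.,

$$A = \begin{bmatrix} A^{(1)} & 0 & \cdots & 0 \\ 0 & A^{(2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A^{(10)} \end{bmatrix},$$

where each $A^{(k)}$ is a 500 × 500 masking matrix.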
That would reduce the masking time of a data set with 5000 records from 11.1 seconds to 0.48 seconds, since we only need to create 10 matrices of size 500 × 500 and each such mask takes only 48 milliseconds to create. Note that we used a CPU for matrix masking and a GPU for ANN training (which is significantly faster than a CPU). When a CPU is used for both operations, the masking time becomes even more negligible, since the ANN training would take considerably longer.
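A rough operation count shows why the block-diagonal form helps. The worked arithmetic below assumes that the cost of creating one n × n mask scales as c·n³ for some constant c, which is the usual asymptotic cost of a dense QR factorization; the constant and the lower-order terms are ignored, so the predicted factor is only an upper-end estimate.

```latex
\begin{align*}
\text{one mask of size } n:\quad & T(n) \approx c\,n^{3} \\
k \text{ blocks of size } n/k:\quad & k \cdot c\left(\frac{n}{k}\right)^{3} = \frac{c\,n^{3}}{k^{2}} \\
n = 5000,\ k = 10:\quad & \text{predicted reduction factor } k^{2} = 100 \\
\text{block-diagonal time from the 500-record figure}:\quad & 10 \times 48\ \text{ms} = 0.48\ \text{s} \\
\text{reduction implied by the reported timings}:\quad & \frac{11.1\ \text{s}}{0.48\ \text{s}} \approx 23
\end{align*}
```

The reduction implied by the reported timings is smaller than the asymptotic estimate, which is plausible at these sizes because lower-order terms and fixed overheads still contribute to the measured time.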
VII. Conclusion and Future Work
In this paper we presented a new technique for outsourcing privacy-preserving ANN training. We set forth three requirements that we believe to be important in order to create a practical system. Our technique is based on matrix masking, which performs orthogonal transformations on the data before it is sent from the clients to the cloud. The cloud trains an ANN model on the aggregate masked data it has received from the clients and gives it to all the clients to be used with new raw data. In this work, we argue theoretically and experimentally that the accuracy the clients will get from the masked-trained ANN model is comparable to the accuracy they would get if the model were trained on raw data.
In the future, we plan to continue this work by exploring the capabilities of our technique on image classification (such as the MNIST data set) and Convolutional Neural Networks. We also plan to expand our work on regression and unsupervised learning.
Fig. 2: The APOE attribute distribution of the ADNI data set.
Fig. 3: The MMSE attribute distribution of the ADNI data set.
Contributor Information
Dimitrios Melissourgos, University of Florida, Gainesville, FL, USA.
Hanzhi Gao, University of Florida, Gainesville, FL, USA.
Chaoyi Ma, University of Florida, Gainesville, FL, USA.
Shigang Chen, University of Florida, Gainesville, FL, USA.
Samuel S. Wu, University of Florida, Gainesville, FL, USA
References
- [1] Ahuja Abhimanyu S. The impact of artificial intelligence in medicine on the future role of the physician. PeerJ, 7:e7702, 2019.
- [2] Mehta Nishita, Pandit Anil, and Shukla Sharvari. Transforming healthcare with big data analytics and artificial intelligence: A systematic mapping study. Journal of biomedical informatics, 100:103311, 2019.
- [3] Irvin Jeremy, Rajpurkar Pranav, Ko Michael, Yu Yifan, Ciurea-Ilcus Silviana, Chute Chris, Marklund Henrik, Haghgoo Behzad, Ball Robyn, Shpanskaya Katie, et al. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 590–597, 2019.
- [4] Shahid Nida, Rappon Tim, and Berta Whitney. Applications of artificial neural networks in health care organizational decision-making: A scoping review. PloS one, 14(2):e0212356, 2019.
- [5] Sheikhtaheri Abbas, Sadoughi Farahnaz, and Hashemi Dehaghi Zahra. Developing and using expert systems and neural networks in medicine: a review on benefits and challenges. Journal of medical systems, 38(9):1–6, 2014.
- [6] Jiang Fei, Jiang Yong, Zhi Hui, Dong Yi, Li Hao, Ma Sufeng, Wang Yilong, Dong Qiang, Shen Haipeng, and Wang Yongjun. Artificial intelligence in healthcare: past, present and future. Stroke and vascular neurology, 2(4), 2017.
- [7] Ahmed Sk Saddam, Dey Nilanjan, Ashour Amira S, Sifaki-Pistolla Dimitra, Bălas-Timar Dana, Balas Valentina E, and Tavares João Manuel RS. Effect of fuzzy partitioning in crohn’s disease classification: a neuro-fuzzy-based approach. Medical & biological engineering & computing, 55(1):101–115, 2017.
- [8] Chong Alain Yee-Loong, Liu Martin J, Luo Jun, and Ooi Keng-Boon. Predicting rfid adoption in healthcare supply chain from the perspectives of users. International Journal of Production Economics, 159:66–75, 2015.
- [9] Elveren Erhan and Yumuşak Nejat. Tuberculosis disease diagnosis using artificial neural network trained with genetic algorithm. Journal of medical systems, 35(3):329–332, 2011.
- [10] Benali Radhwane, Reguig Fethi Bereksi, and Slimane Zinedine Hadj. Automatic classification of heartbeats using wavelet neural network. Journal of medical systems, 36(2):883–892, 2012.
- [11] Niu Wen-Jing, Feng Zhong-Kai, Feng Bao-Fei, Min Yao-Wu, Cheng Chun-Tian, and Zhou Jian-Zhong. Comparison of multiple linear regression, artificial neural network, extreme learning machine, and support vector machine in deriving operation rule of hydropower reservoir. Water, 11(1):88, 2019.
- [12] Süt Necdet and Şenocak Mustafa. Assessment of the performances of multilayer perceptron neural networks in comparison with recurrent neural networks and two statistical methods for diagnosing coronary artery disease. Expert Systems, 24(3):131–142, 2007.
- [13] Kim Gwang-Hee, Shin Jae-Min, Kim Sangyong, and Shin Yoonseok. Comparison of school building construction costs estimation methods using regression analysis, neural network, and support vector machine. 2013.
- [14] Han In-Su and Chung Chang-Bock. Performance prediction and analysis of a pem fuel cell operating on pure oxygen using data-driven models: A comparison of artificial neural network and support vector machine. International Journal of Hydrogen Energy, 41(24):10202–10211, 2016.
- [15] Van Grinsven Mark JJP, van Ginneken Bram, Hoyng Carel B, Theelen Thomas, and Sánchez Clara I. Fast convolutional neural network training using selective data sampling: Application to hemorrhage detection in color fundus images. IEEE transactions on medical imaging, 35(5):1273–1284, 2016.
- [16] Adlers Jacob and Pihl Gustaf. Prediction of training time for deep neural networks in tensorflow, 2018.
- [17] Bost Raphael, Popa Raluca Ada, Tu Stephen, and Goldwasser Shafi. Machine learning classification over encrypted data. In NDSS, volume 4324, page 4325, 2015.
- [18] Gilad-Bachrach Ran, Dowlin Nathan, Laine Kim, Lauter Kristin, Naehrig Michael, and Wernsing John. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pages 201–210. PMLR, 2016.
- [19] Graepel Thore, Lauter Kristin, and Naehrig Michael. Ml confidential: Machine learning on encrypted data. In International Conference on Information Security and Cryptology, pages 1–21. Springer, 2012.
- [20] Hesamifard Ehsan, Takabi Hassan, and Ghasemi Mehdi. Cryptodl: Deep neural networks over encrypted data. arXiv preprint arXiv:1711.05189, 2017.
- [21] Chabanne Hervé, de Wargny Amaury, Milgram Jonathan, Morel Constance, and Prouff Emmanuel. Privacy-preserving classification on deep neural network. IACR Cryptol. ePrint Arch., 2017:35, 2017.
- [22] Jiang Xiaoqian, Kim Miran, Lauter Kristin, and Song Yongsoo. Secure outsourced matrix computation and application to neural networks. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 1209–1222, 2018.
- [23] Konečný Jakub, McMahan H Brendan, Yu Felix X, Richtárik Peter, Suresh Ananda Theertha, and Bacon Dave. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.
- [24] Zhao Yue, Li Meng, Lai Liangzhen, Suda Naveen, Civin Damon, and Chandra Vikas. Federated learning with non-iid data. arXiv preprint arXiv:1806.00582, 2018.
- [25] Bonawitz Keith, Eichner Hubert, Grieskamp Wolfgang, Huba Dzmitry, Ingerman Alex, Ivanov Vladimir, Kiddon Chloe, Konečný Jakub, Mazzocchi Stefano, McMahan H Brendan, et al. Towards federated learning at scale: System design. arXiv preprint arXiv:1902.01046, 2019.
- [26] Kairouz Peter, McMahan H Brendan, Avent Brendan, Bellet Aurélien, Bennis Mehdi, Bhagoji Arjun Nitin, Bonawitz Keith, Charles Zachary, Cormode Graham, Cummings Rachel, et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977, 2019.
- [27] Li Tian, Sahu Anit Kumar, Talwalkar Ameet, and Smith Virginia. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3):50–60, 2020.
- [28] Liu Jian, Juuti Mika, Lu Yao, and Asokan Nadarajah. Oblivious neural network predictions via minionn transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 619–631, 2017.
- [29] Mohassel Payman and Zhang Yupeng. Secureml: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), pages 19–38. IEEE, 2017.
- [30] Rouhani Bita Darvish, Riazi M Sadegh, and Koushanfar Farinaz. Deepsecure: Scalable provably-secure deep learning. In Proceedings of the 55th Annual Design Automation Conference, pages 1–6, 2018.
- [31] Riazi M Sadegh, Weinert Christian, Tkachenko Oleksandr, Songhori Ebrahim M, Schneider Thomas, and Koushanfar Farinaz. Chameleon: A hybrid secure computation framework for machine learning applications. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pages 707–721, 2018.
- [32] Shokri Reza and Shmatikov Vitaly. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pages 1310–1321, 2015.
- [33] Duncan George T, Pearson Robert W, et al. Enhancing access to microdata while protecting confidentiality: Prospects for the future. Statistical Science, 6(3):219–232, 1991.
- [34] Ting Daniel, Fienberg Stephen E, and Trottini Mario. Random orthogonal matrix masking methodology for microdata release. International Journal of Information and Computer Security, 2(1):86–105, 2008.
- [35] Wu Samuel S, Chen Shigang, Burr Deborah, and Zhang Long. New technologies for privacy protection in data collection and analysis. In Joint Statistical Meetings 2014, pages 2444–2458, 2014.
- [36] Wu Samuel S, Chen Shigang, Burr Deborah, and Zhang Long. A new data collection technique for preserving privacy. The Journal of privacy and confidentiality, 7(3):99, 2016.
- [37] Pei Qinglin, Chen Shigang, Xiao Yao, and Wu Samuel S. Applying triple-matrix masking for privacy preserving data collection and sharing in hiv studies. Current HIV research, 14(2):121–129, 2016.
- [38] Wu Samuel S, Chen Shigang, Bhattacharjee Abhishek, and He Ying. Collusion resistant multi-matrix masking for privacy-preserving data collection. In 2017 IEEE 3rd international conference on big data security on cloud (bigdatasecurity), IEEE international conference on high performance and smart computing (HPSC), and IEEE international conference on intelligent data and security (ids), pages 1–7. IEEE, 2017.
- [39] Zhou You, Zhou Yian, Chen Shigang, and Wu Samuel S. Achieving strong privacy in online survey. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pages 710–719. IEEE, 2017.
- [40] Qiu Shangran, Joshi Prajakta S, Miller Matthew I, Xue Chonghua, Zhou Xiao, Karjadi Cody, Chang Gary H, Joshi Anant S, Dwyer Brigid, Zhu Shuhan, et al. Development and validation of an interpretable deep learning framework for alzheimer’s disease classification. Brain, 143(6):1920–1933, 2020.
- [41] Parkinsons Foundation. Parkinson’s outcomes project. https://www.parkinson.org/research/Parkinsons-Outcomes-Project, 2021. [Online; Accessed: 2021-05-15].
- [42] Yang Zhiqiang, Zhong Sheng, and Wright Rebecca N. Privacy-preserving classification of customer data without loss of accuracy. In Proceedings of the 2005 SIAM International Conference on Data Mining, pages 92–102. SIAM, 2005.
- [43] Xu Runhua, Joshi James BD, and Li Chao. Cryptonn: Training neural networks over encrypted data. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pages 1199–1209. IEEE, 2019.
- [44] Zhang Li, Wang Mingliang, Liu Mingxia, and Zhang Daoqiang. A survey on deep learning for neuroimaging-based brain disorder analysis. Journal of Frontiers in Neuroscience, 14, October 2020.
- [45] Microsoft Azure. Microsoft azure machine learning. https://azure.microsoft.com/en-us/services/machine-learning, 2021. [Online; Accessed: 2021-05-15].
- [46] Amazon Web Services. Machine learning on amazon web services. https://aws.amazon.com/machine-learning, 2021. [Online; Accessed: 2021-05-15].
- [47] Google. Google ai platform. https://cloud.google.com/ai-platform, 2021. [Online; Accessed: 2021-05-15].
- [48] Zhang Long. On security properties of Random Matrix Masking. PhD thesis, University of Florida, 2014.
- [49] Ding Aidong Adam, Miao Guanhong, and Wu Samuel Shangwu. On the privacy and utility properties of triple matrix-masking. Journal of Privacy and Confidentiality, 10(2), 2020.
- [50] Melissourgos Dimitrios, Gao Hanzhi, Ma Chaoyi, Chen Shigang, and Wu Sam. On outsourcing artificial neural network learning of privacy-sensitive medical data to the cloud. https://www.cise.ufl.edu/~sgchen/Publications/tech.pdf, 2021.
