Deceptive Information Retrieval

Sajani Vithana; Sennur Ulukus

doi:10.3390/e26030244

2024 Mar 10;26(3):244. doi: 10.3390/e26030244

Editor: Boris Ryabko¹

PMCID: PMC10968959 PMID: 38539757

Abstract

We introduce the problem of deceptive information retrieval (DIR), in which a user wishes to download a required file out of multiple independent files stored in a system of databases while deceiving the databases by making the databases’ predictions on the user-required file index incorrect with high probability. Conceptually, DIR is an extension of private information retrieval (PIR). In PIR, a user downloads a required file without revealing its index to any of the databases. The metric of deception is defined as the probability of error of databases’ prediction on the user-required file, minus the corresponding probability of error in PIR. The problem is defined on time-sensitive data that keep updating from time to time. In the proposed scheme, the user deceives the databases by sending real queries to download the required file at the time of the requirement and dummy queries at multiple distinct future time instances to manipulate the probabilities of sending each query for each file requirement, which the databases use to make their predictions on the user-required file index. The proposed DIR scheme is based on a capacity-achieving probabilistic PIR scheme, and achieves rates lower than the PIR capacity due to the additional downloads made to deceive the databases. When the required level of deception is zero, the proposed scheme achieves the PIR capacity.

Keywords: deception, information retrieval, probabilistic schemes

1. Introduction

Information is generally retrieved from a data storage system by directly requesting what is required. This is the most efficient form of information retrieval in terms of the download cost, as the user only downloads exactly what is required. However, if the user does not want to reveal the required information to the data storage system from which the information is retrieved, extra information must be requested to increase the uncertainty of the databases’ knowledge on the user’s requirement. This is the core idea of private information retrieval (PIR) [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], where the user downloads a required file out of K independent files stored in N non-colluding databases without revealing the required file index. In PIR, the databases’ prediction of the user-required file based on the received queries is uniformly distributed across all files. Hence, the probability of error of the databases’ predictions in a PIR setting with K files is $1 - \frac{1}{K}$ . In weakly private information retrieval [16,17], a certain amount of information on the user-required file index is revealed to the databases to reduce the download cost. In such cases, as the databases have more information on the file index that the user requests, the error probability of the databases’ prediction is less than $1 - \frac{1}{K}$ . In this work, we study the case where the error probability of databases’ prediction is larger than $1 - \frac{1}{K}$ .

Note that with no information received from the user at all, the databases can make a random guess on the user-required file index, and reach an error probability of $1 - \frac{1}{K}$ . Therefore, to result in a prediction error that is larger than $1 - \frac{1}{K}$ , the user has to deceive the databases by sending fake information on the required file index. The goal of this work is to generate a scheme that allows a user to download a required file k, while forcing the databases’ prediction on the user-required file index to be ℓ, where $k \neq ℓ$ , for as many cases as possible. We call this deceptive information retrieval (DIR). DIR is achieved by sending dummy queries to databases to manipulate the probabilities of sending each query for each file requirement, which results in incorrect predictions at the databases. However, sending dummy queries increases the download cost compared to PIR. Figure 1 shows the behavior of the prediction error probability and the corresponding download costs for different types of information retrieval. (The regions marked as “weakly PIR” and “DIR” in Figure 1 show the points that are conceptually valid for the two cases; this does not imply that every point in those regions is achievable. The achievable points corresponding to “weakly PIR” and “DIR” lie within the marked regions.)

Figure 1. Download costs and prediction error probabilities for different types of information retrieval.

The concept of deception has been studied as a tool for cyber defense [18,19,20,21,22], where the servers deceive attackers, adversaries, and eavesdroppers to eliminate any harmful activities. In all such cases, the deceiver (servers in this case) gains nothing from the deceived, i.e., attackers, adversaries, and eavesdroppers. In contrast, the main challenge in DIR is that what needs to be deceived is the same source of information that the user retrieves the required data from. This limits the freedom that a DIR scheme could employ to deceive the databases. To this end, we formulate the problem of DIR based on the key concepts used in PIR, while also incorporating a time dimension to aid deception.

The problem of DIR introduced in this paper considers a system of non-colluding databases storing K independent files that are time-sensitive, i.e., files that keep updating from time to time. We assume that the databases only store the latest version of the files. A given user wants to download arbitrary files at arbitrary time instances. The correctness condition ensures that the user receives the required file, right at the time of the requirement, while the condition for deception requires the databases’ prediction on the user-required file to be incorrect with a probability that is greater than $1 - \frac{1}{K}$ , specified by the predetermined level of deception required in the system.

The scheme that we propose for DIR deceives the databases by sending dummy queries to the databases for each file requirement, at distinct time instances. From the user’s perspective, each query is designed to play two roles, as a real query and as a dummy query, with two different probability distributions. This allows the user to manipulate the overall probability of sending each query for each message requirement, which is known by the databases. The databases make predictions based on the received queries and the globally known probability distribution of the queries used for each file requirement. These predictions are incorrect with probability $> 1 - \frac{1}{K}$ , as the probability distributions based on which the real queries are sent are different from the globally known overall distribution. This is the basic idea used in the proposed scheme, which allows a user to deceive the databases while also downloading the required file. The download cost of the proposed DIR scheme increases with the required level of deception d, and achieves the PIR capacity when $d = 0$ .

2. Problem Formulation and System Model

We consider N non-colluding databases storing K independent files, each consisting of L uniformly distributed symbols from a finite field $F_{q}$ , i.e.,

\begin{matrix} H (W_{1}, \dots, W_{K}) = \sum_{i = 1}^{K} H (W_{i}) = K L, \end{matrix}

(1)

where $W_{i}$ is the ith file. The files keep updating from time to time, and a given user wants to download an arbitrary file at arbitrary time instances $T_{i}$ , $i \in N$ . We assume that all files are equally probable to be requested by the user.

The user sends queries at arbitrary time instances to download the required file while deceiving the databases. We assume that the databases are unaware of being deceived, which is fundamental to the concept of deception. Moreover, we assume that the databases are only able to store data (files, queries from users, time stamps of received queries, etc.) corresponding to the current time instance, and that the file updates at distinct time instances are mutually independent. Therefore, the user’s file requirements and the queries sent are independent of the stored files at all time instances, i.e.,

\begin{matrix} I (θ^{[t]}, Q_{n}^{[t]}; W_{1 : K}^{[t]}) = 0, n \in {1, \dots, N}, \forall t, \end{matrix}

(2)

where $θ^{[t]}$ is the user’s file requirement, $Q_{n}^{[t]}$ is the query sent by the user to database n, and $W_{1 : K}^{[t]}$ is the set of K files, all at time t (the notation $1 : K$ indicates all integers from 1 to K). At any given time t when each database n, $n \in {1, \dots, N}$ , receives a query from the user, it sends the corresponding answer as a function of the received query and the stored files; thus,

\begin{matrix} H (A_{n}^{[t]} | Q_{n}^{[t]}, W_{1 : K}^{[t]}) = 0, n \in {1, \dots, N}, \end{matrix}

(3)

where $A_{n}^{[t]}$ is the answer received by the user from database n at time t. At each time $T_{i}$ , $i \in N$ , the user must be able to correctly decode the required file, that is,

\begin{matrix} H (W_{θ^{[T_{i}]}} | Q_{1 : N}^{[T_{i}]}, A_{1 : N}^{[T_{i}]}) = 0, i \in N . \end{matrix}

(4)

At any given time t when each database n, $n \in {1, \dots, N}$ , receives a query from the user, it makes a prediction on the user-required file index using the maximum a posteriori probability (MAP) estimate as follows,

\begin{matrix} {\hat{θ}}_{\tilde{Q}}^{[t]} = arg max_{i} P (θ^{[t]} = i | Q_{n}^{[t]} = \tilde{Q}), n \in {1, \dots, N}, \end{matrix}

(5)

where ${\hat{θ}}_{\tilde{Q}}^{[t]}$ is the predicted user-required file index based on the realization of the received query $\tilde{Q}$ at time t. The probability of error of each database’s prediction is defined as

\begin{matrix} P_{e} = E [P ({\hat{θ}}_{\tilde{Q}}^{[T_{i}]} \neq θ^{[T_{i}]})], \end{matrix}

(6)

where the expectation is taken across all $\tilde{Q}$ and $T_{i}$ . Note that in PIR, $P (θ^{[t]} = i | Q_{n}^{[t]} = \tilde{Q}) = P (θ^{[t]} = j | Q_{n}^{[t]} = \tilde{Q})$ for all $i, j \in {1, \dots, K}$ and any $\tilde{Q}$ , which results in $P_{e}^{PIR} = 1 - \frac{1}{K}$ . Based on this information, we define the metric of deception as

\begin{matrix} D = P_{e} - (1 - \frac{1}{K}) . \end{matrix}

(7)

For PIR, the amount of deception is $D = 0$ , and for weakly PIR, where some amount of information is leaked on the user-required file index, the amount of deception takes a negative value, as the probability of error is smaller than $1 - \frac{1}{K}$ . The goal of this work is to generate schemes that meet a given level of deception $D = d > 0$ , while minimizing the normalized download cost, defined as

\begin{matrix} D_{L} = \frac{H (A_{1 : N})}{L}, \end{matrix}

(8)

where $A_{1 : N}$ represents all the answers received from all N databases, corresponding to a single file requirement of the user. The DIR rate is defined as the reciprocal of $D_{L}$ .
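The definitions in (5)–(7) can be illustrated numerically. The following is a minimal sketch, assuming equal priors (so that the MAP estimate reduces to a maximum-likelihood choice) and uniform tie-breaking; the function name `map_error_and_deception` and the dictionary layout are hypothetical and chosen only for this illustration. The database predicts from the overall distribution it knows, while the error is averaged over the queries actually sent at the time of the requirement.

```python
def map_error_and_deception(overall, real, K, tol=1e-12):
    """Return (P_e, D) for one database.

    overall[q][k]: P(Q_n = q | theta = k), the distribution known to the database.
    real[q][k]:    P(Q_n = q | theta = k) at the time of the requirement.
    File indices are 0-based here; equal priors 1/K are assumed.
    """
    p_e = 0.0
    for q, likelihoods in overall.items():
        best = max(likelihoods)
        # With equal priors, the MAP estimate in (5) maximizes the likelihood.
        # Ties are broken uniformly at random (an assumption of this sketch).
        winners = [k for k in range(K) if best - likelihoods[k] <= tol]
        for k in range(K):
            p_correct = 1.0 / len(winners) if k in winners else 0.0
            p_e += (1.0 / K) * real[q][k] * (1.0 - p_correct)
    deception = p_e - (1.0 - 1.0 / K)  # metric of deception, as in (7)
    return p_e, deception


# A PIR-like case: every query is equally likely under both file requirements.
uniform = {q: [0.25, 0.25] for q in ("W1", "W2", "W1+W2", "phi")}
pir_error, pir_deception = map_error_and_deception(uniform, uniform, K=2)
```

In the PIR-like case the overall and real distributions coincide and all posteriors are equal, so the sketch returns the error probability $1 - \frac{1}{K}$ and zero deception.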

3. Main Result

In this section, we present the main result of this paper, along with some remarks. Consider a system of N non-colluding databases containing K independent files. A user is able to retrieve any file k, while deceiving the databases by leaking information about some other file $k^{'}$ to the databases.

Theorem 1.

Consider a system of N non-colluding databases storing K independent files. A required level of deception d, satisfying $0 \leq d < \frac{(K - 1) (N - 1)}{K (N^{K} - N)}$ , is achievable at a DIR rate

$\begin{matrix} R = {(\frac{1 + (\frac{N^{K} - N}{N - 1}) e^{ϵ}}{1 + (N^{K - 1} - 1) e^{ϵ}} + (\frac{N}{N - 1}) (2 u - u (u + 1) α))}^{- 1}, \end{matrix}$ (9)

where

$\begin{matrix} ϵ = ln (\frac{d K N + (K - 1) (N - 1)}{d K N + (K - 1) (N - 1) - d K N^{K}}), α = \frac{N + (N^{K} - N) e^{ϵ}}{(N - 1) e^{2 ϵ} + (N^{K} - N) e^{ϵ} + 1}, u = ⌊ \frac{1}{α} ⌋ \end{matrix}$ (10)

Remark 1.

For given N and K, $ϵ \geq 0$ is a one-to-one continuous function of d, the required level of deception, and $α \in (0, 1]$ is a one-to-one continuous function of ϵ. For a given $u \in Z^{+}$ , there exists a range of values of α, specified by $\frac{1}{u + 1} < α \leq \frac{1}{u}$ , which corresponds to a unique range of values of ϵ, for which (9) is valid. Since $(0, 1] = \cup {α : \frac{1}{u + 1} < α \leq \frac{1}{u}, u \in Z^{+}}$ , there exists an achievable rate (as well as an achievable scheme) for any $ϵ \geq 0$ as well as for any d in the range $0 \leq d < \frac{(K - 1) (N - 1)}{K (N^{K} - N)}$ .

Remark 2.

When the user-specified amount of deception is zero, i.e., $d = 0$ , the corresponding values of α and u are $α = 1$ and $u = 1$ . The achievable rate for this case is $\frac{1 - \frac{1}{N}}{1 - \frac{1}{N^{K}}}$ , which is equal to the PIR capacity.

Remark 3.

The achievable DIR rate monotonically decreases with increasing amount of deception d for any given N and K.

Remark 4.

The variation in the achievable DIR rate with the level of deception for different numbers of databases when the number of files is fixed at $K = 3$ is shown in Figure 2. The achievable rate for different numbers of files when the number of databases is fixed at $N = 2$ is shown in Figure 3. For any given N and K, the rate decreases exponentially as the level of deception approaches the respective upper bound, $\frac{(K - 1) (N - 1)}{K (N^{K} - N)}$ .

Figure 2. Achievable DIR rate for varying levels of deception and different numbers of databases when $K = 3$ .

Figure 3. Achievable DIR rate for varying levels of deception and different numbers of files when $N = 2$ .
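The expressions in Theorem 1 can be evaluated directly. The sketch below transcribes (9) and (10) into Python; the function names `dir_parameters`, `dir_rate`, and `pir_capacity` are hypothetical, and the code is only meant as a numerical check of the remarks above (for instance, that $d = 0$ recovers the PIR capacity), not as a reference implementation.

```python
import math


def dir_parameters(N, K, d):
    """Return (epsilon, alpha, u) from (10) for a required deception level d."""
    d_max = (K - 1) * (N - 1) / (K * (N**K - N))
    if not 0 <= d < d_max:
        raise ValueError(f"d must satisfy 0 <= d < {d_max}")
    c = d * K * N + (K - 1) * (N - 1)
    eps = math.log(c / (c - d * K * N**K))
    e = math.exp(eps)
    alpha = (N + (N**K - N) * e) / ((N - 1) * e**2 + (N**K - N) * e + 1)
    u = math.floor(1 / alpha)
    return eps, alpha, u


def dir_rate(N, K, d):
    """Achievable DIR rate from (9)."""
    eps, alpha, u = dir_parameters(N, K, d)
    e = math.exp(eps)
    first_term = (1 + (N**K - N) / (N - 1) * e) / (1 + (N**(K - 1) - 1) * e)
    second_term = N / (N - 1) * (2 * u - u * (u + 1) * alpha)
    return 1 / (first_term + second_term)


def pir_capacity(N, K):
    """PIR capacity in the form quoted in Remark 2."""
    return (1 - 1 / N) / (1 - 1 / N**K)


# Rates for N = 2, K = 3 at a few deception levels below the bound 1/9.
rates = [dir_rate(2, 3, d) for d in (0.0, 0.02, 0.05, 0.08, 0.1)]
```

Evaluating `dir_rate` on a grid of d values for fixed N and K gives one way to reproduce the qualitative shape of the curves described in Remark 4.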

4. DIR Scheme

The DIR scheme introduced in this section is designed for a system of N non-colluding databases containing K independent files, with a pre-determined amount of deception $d > 0$ required. For each file requirement at time $T_{i}$ , $i \in N$ , the user chooses a set of $M + 1$ queries to be sent to database n, $n \in {1, \dots, N}$ , at time $T_{i}$ as well as at future time instances $t_{i, j}$ , $j \in {1, \dots, M}$ , such that each $t_{i, j} > T_{i}$ . The query sent at time $T_{i}$ is used to download the required file, while the rest of the M queries are sent to deceive the databases. The queries sent at times $T_{i}$ , $i \in N$ and $t_{i, j}$ , $j \in {1, \dots, M}$ , $i \in N$ are known as real and dummy queries, respectively. The binary random variable R is used to specify whether a query sent by the user is real or dummy, i.e., $R = 1$ corresponds to a real query sent at time $T_{i}$ , and $R = 0$ corresponds to a dummy query sent at time $t_{i, j}$ . Next, we define another classification of queries used in the proposed scheme.

Definition 1

( $ϵ$ -deceptive query). An ϵ-deceptive query $\tilde{Q}$ with respect to file k is defined as a query that always satisfies

$\begin{matrix} \frac{P (Q_{n} = \tilde{Q} | θ = k, R = 1)}{P (Q_{n} = \tilde{Q} | θ = ℓ, R = 1)} = e^{- ϵ}, \frac{P (θ = k | Q_{n} = \tilde{Q})}{P (θ = ℓ | Q_{n} = \tilde{Q})} = e^{ϵ}, \forall ℓ \in {1, \dots, K}, ℓ \neq k, \end{matrix}$ (11)

for some $ϵ > 0$ , where $Q_{n}$ and θ are the random variables representing a query sent to database n, $n \in {1, \dots, N}$ , and the user-required file index. An equivalent representation of (11) is given by

$\begin{matrix} \frac{P (R = 1 | θ = ℓ) + \frac{P (Q_{n} = \tilde{Q} | θ = ℓ, R = 0)}{P (Q_{n} = \tilde{Q} | θ = ℓ, R = 1)} P (R = 0 | θ = ℓ)}{P (R = 1 | θ = k) + \frac{P (Q_{n} = \tilde{Q} | θ = k, R = 0)}{P (Q_{n} = \tilde{Q} | θ = k, R = 1)} P (R = 0 | θ = k)} = e^{- 2 ϵ}, \forall ℓ \in {1, \dots, K}, ℓ \neq k . \end{matrix}$ (12)

Definition 2

(PIR query). A query $\tilde{Q}$ that satisfies (11) with $ϵ = 0$ for all $k \in {1, \dots, K}$ , i.e., a 0-deceptive query, is known as a PIR query.

Remark 5.

The intuition behind the definition of an ϵ-deceptive query with respect to message k in Definition 1 is as follows. The second equation in (11) fixes the databases’ prediction on the user’s requirement as $W_{k}$ for the query $\tilde{Q}$ . This is because the a posteriori probability corresponding to message k, when $\tilde{Q}$ is received by the databases, is greater than that of any other message ℓ, $ℓ \neq k$ . However, the first equation in (11), which is satisfied at the same time, ensures that the user sends the query $\tilde{Q}$ with the least probability when the user needs to download message k, compared to the probabilities of sending $\tilde{Q}$ for other message requirements. In other words, since we assume equal priors, the query $\tilde{Q}$ is mostly sent when the user needs to download $W_{ℓ}$ for $ℓ \neq k$ , and is rarely sent to download $W_{k}$ . Meanwhile, the databases’ prediction on the user-required message upon receiving query $\tilde{Q}$ is fixed at $W_{k}$ , which is incorrect with high probability; hence the deception.

At a given time t, there exists a set of queries consisting of both deceptive and PIR queries, sent to the N databases. Database n, $n \in {1, \dots, N}$ , is aware of the probability of receiving each query, for each file requirement, i.e., $P (Q_{n} = \tilde{Q} | θ = k)$ , for $k \in {1, \dots, K}$ , $\tilde{Q} \in Q$ , where $Q$ is the set of all queries. However, the databases are unaware of being deceived, and are unable to determine whether the received query $\tilde{Q}$ is real or dummy, or whether it is deceptive or PIR. The proposed scheme generates a list of real and dummy queries for a given N and K along with the probabilities of using them as $ϵ$ -deceptive and PIR queries, based on the required level of deception d. The scheme also characterizes the optimum number of dummy queries M to be sent to the databases for each file requirement, to minimize the download cost. As an illustration of the proposed scheme, consider the following representative examples.

4.1. Example 1: Two Databases and Two Files, $N = K = 2$

In this example, we present how the proposed DIR scheme is applied in a system of two databases containing two files each. In the proposed scheme, the user generates $M + 1$ queries for any given file requirement which consists of one real query and M dummy queries. The user sends the real query at the time of the requirement $T_{i}$ , and the rest of the M dummy queries at M different future time instances $t_{i, j}$ . Table 1 and Table 2 give possible pairs of real queries that are sent to the two databases to retrieve $W_{1}$ and $W_{2}$ , respectively, at time $T_{i}$ , $i \in N$ . The probability of using each pair of queries is indicated in the first columns of Table 1 and Table 2. Note that the correctness condition in (4) is satisfied at each time $T_{i}$ , as each row of Table 1 and Table 2 decodes files $W_{1}$ and $W_{2}$ , respectively, with no error.

Table 1.

Real query table— $W_{1}$ .

$P (Q | θ = 1, R = 1)$	DB 1	DB 2
p	$W_{1}$	$ϕ$
p	$ϕ$	$W_{1}$
$p^{'}$	$W_{2}$	$W_{1} + W_{2}$
$p^{'}$	$W_{1} + W_{2}$	$W_{2}$

Table 2.

Real query table— $W_{2}$ .

$P (Q | θ = 2, R = 1)$	DB 1	DB 2
p	$W_{2}$	$ϕ$
p	$ϕ$	$W_{2}$
$p^{'}$	$W_{1}$	$W_{1} + W_{2}$
$p^{'}$	$W_{1} + W_{2}$	$W_{1}$

The dummy queries sent to each database at time $t_{i, j}$ are given in Table 3 and Table 4. The purpose of the dummy queries sent at future time instances is to deceive the databases by manipulating the a posteriori probabilities, which impact their predictions. For example, if the user wants to download $W_{1}$ at time $T_{i}$ , the user selects one of the four query options in Table 1 based on the probabilities in column 1 (The values of p and $p^{'}$ are derived later in this section), and sends the corresponding queries to databases 1 and 2 at time $T_{i}$ . Based on the information in Table 3, the user sends the query $W_{1}$ to both databases at M distinct future time instances $t_{i, j}$ , $j \in {1, \dots, M}$ .

Table 3.

Dummy query table— $W_{1}$ .

$P (Q | θ = 1, R = 0)$	DB 1	DB 2
1	$W_{1}$	$W_{1}$

Table 4.

Dummy query table— $W_{2}$ .

$P (Q | θ = 2, R = 0)$	DB 1	DB 2
1	$W_{2}$	$W_{2}$

Based on the information in Table 1, Table 2, Table 3 and Table 4, when the user-required file is $W_{1}$ , the probability of each query being received by database n, $n \in {1, 2}$ , at an arbitrary time instance t is calculated as follows. Let $P (R = 1 | θ = i) = α$ for $i \in {1, 2}$ . (Intuitively, $P (R = 1 | θ = i)$ is the probability that a query received by any database is real when the user-required file index is i. For a fixed M, $P (R = 1 | θ = i) = \frac{1}{M + 1}$ .) Then,

\begin{matrix} P (Q_{n} = W_{1} | θ = 1) & = P (Q_{n} = W_{1} | θ = 1, R = 1) P (R = 1 | θ = 1) \\ & + P (Q_{n} = W_{1} | θ = 1, R = 0) P (R = 0 | θ = 1) \end{matrix}

(13)

\begin{matrix} = p α + 1 - α \end{matrix}

(14)

\begin{matrix} P (Q_{n} = W_{2} | θ = 1) & = P (Q_{n} = W_{2} | θ = 1, R = 1) P (R = 1 | θ = 1) \\ & + P (Q_{n} = W_{2} | θ = 1, R = 0) P (R = 0 | θ = 1) \end{matrix}

(15)

\begin{matrix} = p^{'} α \end{matrix}

(16)

\begin{matrix} P (Q_{n} = W_{1} + W_{2} | θ = 1) & = P (Q_{n} = W_{1} + W_{2} | θ = 1, R = 1) P (R = 1 | θ = 1) \\ & + P (Q_{n} = W_{1} + W_{2} | θ = 1, R = 0) P (R = 0 | θ = 1) \end{matrix}

(17)

\begin{matrix} = p^{'} α \end{matrix}

(18)

\begin{matrix} P (Q_{n} = ϕ | θ = 1) & = P (Q_{n} = ϕ | θ = 1, R = 1) P (R = 1 | θ = 1) \\ & + P (Q_{n} = ϕ | θ = 1, R = 0) P (R = 0 | θ = 1) \end{matrix}

(19)

\begin{matrix} = p α \end{matrix}

(20)

Thus, writing these probabilities compactly, we have

\begin{matrix} P (Q_{n} = W_{1} | θ = 1) & = p α + 1 - α \end{matrix}

(21)

\begin{matrix} P (Q_{n} = W_{2} | θ = 1) & = p^{'} α \end{matrix}

(22)

\begin{matrix} P (Q_{n} = W_{1} + W_{2} | θ = 1) & = p^{'} α \end{matrix}

(23)

\begin{matrix} P (Q_{n} = ϕ | θ = 1) & = p α . \end{matrix}

(24)

Similarly, when the user-required file is $W_{2}$ , the corresponding probabilities are

\begin{matrix} P (Q_{n} = W_{1} | θ = 2) & = p^{'} α \end{matrix}

(25)

\begin{matrix} P (Q_{n} = W_{2} | θ = 2) & = p α + 1 - α \end{matrix}

(26)

\begin{matrix} P (Q_{n} = W_{1} + W_{2} | θ = 2) & = p^{'} α \end{matrix}

(27)

\begin{matrix} P (Q_{n} = ϕ | θ = 2) & = p α . \end{matrix}

(28)

These queries and the corresponding probabilities of sending them to each database for each message requirement are known to the databases. However, the decomposition of these probabilities based on whether the query is real or dummy, i.e., Table 1, Table 2, Table 3 and Table 4, is not known by the databases. When database n, $n \in {1, \dots, N}$ , receives a query $\tilde{Q}$ at time t, it calculates the a posteriori probability distribution of the user-required file index, to predict the user’s requirement using (5). The a posteriori probabilities corresponding to the four queries received by database n, $n \in {1, 2}$ , are calculated as follows:

\begin{matrix} P (θ = i | Q_{n} = \tilde{Q}) & = \frac{P (Q_{n} = \tilde{Q} | θ = i) P (θ = i)}{P (Q_{n} = \tilde{Q})} . \end{matrix}

(29)

Then, the explicit a posteriori probabilities are given by

\begin{matrix} P (θ = 1 | Q_{n} = W_{1}) & = \frac{\frac{1}{2} (p α + 1 - α)}{P (Q_{n} = W_{1})} \end{matrix}

(30)

\begin{matrix} P (θ = 2 | Q_{n} = W_{1}) & = \frac{\frac{1}{2} p^{'} α}{P (Q_{n} = W_{1})} \end{matrix}

(31)

\begin{matrix} P (θ = 1 | Q_{n} = W_{2}) & = \frac{\frac{1}{2} p^{'} α}{P (Q_{n} = W_{2})} \end{matrix}

(32)

\begin{matrix} P (θ = 2 | Q_{n} = W_{2}) & = \frac{\frac{1}{2} (p α + 1 - α)}{P (Q_{n} = W_{2})} \end{matrix}

(33)

\begin{matrix} P (θ = 1 | Q_{n} = W_{1} + W_{2}) & = \frac{\frac{1}{2} p^{'} α}{P (Q_{n} = W_{1} + W_{2})} \end{matrix}

(34)

\begin{matrix} P (θ = 2 | Q_{n} = W_{1} + W_{2}) & = \frac{\frac{1}{2} p^{'} α}{P (Q_{n} = W_{1} + W_{2})} \end{matrix}

(35)

\begin{matrix} P (θ = 1 | Q_{n} = ϕ) & = \frac{\frac{1}{2} p α}{P (Q_{n} = ϕ)} \end{matrix}

(36)

\begin{matrix} P (θ = 2 | Q_{n} = ϕ) & = \frac{\frac{1}{2} p α}{P (Q_{n} = ϕ)} . \end{matrix}

(37)

While queries $ϕ$ and $W_{1} + W_{2}$ are PIR queries as stated in Definition 2, queries $W_{1}$ and $W_{2}$ are $ϵ$ -deceptive with respect to file indices 1 and 2, respectively, for an $ϵ$ that depends on the required amount of deception d. The values of p and $p^{'}$ in Table 1, Table 2, Table 3 and Table 4 are calculated based on the requirements in Definition 1 as follows. It is straightforward to see that $p^{'} = p e^{ϵ}$ follows from the first part of (11) for each query $\tilde{Q} = W_{1}$ and $\tilde{Q} = W_{2}$ , which also gives $p = \frac{1}{2 (1 + e^{ϵ})}$ . The second part of (11) (as well as (12)) results in $α = \frac{2}{1 + e^{ϵ}}$ for both $ϵ$ -deceptive queries $W_{1}$ and $W_{2}$ . Based on the a posteriori probabilities (30)–(37) calculated by the databases using the information in (21)–(28), each database predicts the user’s requirement at each time it receives a query from the user. The predictions corresponding to each query received by database n, $n = 1, 2$ , which are computed using (5), are shown in Table 5.

Table 5.

Probabilities of each database predicting the user-required file in Example 1.

Query $\tilde{Q}$	$P ({\hat{θ}}_{\tilde{Q}} = 1)$	$P ({\hat{θ}}_{\tilde{Q}} = 2)$
$W_{1}$	1	0
$W_{2}$	0	1
$W_{1} + W_{2}$	$\frac{1}{2}$	$\frac{1}{2}$
$ϕ$	$\frac{1}{2}$	$\frac{1}{2}$

Based on this information, when a database receives query $Q = W_{1}$ , it always decides that the requested message is $W_{1}$ , and when it receives query $Q = W_{2}$ , it always decides that the requested message is $W_{2}$ . For queries $Q = ϕ$ and $Q = W_{1} + W_{2}$ , the databases flip a coin to choose either $W_{1}$ or $W_{2}$ as the requested message.
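The quantities in this example can be checked numerically. The sketch below rebuilds the real-query marginals from Table 1 and Table 2, the dummy-query distributions from Table 3 and Table 4, the overall distributions in (21)–(28), and the a posteriori probabilities in (30)–(37) for a chosen $ϵ$ . The function name `example1_distributions` and the string labels for the queries are hypothetical; the values of p, $p^{'}$ , and α are the ones stated above.

```python
import math

QUERIES = ("W1", "W2", "W1+W2", "phi")


def example1_distributions(eps):
    """Return (real, overall, posterior) for Example 1 (N = K = 2)."""
    e = math.exp(eps)
    p = 1.0 / (2.0 * (1.0 + e))  # from 2p + 2p' = 1 with p' = p e^eps
    p_prime = p * e
    alpha = 2.0 / (1.0 + e)      # P(R = 1 | theta = i)

    # P(Q_n = q | theta = k, R = 1): per-database marginals of Tables 1 and 2.
    real = {
        1: {"W1": p, "W2": p_prime, "W1+W2": p_prime, "phi": p},
        2: {"W1": p_prime, "W2": p, "W1+W2": p_prime, "phi": p},
    }
    # P(Q_n = q | theta = k, R = 0): Tables 3 and 4.
    dummy = {
        1: {"W1": 1.0, "W2": 0.0, "W1+W2": 0.0, "phi": 0.0},
        2: {"W1": 0.0, "W2": 1.0, "W1+W2": 0.0, "phi": 0.0},
    }
    # Overall P(Q_n = q | theta = k), as in (21)-(28).
    overall = {
        k: {q: alpha * real[k][q] + (1.0 - alpha) * dummy[k][q] for q in QUERIES}
        for k in (1, 2)
    }
    # A posteriori probabilities with equal priors, as in (29).
    posterior = {
        q: {k: overall[k][q] / (overall[1][q] + overall[2][q]) for k in (1, 2)}
        for q in QUERIES
    }
    return real, overall, posterior


real, overall, posterior = example1_distributions(eps=0.7)
# Both conditions of Definition 1 for the query W1 with respect to file 1.
likelihood_ratio = real[1]["W1"] / real[2]["W1"]          # should equal e^{-eps}
posterior_ratio = posterior["W1"][1] / posterior["W1"][2]  # should equal e^{eps}
```

For any $ϵ > 0$ , the posterior for query $W_{1}$ favors file 1 even though that query is sent least often as a real query for file 1, which is the behavior summarized in Table 5.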

As the queries are symmetric across all databases, the probability of error corresponding to some query $\tilde{Q}$ received by database n at time $T_{i}$ is given by

\begin{matrix} P ({\hat{θ}}_{\tilde{Q}}^{[T_{i}]} \neq θ^{[T_{i}]}) & = P (θ^{[T_{i}]} = 1, {\hat{θ}}_{\tilde{Q}}^{[T_{i}]} = 2 | Q_{n}^{[T_{i}]} = \tilde{Q}) + P (θ^{[T_{i}]} = 2, {\hat{θ}}_{\tilde{Q}}^{[T_{i}]} = 1 | Q_{n}^{[T_{i}]} = \tilde{Q}) \end{matrix}

(38)

\begin{matrix} & = \frac{1}{P (Q_{n}^{[T_{i}]} = \tilde{Q})} (P ({\hat{θ}}_{\tilde{Q}}^{[T_{i}]} = 2 | θ^{[T_{i}]} = 1, Q_{n}^{[T_{i}]} = \tilde{Q}) P (Q_{n}^{[T_{i}]} = \tilde{Q} | θ^{[T_{i}]} = 1) P (θ^{[T_{i}]} = 1) \\ & + P ({\hat{θ}}_{\tilde{Q}}^{[T_{i}]} = 1 | θ^{[T_{i}]} = 2, Q_{n}^{[T_{i}]} = \tilde{Q}) P (Q_{n}^{[T_{i}]} = \tilde{Q} | θ^{[T_{i}]} = 2) P (θ^{[T_{i}]} = 2)) \end{matrix}

(39)

\begin{matrix} & = \frac{1}{P (Q_{n}^{[T_{i}]} = \tilde{Q})} (P ({\hat{θ}}_{\tilde{Q}}^{[T_{i}]} = 2 | Q_{n}^{[T_{i}]} = \tilde{Q}) P (Q_{n}^{[T_{i}]} = \tilde{Q} | θ^{[T_{i}]} = 1) P (θ^{[T_{i}]} = 1) \\ & + P ({\hat{θ}}_{\tilde{Q}}^{[T_{i}]} = 1 | Q_{n}^{[T_{i}]} = \tilde{Q}) P (Q_{n}^{[T_{i}]} = \tilde{Q} | θ^{[T_{i}]} = 2) P (θ^{[T_{i}]} = 2)), \end{matrix}

(40)

as the predictions only depend on the received queries. The explicit probabilities corresponding to the four queries are given below. (Note that, at time $T_{i}$ , $P (Q_{n} = \tilde{Q} | θ^{[T_{i}]} = i)$ stands for $P (Q_{n} = \tilde{Q} | θ = i, R = 1)$ , as only real queries are sent at time $T_{i}$ .)

\begin{matrix} P ({\hat{θ}}_{W_{1}}^{[T_{i}]} \neq θ^{[T_{i}]}) & = \frac{1}{P (Q_{n}^{[T_{i}]} = W_{1})} \frac{e^{ϵ}}{4 (1 + e^{ϵ})} \end{matrix}

(41)

\begin{matrix} P ({\hat{θ}}_{W_{2}}^{[T_{i}]} \neq θ^{[T_{i}]}) & = \frac{1}{P (Q_{n}^{[T_{i}]} = W_{2})} \frac{e^{ϵ}}{4 (1 + e^{ϵ})} \end{matrix}

(42)

\begin{matrix} P ({\hat{θ}}_{W_{1} + W_{2}}^{[T_{i}]} \neq θ^{[T_{i}]}) & = \frac{1}{P (Q_{n}^{[T_{i}]} = W_{1} + W_{2})} \frac{e^{ϵ}}{4 (1 + e^{ϵ})} \end{matrix}

(43)

\begin{matrix} P ({\hat{θ}}_{ϕ}^{[T_{i}]} \neq θ^{[T_{i}]}) & = \frac{1}{P (Q_{n}^{[T_{i}]} = ϕ)} \frac{1}{4 (1 + e^{ϵ})} . \end{matrix}

(44)

As the same scheme is used for all user requirements at all time instances, the probability of error of each database’s prediction for this example is calculated using (6) as

\begin{matrix} P_{e} & = \sum_{\tilde{Q} \in Q} P (Q_{n}^{[T_{i}]} = \tilde{Q}) P ({\hat{θ}}_{\tilde{Q}}^{[T_{i}]} \neq θ^{[T_{i}]}) \end{matrix}

(45)

\begin{matrix} = \frac{3 e^{ϵ} + 1}{4 (1 + e^{ϵ})} \end{matrix}

(46)

where $Q = {W_{1}, W_{2}, W_{1} + W_{2}, ϕ}$ , which results in a deception of $D = \frac{3 e^{ϵ} + 1}{4 (1 + e^{ϵ})} - \frac{1}{2} = \frac{e^{ϵ} - 1}{4 (1 + e^{ϵ})}$ . Therefore, for a required amount of deception $d < \frac{1}{4}$ , the value of $ϵ$ is chosen as $ϵ = ln (\frac{4 d + 1}{1 - 4 d})$ .
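As a sanity check on (45) and (46), the error probability can be accumulated query by query. The sketch below assumes the prediction rule of Table 5 and the real-query marginals of Table 1 and Table 2; the function names `epsilon_for_deception` and `example1_error_probability` are hypothetical. Choosing $ϵ$ from the required d and recomputing $P_{e} - \frac{1}{2}$ should return d.

```python
import math


def epsilon_for_deception(d):
    """epsilon = ln((4d + 1) / (1 - 4d)) for Example 1, valid for 0 <= d < 1/4."""
    if not 0 <= d < 0.25:
        raise ValueError("Example 1 requires 0 <= d < 1/4")
    return math.log((4 * d + 1) / (1 - 4 * d))


def example1_error_probability(eps):
    """P_e of a database's prediction at time T_i in Example 1."""
    e = math.exp(eps)
    p = 1.0 / (2.0 * (1.0 + e))
    p_prime = p * e
    # P(Q_n = q | theta = k, R = 1): only real queries are sent at time T_i.
    real = {
        1: {"W1": p, "W2": p_prime, "W1+W2": p_prime, "phi": p},
        2: {"W1": p_prime, "W2": p, "W1+W2": p_prime, "phi": p},
    }
    # P(theta_hat = k | q): the prediction rule of Table 5.
    predict = {
        "W1": {1: 1.0, 2: 0.0},
        "W2": {1: 0.0, 2: 1.0},
        "W1+W2": {1: 0.5, 2: 0.5},
        "phi": {1: 0.5, 2: 0.5},
    }
    p_e = 0.0
    for q, rule in predict.items():
        for k in (1, 2):
            # Prior 1/2, times the real-query likelihood, times P(wrong guess).
            p_e += 0.5 * real[k][q] * (1.0 - rule[k])
    return p_e


d_required = 0.1
eps = epsilon_for_deception(d_required)
d_achieved = example1_error_probability(eps) - 0.5
```

Here `d_achieved` agrees with `d_required` up to floating-point error, which matches the closed form $\frac{e^{ϵ} - 1}{4 (1 + e^{ϵ})}$ .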

The download cost of this scheme is computed as follows. As the scheme is symmetric across all file retrievals, and since the a priori probability distribution of the files is uniform, without loss of generality, we can calculate the download cost of retrieving $W_{1}$ . The download cost of retrieving $W_{1}$ for a user specified amount of deception d is given by

\begin{matrix} D_{L} & = \frac{1}{L} (2 L p + 2 (2 L) p e^{ϵ} + 2 L \sum_{m = 0}^{\infty} p_{m} m) \end{matrix}

(47)

\begin{matrix} = \frac{1 + 2 e^{ϵ}}{1 + e^{ϵ}} + 2 E [M] \end{matrix}

(48)

where $p_{m}$ is the probability of sending m dummy queries per each file requirement. To minimize the download cost, we need to find the probability mass function (PMF) of M which minimizes $E [M]$ such that $P (R = 1 | θ = i) = α = \frac{2}{1 + e^{ϵ}}$ is satisfied for any i. Note that for any i, $P (R = 1 | θ = i)$ can be written as

\begin{matrix} P (R = 1 | θ = i) = α = \sum_{m = 0}^{\infty} p_{m} \frac{1}{m + 1} = E [\frac{1}{M + 1}], \end{matrix}

(49)

where M is the random variable representing the number of dummy queries sent to each database per file requirement. Thus, the following optimization problem needs to be solved, for a given $ϵ$ , that is a function of the given value of d:

\begin{matrix} min & E [M] \\ s . t . & E [\frac{1}{M + 1}] = \frac{2}{1 + e^{ϵ}} = α . \end{matrix}

(50)

The solution to this problem is given in Lemma 1, and the resulting minimum download cost is given by

\begin{matrix} D_{L} & = \frac{1 + 2 e^{ϵ}}{1 + e^{ϵ}} + 4 u - 2 u (u + 1) α, \end{matrix}

(51)

where $u = ⌊ \frac{1}{α} ⌋$ . When $d = 0$ , it follows that $ϵ = 0$ and $u = 1$ , and the achievable rate is $\frac{2}{3}$ , which is the same as the PIR capacity for $N = 2$ and $K = 2$ .
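The statement of Lemma 1 is not reproduced in this section, so the following sketch rests on an assumption: that the minimizing PMF of M places all its mass on the two values $u - 1$ and u. This assumption is consistent with (51), since it yields $E [M] = 2 u - u (u + 1) α$ , but the helper names `dummy_query_pmf` and `example1_download_cost` are hypothetical and the code should be read as an illustration only.

```python
import math


def dummy_query_pmf(alpha):
    """Two-point PMF of M on {u-1, u} with E[1/(M+1)] = alpha (assumed form)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    u = math.floor(1 / alpha)
    prob_u_minus_1 = u * (u + 1) * alpha - u  # lies in (0, 1] for 1/(u+1) < alpha <= 1/u
    return {u - 1: prob_u_minus_1, u: 1.0 - prob_u_minus_1}


def expected_dummies(pmf):
    """E[M] for a PMF given as {m: probability}."""
    return sum(m * prob for m, prob in pmf.items())


def example1_download_cost(eps):
    """Normalized download cost of Example 1 under the assumed PMF."""
    e = math.exp(eps)
    alpha = 2.0 / (1.0 + e)
    pmf = dummy_query_pmf(alpha)
    return (1 + 2 * e) / (1 + e) + 2 * expected_dummies(pmf)


cost_no_deception = example1_download_cost(0.0)  # d = 0, so eps = 0
cost_with_deception = example1_download_cost(1.0)
```

With $ϵ = 0$ the PMF puts all its mass on $M = 0$ , so no dummy queries are sent and the cost reduces to that of the underlying PIR scheme.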

4.2. Example 2: Three Databases and Three Files, $N = K = 3$

Similar to the previous example, the user sends real queries at time $T_{i}$ and dummy queries at times $t_{i, j}$ , $j \in {1, \dots, M}$ , for each $i \in N$ , based on the probabilities shown in Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11. The notation $W_{i}^{j}$ in these tables corresponds to the jth segment of $W_{i}$ , where each file $W_{i}$ is divided into $N - 1 = 2$ segments of equal size. Database n, $n \in {1, \dots, N}$ , only knows the overall probabilities of receiving each query for each file requirement of the user shown in Table 12. These overall probabilities, which are calculated using

\begin{matrix} P (Q_{n} = \tilde{Q} | θ = k) & = P (Q_{n} = \tilde{Q} | θ = k, R = 1) P (R = 1 | θ = k) \\ + P (Q_{n} = \tilde{Q} | θ = k, R = 0) P (R = 0 | θ = k), k \in {1, \dots, K} \end{matrix}

(52)

where $P (R = 1 | θ = i) = α$ for any $i = 1, 2, 3$ , are the same for each database, as the scheme is symmetric across all databases. The entry “other queries” in Table 12 includes all queries that are sums of two or three elements. Based on this information, each database calculates the a posteriori probability of the user-required file index conditioned on each received query $\tilde{Q}$ using (29). Each query of the form $W_{k}^{j}$ is an $ϵ$ -deceptive query with respect to file k, where $ϵ$ is a function of the required amount of deception, which is derived towards the end of this section. All other queries, including the null query and all sums of two or three elements, are PIR queries. As all $ϵ$ -deceptive queries must satisfy (11), the value of $p^{'}$ is given by $p^{'} = p e^{ϵ}$ . Since the three single-block rows of each real query table have probability p and the remaining 24 rows have probability $p^{'}$ , the normalization $3 p + 24 p e^{ϵ} = 1$ gives $p = \frac{1}{3 (1 + 8 e^{ϵ})}$ , by the same arguments used in the previous example. Using (11) and (29) for any given deceptive query, the value of $α$ is calculated as follows. For a query of the form $W_{k}^{j}$ , for each database n, $n \in \{1, \dots, N\}$ , any $ℓ \neq k$ , and $P (θ = k) = \frac{1}{K}$ , we have

\begin{matrix} \frac{P (θ = k | Q_{n} = W_{k}^{j})}{P (θ = ℓ | Q_{n} = W_{k}^{j})} & = \frac{P (Q_{n} = W_{k}^{j} | θ = k)}{P (Q_{n} = W_{k}^{j} | θ = ℓ)} = \frac{p α + \frac{1}{2} (1 - α)}{p^{'} α} . \end{matrix}

(53)

The value of $α$ is computed as $α = \frac{1}{2 p (e^{2 ϵ} - 1) + 1}$ , using (11) and (53) by solving $\frac{p α + \frac{1}{2} (1 - α)}{p^{'} α} = e^{ϵ}$ .
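
As a quick numerical check of the expressions for $p$ , $p^{'}$ and $α$ above, the following Python sketch evaluates them for $N = K = 3$ and confirms that the likelihood ratio in (53) equals $e^{ϵ}$ . The function name `example2_parameters` and the test value of $ϵ$ are illustrative choices and are not part of the scheme.

```python
import math

def example2_parameters(eps):
    """Return (p, p_prime, alpha) for the N = K = 3 example."""
    # 3 rows of each real query table have probability p, 24 rows have p'.
    p = 1.0 / (3.0 * (1.0 + 8.0 * math.exp(eps)))
    p_prime = p * math.exp(eps)                      # p' = p e^eps
    alpha = 1.0 / (2.0 * p * (math.exp(2.0 * eps) - 1.0) + 1.0)
    return p, p_prime, alpha

eps = 0.5  # arbitrary illustrative value
p, p_prime, alpha = example2_parameters(eps)

# Real-query probabilities must sum to one: 3 p + 24 p' = 1.
total = 3 * p + 24 * p_prime

# Likelihood ratio on the right-hand side of (53) for a query W_k^j.
ratio = (p * alpha + 0.5 * (1 - alpha)) / (p_prime * alpha)
```

Here `total` should evaluate to one and `ratio` to $e^{ϵ}$ , up to floating-point error.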

Table 6.

Real query table— $W_{1}$ .

$P (Q | θ = 1, R = 1)$	Database 1	Database 2	Database 3
p	$W_{1}^{1}$	$W_{1}^{2}$	$ϕ$
p	$W_{1}^{2}$	$ϕ$	$W_{1}^{1}$
p	$ϕ$	$W_{1}^{1}$	$W_{1}^{2}$
$p^{'}$	$W_{1}^{1} + W_{2}^{1}$	$W_{1}^{2} + W_{2}^{1}$	$W_{2}^{1}$
$p^{'}$	$W_{1}^{2} + W_{2}^{1}$	$W_{2}^{1}$	$W_{1}^{1} + W_{2}^{1}$
$p^{'}$	$W_{2}^{1}$	$W_{1}^{1} + W_{2}^{1}$	$W_{1}^{2} + W_{2}^{1}$
$p^{'}$	$W_{1}^{1} + W_{2}^{2}$	$W_{1}^{2} + W_{2}^{2}$	$W_{2}^{2}$
$p^{'}$	$W_{1}^{2} + W_{2}^{2}$	$W_{2}^{2}$	$W_{1}^{1} + W_{2}^{2}$
$p^{'}$	$W_{2}^{2}$	$W_{1}^{1} + W_{2}^{2}$	$W_{1}^{2} + W_{2}^{2}$
$p^{'}$	$W_{1}^{1} + W_{3}^{1}$	$W_{1}^{2} + W_{3}^{1}$	$W_{3}^{1}$
$p^{'}$	$W_{1}^{2} + W_{3}^{1}$	$W_{3}^{1}$	$W_{1}^{1} + W_{3}^{1}$
$p^{'}$	$W_{3}^{1}$	$W_{1}^{1} + W_{3}^{1}$	$W_{1}^{2} + W_{3}^{1}$
$p^{'}$	$W_{1}^{1} + W_{3}^{2}$	$W_{1}^{2} + W_{3}^{2}$	$W_{3}^{2}$
$p^{'}$	$W_{1}^{2} + W_{3}^{2}$	$W_{3}^{2}$	$W_{1}^{1} + W_{3}^{2}$
$p^{'}$	$W_{3}^{2}$	$W_{1}^{1} + W_{3}^{2}$	$W_{1}^{2} + W_{3}^{2}$
$p^{'}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{1}$	$W_{2}^{1} + W_{3}^{1}$
$p^{'}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{1}$	$W_{2}^{1} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{1}$
$p^{'}$	$W_{2}^{1} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{1}$
$p^{'}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{1}$	$W_{2}^{2} + W_{3}^{1}$
$p^{'}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{1}$	$W_{2}^{2} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{1}$
$p^{'}$	$W_{2}^{2} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{1}$
$p^{'}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{2}$	$W_{2}^{1} + W_{3}^{2}$
$p^{'}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{2}$	$W_{2}^{1} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{2}$
$p^{'}$	$W_{2}^{1} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{2}$
$p^{'}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{2}$	$W_{2}^{2} + W_{3}^{2}$
$p^{'}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{2}$	$W_{2}^{2} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{2}$
$p^{'}$	$W_{2}^{2} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{2}$


Table 7.

Dummy query table— $W_{1}$ .

$P (Q | θ = 1, R = 0)$	DB 1	$P (Q | θ = 1, R = 0)$	DB 2	$P (Q | θ = 1, R = 0)$	DB 3
$\frac{1}{2}$	$W_{1}^{1}$	$\frac{1}{2}$	$W_{1}^{1}$	$\frac{1}{2}$	$W_{1}^{1}$
$\frac{1}{2}$	$W_{1}^{2}$	$\frac{1}{2}$	$W_{1}^{2}$	$\frac{1}{2}$	$W_{1}^{2}$


Table 8.

Real query table— $W_{2}$ .

$P (Q | θ = 2, R = 1)$	Database 1	Database 2	Database 3
p	$W_{2}^{1}$	$W_{2}^{2}$	$ϕ$
p	$W_{2}^{2}$	$ϕ$	$W_{2}^{1}$
p	$ϕ$	$W_{2}^{1}$	$W_{2}^{2}$
$p^{'}$	$W_{1}^{1} + W_{2}^{1}$	$W_{1}^{1} + W_{2}^{2}$	$W_{1}^{1}$
$p^{'}$	$W_{1}^{1} + W_{2}^{2}$	$W_{1}^{1}$	$W_{1}^{1} + W_{2}^{1}$
$p^{'}$	$W_{1}^{1}$	$W_{1}^{1} + W_{2}^{1}$	$W_{1}^{1} + W_{2}^{2}$
$p^{'}$	$W_{1}^{2} + W_{2}^{1}$	$W_{1}^{2} + W_{2}^{2}$	$W_{1}^{2}$
$p^{'}$	$W_{1}^{2} + W_{2}^{2}$	$W_{1}^{2}$	$W_{1}^{2} + W_{2}^{1}$
$p^{'}$	$W_{1}^{2}$	$W_{1}^{2} + W_{2}^{1}$	$W_{1}^{2} + W_{2}^{2}$
$p^{'}$	$W_{2}^{1} + W_{3}^{1}$	$W_{2}^{2} + W_{3}^{1}$	$W_{3}^{1}$
$p^{'}$	$W_{2}^{2} + W_{3}^{1}$	$W_{3}^{1}$	$W_{2}^{1} + W_{3}^{1}$
$p^{'}$	$W_{3}^{1}$	$W_{2}^{1} + W_{3}^{1}$	$W_{2}^{2} + W_{3}^{1}$
$p^{'}$	$W_{2}^{1} + W_{3}^{2}$	$W_{2}^{2} + W_{3}^{2}$	$W_{3}^{2}$
$p^{'}$	$W_{2}^{2} + W_{3}^{2}$	$W_{3}^{2}$	$W_{2}^{1} + W_{3}^{2}$
$p^{'}$	$W_{3}^{2}$	$W_{2}^{1} + W_{3}^{2}$	$W_{2}^{2} + W_{3}^{2}$
$p^{'}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{1}$	$W_{1}^{1} + W_{3}^{1}$
$p^{'}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{1}$	$W_{1}^{1} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{1}$
$p^{'}$	$W_{1}^{1} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{1}$
$p^{'}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{2}$	$W_{1}^{1} + W_{3}^{2}$
$p^{'}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{2}$	$W_{1}^{1} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{2}$
$p^{'}$	$W_{1}^{1} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{2}$
$p^{'}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{1}$	$W_{1}^{2} + W_{3}^{1}$
$p^{'}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{1}$	$W_{1}^{2} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{1}$
$p^{'}$	$W_{1}^{2} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{1}$
$p^{'}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{2}$	$W_{1}^{2} + W_{3}^{2}$
$p^{'}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{2}$	$W_{1}^{2} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{2}$
$p^{'}$	$W_{1}^{2} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{2}$


Table 9.

Dummy query table— $W_{2}$ .

$P (Q | θ = 2, R = 0)$	DB 1	$P (Q | θ = 2, R = 0)$	DB 2	$P (Q | θ = 2, R = 0)$	DB 3
$\frac{1}{2}$	$W_{2}^{1}$	$\frac{1}{2}$	$W_{2}^{1}$	$\frac{1}{2}$	$W_{2}^{1}$
$\frac{1}{2}$	$W_{2}^{2}$	$\frac{1}{2}$	$W_{2}^{2}$	$\frac{1}{2}$	$W_{2}^{2}$


Table 10.

Real query table— $W_{3}$ .

$P (Q | θ = 3, R = 1)$	Database 1	Database 2	Database 3
p	$W_{3}^{1}$	$W_{3}^{2}$	$ϕ$
p	$W_{3}^{2}$	$ϕ$	$W_{3}^{1}$
p	$ϕ$	$W_{3}^{1}$	$W_{3}^{2}$
$p^{'}$	$W_{1}^{1} + W_{3}^{1}$	$W_{1}^{1} + W_{3}^{2}$	$W_{1}^{1}$
$p^{'}$	$W_{1}^{1} + W_{3}^{2}$	$W_{1}^{1}$	$W_{1}^{1} + W_{3}^{1}$
$p^{'}$	$W_{1}^{1}$	$W_{1}^{1} + W_{3}^{1}$	$W_{1}^{1} + W_{3}^{2}$
$p^{'}$	$W_{1}^{2} + W_{3}^{1}$	$W_{1}^{2} + W_{3}^{2}$	$W_{1}^{2}$
$p^{'}$	$W_{1}^{2} + W_{3}^{2}$	$W_{1}^{2}$	$W_{1}^{2} + W_{3}^{1}$
$p^{'}$	$W_{1}^{2}$	$W_{1}^{2} + W_{3}^{1}$	$W_{1}^{2} + W_{3}^{2}$
$p^{'}$	$W_{2}^{1} + W_{3}^{1}$	$W_{2}^{1} + W_{3}^{2}$	$W_{2}^{1}$
$p^{'}$	$W_{2}^{1} + W_{3}^{2}$	$W_{2}^{1}$	$W_{2}^{1} + W_{3}^{1}$
$p^{'}$	$W_{2}^{1}$	$W_{2}^{1} + W_{3}^{1}$	$W_{2}^{1} + W_{3}^{2}$
$p^{'}$	$W_{2}^{2} + W_{3}^{1}$	$W_{2}^{2} + W_{3}^{2}$	$W_{2}^{2}$
$p^{'}$	$W_{2}^{2} + W_{3}^{2}$	$W_{2}^{2}$	$W_{2}^{2} + W_{3}^{1}$
$p^{'}$	$W_{2}^{2}$	$W_{2}^{2} + W_{3}^{1}$	$W_{2}^{2} + W_{3}^{2}$
$p^{'}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{1}$
$p^{'}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{1}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{1}$
$p^{'}$	$W_{1}^{1} + W_{2}^{1}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{1} + W_{3}^{2}$
$p^{'}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{1}$
$p^{'}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{1}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{1}$
$p^{'}$	$W_{1}^{2} + W_{2}^{1}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{1} + W_{3}^{2}$
$p^{'}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{2}$
$p^{'}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{2}$	$W_{1}^{1} + W_{2}^{2}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{1}$
$p^{'}$	$W_{1}^{1} + W_{2}^{2}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{1}$	$W_{1}^{1} + W_{2}^{2} + W_{3}^{2}$
$p^{'}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{2}$
$p^{'}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{2}$	$W_{1}^{2} + W_{2}^{2}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{1}$
$p^{'}$	$W_{1}^{2} + W_{2}^{2}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{1}$	$W_{1}^{2} + W_{2}^{2} + W_{3}^{2}$


Table 11.

Dummy query table— $W_{3}$ .

$P (Q | θ = 3, R = 0)$	DB 1	$P (Q | θ = 3, R = 0)$	DB 2	$P (Q | θ = 3, R = 0)$	DB 3
$\frac{1}{2}$	$W_{3}^{1}$	$\frac{1}{2}$	$W_{3}^{1}$	$\frac{1}{2}$	$W_{3}^{1}$
$\frac{1}{2}$	$W_{3}^{2}$	$\frac{1}{2}$	$W_{3}^{2}$	$\frac{1}{2}$	$W_{3}^{2}$


Table 12.

Queries received by database n, $n \in \{1, \dots, N\}$ , at a given time t for each file requirement, and the corresponding probabilities.

Query $\tilde{Q}$	$P (Q_{n} = \tilde{Q} | θ = 1)$	$P (Q_{n} = \tilde{Q} | θ = 2)$	$P (Q_{n} = \tilde{Q} | θ = 3)$
$ϕ$	$p α$	$p α$	$p α$
$W_{1}^{1}$	$p α + \frac{1}{2} (1 - α)$	$p^{'} α$	$p^{'} α$
$W_{1}^{2}$	$p α + \frac{1}{2} (1 - α)$	$p^{'} α$	$p^{'} α$
$W_{2}^{1}$	$p^{'} α$	$p α + \frac{1}{2} (1 - α)$	$p^{'} α$
$W_{2}^{2}$	$p^{'} α$	$p α + \frac{1}{2} (1 - α)$	$p^{'} α$
$W_{3}^{1}$	$p^{'} α$	$p^{'} α$	$p α + \frac{1}{2} (1 - α)$
$W_{3}^{2}$	$p^{'} α$	$p^{'} α$	$p α + \frac{1}{2} (1 - α)$
other queries	$p^{'} α$	$p^{'} α$	$p^{'} α$


Assume that the user wants to download $W_{2}$ at some time $T_{i}$ . Then, at time $T_{i}$ , the user picks a row of queries from Table 8 according to the probabilities in the first column, and sends the three queries in that row to the corresponding databases. Note that correctness is satisfied, as it is possible to decode $W_{2}$ from any row of Table 8. Next, the user picks M future time instances $t_{i, j}$ , $j \in \{1, \dots, M\}$ , and at each time $t_{i, j}$ the user independently and randomly picks a row from Table 9 and sends the queries to the databases. This completes the scheme; the value of M that minimizes the download cost is calculated at the end of this example.
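
To make the correctness claim concrete, the sketch below decodes $W_{2}$ from one row of Table 8, namely $(W_{1}^{1} + W_{2}^{1}, W_{1}^{1} + W_{2}^{2}, W_{1}^{1})$ . It assumes that “+” denotes bitwise XOR over byte strings, which is one possible choice of finite-field addition; the file contents and the helper `xor` are made up for illustration.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings (assumed field addition)."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical files, each split into N - 1 = 2 equal-size segments.
W1 = (b"\x01\x02", b"\x03\x04")
W2 = (b"\xaa\xbb", b"\xcc\xdd")

# Answers to the row (W1^1 + W2^1, W1^1 + W2^2, W1^1) of Table 8.
answer_db1 = xor(W1[0], W2[0])
answer_db2 = xor(W1[0], W2[1])
answer_db3 = W1[0]

# The user cancels the interference W1^1 using the answer of database 3.
decoded_W2 = (xor(answer_db1, answer_db3), xor(answer_db2, answer_db3))
```

The same cancellation applies to every other row, since the interference term requested alone from one database is exactly the term added to the segments of $W_{2}$ at the other two.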

The databases make predictions with the received query at each time t, based on the information available in Table 12. As the a posteriori probabilities $P (θ = k | Q_{n} = \tilde{Q})$ are proportional to the corresponding probabilities given by $P (Q_{n} = \tilde{Q} | θ = k)$ from (29), the databases’ predictions (using (5)) and the corresponding probabilities are shown in Table 13.

Table 13.

Probabilities of each database predicting the user-required file in Example 2.

Query $\tilde{Q}$	$P ({\hat{θ}}_{\tilde{Q}} = 1)$	$P ({\hat{θ}}_{\tilde{Q}} = 2)$	$P ({\hat{θ}}_{\tilde{Q}} = 3)$
$W_{1}^{1}$	1	0	0
$W_{1}^{2}$	1	0	0
$W_{2}^{1}$	0	1	0
$W_{2}^{2}$	0	1	0
$W_{3}^{1}$	0	0	1
$W_{3}^{2}$	0	0	1
other queries	$\frac{1}{3}$	$\frac{1}{3}$	$\frac{1}{3}$


The probability of error for each type of query is calculated as follows. First, consider the $ϵ$ -deceptive queries with respect to file k, given by $W_{k}^{j}$ , $j \in \{1, 2\}$ . For these queries, the error probability from the perspective of database n, $n \in \{1, \dots, N\}$ , is given by

\begin{matrix} P ({\hat{θ}}_{W_{k}^{j}}^{[T_{i}]} \neq θ^{[T_{i}]}) & = P (θ^{[T_{i}]} \neq k | Q_{n}^{[T_{i}]} = W_{k}^{j}) \end{matrix}

(54)

\begin{matrix} = \sum_{ℓ = 1, ℓ \neq k}^{3} P (θ^{[T_{i}]} = ℓ | Q_{n}^{[T_{i}]} = W_{k}^{j}) \end{matrix}

(55)

\begin{matrix} = \frac{\sum_{ℓ = 1, ℓ \neq k}^{3} P (Q_{n}^{[T_{i}]} = W_{k}^{j} | θ^{[T_{i}]} = ℓ) P (θ^{[T_{i}]} = ℓ)}{P (Q_{n}^{[T_{i}]} = W_{k}^{j})} \end{matrix}

(56)

\begin{matrix} = \frac{1}{P (Q_{n}^{[T_{i}]} = W_{k}^{j})} \frac{2}{3} p e^{ϵ}, \end{matrix}

(57)

where (54) follows from the fact that the databases’ prediction on a received query of the form $W_{k}^{j}$ is file k with probability 1 (Table 13), and the probabilities in (57) are obtained from the real query tables, as they correspond to queries sent at time $T_{i}$ . Next, the probability of error corresponding to each of the other queries, i.e., PIR queries that include the null query and sums of two or three elements, is given by

\begin{matrix} P ({\hat{θ}}_{\tilde{Q}}^{[T_{i}]} \neq θ^{[T_{i}]}) & = P ({\hat{θ}}^{[T_{i}]} \neq θ^{[T_{i}]} | Q_{n}^{[T_{i}]} = \tilde{Q}) \end{matrix}

(58)

\begin{matrix} = \frac{\sum_{j = 1}^{3} \sum_{m = 1, m \neq j}^{3} P ({\hat{θ}}^{[T_{i}]} = m, θ^{[T_{i}]} = j, Q_{n}^{[T_{i}]} = \tilde{Q})}{P (Q_{n}^{[T_{i}]} = \tilde{Q})} \end{matrix}

(59)

\begin{matrix} = \frac{\sum_{j = 1}^{3} \sum_{m = 1, m \neq j}^{3} P ({\hat{θ}}^{[T_{i}]} = m | θ^{[T_{i}]} = j, Q_{n}^{[T_{i}]} = \tilde{Q}) P (Q_{n}^{[T_{i}]} = \tilde{Q} | θ^{[T_{i}]} = j) P (θ^{[T_{i}]} = j)}{P (Q_{n}^{[T_{i}]} = \tilde{Q})} \end{matrix}

(60)

\begin{matrix} = \frac{1}{P (Q_{n}^{[T_{i}]} = \tilde{Q})} \begin{cases} \frac{2 p}{3}, & \text{if } \tilde{Q} = ϕ \\ \frac{2 p e^{ϵ}}{3}, & \text{if } \tilde{Q} \text{ is of the form } \sum_{s = 1}^{ℓ} W_{k_{s}}^{j_{s}} \text{ for } ℓ \in \{2, 3\}, \end{cases} \end{matrix}

(61)

where (61) follows from the fact that ${\hat{θ}}^{[T_{i}]}$ is conditionally independent of $θ^{[T_{i}]}$ given $Q_{n}$ , from (5). The probability of error at each time $T_{i}$ , $i \in \mathbb{N}$ , is the same, as the scheme is identical at each $T_{i}$ and across all file requirements. Therefore, the probability of error of each database’s prediction using (6) is given by

\begin{matrix} P_{e} & = P ({\hat{θ}}^{[T_{i}]} \neq θ^{[T_{i}]}) \end{matrix}

(62)

\begin{matrix} = \sum_{\tilde{Q} \in Q} P (Q_{n} = \tilde{Q}) P ({\hat{θ}}_{\tilde{Q}}^{[T_{i}]} \neq θ^{[T_{i}]}) \end{matrix}

(63)

\begin{matrix} = \sum_{k = 1}^{3} \sum_{j = 1}^{2} P (Q_{n} = W_{k}^{j}) \frac{1}{P (Q_{n}^{[T_{i}]} = W_{k}^{j})} \frac{2}{3} p e^{ϵ} + P (Q_{n} = ϕ) \frac{1}{P (Q_{n} = ϕ)} \frac{2 p}{3} \end{matrix}

\begin{matrix} + 20 P (Q_{n} = \hat{Q}) \frac{1}{P (Q_{n} = \hat{Q})} \frac{2 p e^{ϵ}}{3} \end{matrix}

(64)

\begin{matrix} = 4 p e^{ϵ} + \frac{2 p}{3} + \frac{40 p e^{ϵ}}{3} \end{matrix}

(65)

\begin{matrix} = \frac{52 e^{ϵ} + 2}{9 (8 e^{ϵ} + 1)}, \end{matrix}

(66)

where $Q$ is the set of all queries and $\hat{Q}$ is a query of the form $\sum_{s = 1}^{ℓ} W_{k_{s}}^{j_{s}}$ for $ℓ \in \{2, 3\}$ . The resulting amount of deception is

\begin{matrix} D & = P_{e} - (1 - \frac{1}{K}) = \frac{52 e^{ϵ} + 2}{9 (8 e^{ϵ} + 1)} - \frac{2}{3} = \frac{4 (e^{ϵ} - 1)}{9 (8 e^{ϵ} + 1)} . \end{matrix}

(67)

Therefore, for a required amount of deception $d < \frac{1}{18}$ , $ϵ$ is chosen as $ϵ = \ln (\frac{9 d + 4}{4 (1 - 18 d)})$ .
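
The closed forms in (66) and (67) can be cross-checked by summing the error contributions of the 27 possible queries directly, as in the following Python sketch. The function names are hypothetical, and the per-query error probabilities are read off Table 13 (a query $W_{k}^{j}$ is always predicted as file k, and every other query leads to a uniform guess).

```python
import math

def error_probability_example2(eps):
    """P_e at a real-query time T_i for N = K = 3, summed over query types."""
    K = 3
    p = 1.0 / (3.0 * (1.0 + 8.0 * math.exp(eps)))
    pp = p * math.exp(eps)                      # p' = p e^eps
    pe = 0.0
    for _theta in range(K):                     # uniform prior on the required file
        prior = 1.0 / K
        pe += prior * 2 * p * 0.0               # own W_theta^j: prediction is correct
        pe += prior * 4 * pp * 1.0              # W_k^j, k != theta: prediction k is wrong
        pe += prior * p * (1 - 1.0 / K)         # null query: uniform guess
        pe += prior * 20 * pp * (1 - 1.0 / K)   # 20 sums of two or three blocks
    return pe

def closed_form_pe(eps):
    """Closed form (66)."""
    x = math.exp(eps)
    return (52 * x + 2) / (9 * (8 * x + 1))

def closed_form_deception(eps):
    """Closed form (67)."""
    x = math.exp(eps)
    return 4 * (x - 1) / (9 * (8 * x + 1))
```

At $ϵ = 0$ both expressions for $P_{e}$ reduce to $\frac{2}{3}$ , the PIR value $1 - \frac{1}{K}$ , so the deception is zero.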

Without loss of generality, consider the cost of downloading $W_{1}$ , which equals the expected download cost, as the scheme is symmetric across all file retrievals. The normalized download cost is

\begin{matrix} D_{L} & = \frac{1}{L} (L \times 3 p + \frac{3 L}{2} \times 24 p e^{ϵ} + \frac{3 L}{2} \sum_{m = 0}^{\infty} p_{m} m) = \frac{1 + 12 e^{ϵ}}{1 + 8 e^{ϵ}} + \frac{3}{2} E [M] \end{matrix}

(68)

To find the scheme that achieves the minimum $D_{L}$ , we need to find the minimum $E [M]$ that satisfies $P (R = 1 | θ = i) = α = E [\frac{1}{M + 1}] = \frac{3 (1 + 8 e^{ϵ})}{2 e^{2 ϵ} + 24 e^{ϵ} + 1}$ , i.e., the following optimization problem needs to be solved.

\begin{matrix} \min & E [M] \\ \text{s.t.} & E [\frac{1}{M + 1}] = \frac{3 e^{- 2 ϵ} (1 + 8 e^{ϵ})}{2 + e^{- 2 ϵ} + 24 e^{- ϵ}} . \end{matrix}

(69)

The solution to this problem is given in Lemma 1. The resulting minimum download cost for a given value of $ϵ$ , i.e., required level of deception d, is given by

\begin{matrix} \frac{D_{ϵ}}{L} & = \frac{1 + 12 e^{ϵ}}{1 + 8 e^{ϵ}} + \frac{3}{2} (2 u - u (u + 1) α), α = \frac{3 e^{- 2 ϵ} (1 + 8 e^{ϵ})}{2 + e^{- 2 ϵ} + 24 e^{- ϵ}}, \end{matrix}

(70)

where $u = ⌊ \frac{1}{α} ⌋$ . When $d = 0$ , it follows that $ϵ = 0$ , $α = 1$ , and $u = 1$ , and the achievable rate is $\frac{9}{13}$ , which is equal to the PIR capacity for the case $N = 3, K = 3$ .
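
The chain from a required deception d to the achievable rate in this example can be traced numerically. The sketch below is only an illustration of (67), (70) and Lemma 1 for $N = K = 3$ ; the function name `example2_rate` is an assumption, and floating-point arithmetic is used throughout.

```python
import math

def example2_rate(d):
    """Achievable rate of the N = K = 3 example for required deception 0 <= d < 1/18."""
    eps = math.log((9 * d + 4) / (4 * (1 - 18 * d)))       # inverse of (67)
    x = math.exp(eps)
    alpha = 3 * (1 + 8 * x) / (2 * x * x + 24 * x + 1)     # alpha = E[1 / (M + 1)]
    u = math.floor(1 / alpha)
    expected_M = 2 * u - u * (u + 1) * alpha               # Lemma 1
    cost = (1 + 12 * x) / (1 + 8 * x) + 1.5 * expected_M   # normalized cost (70)
    return 1 / cost

rate_no_deception = example2_rate(0.0)
rate_some_deception = example2_rate(0.03)
```

With $d = 0$ the sketch returns the PIR capacity $\frac{9}{13}$ , and the rate decreases as d grows, reflecting the extra dummy downloads.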

4.3. Generalized DIR Scheme for Arbitrary N and K

In the general DIR scheme proposed in this work, at each time $T_{i}$ , $i \in \mathbb{N}$ , when the user needs to download some file $W_{k}$ , the user sends a set of real queries to the N databases, one query per database. This set is picked according to a certain probability distribution, defined on all possible sets of real queries. For the same file requirement, the user sends M dummy queries at future time instances $t_{i, j}$ , $j \in \{1, \dots, M\}$ , where $t_{i, j} > T_{i}$ . The dummy queries sent at each time $t_{i, j}$ are randomly selected from a subset of real queries. We assume that the databases are unaware of being deceived, and treat both real and dummy queries the same when calculating their predictions on the user-required file index at each time they receive a query. The overall probabilities of a given user sending each query for each file requirement are known by the databases. However, the decomposition of these probabilities based on whether each query is used as a real or a dummy query is not known by the databases. It is also assumed that the databases only store the queries received at the current time instance.

The main components of the general scheme include (1) $N^{K}$ possible sets of real queries to be sent to the N databases for each file requirement and their probabilities, (2) $N - 1$ possible sets of dummy queries and their probabilities, (3) overall probabilities of sending each query for each of the K file requirements of the user. Note that (1) and (2) are only known by the user, while (3) is known by the databases.

As shown in the examples considered, the set of all possible real queries takes the form of the queries in the probabilistic PIR scheme in [23,24], with a non-uniform probability distribution, unlike in PIR. The real query table used when retrieving $W_{k}$ consists of the following queries:

Single blocks: $W_{k}$ is divided into $N - 1$ parts, and each part is requested from one of $N - 1$ databases, while nothing ( $ϕ$ ) is requested from the remaining database. All cyclic shifts of these queries are included in the real query table.
Sums of two blocks/Single block: One database is used to download $W_{j}^{ℓ}$ , $ℓ \in \{1, \dots, N - 1\}$ , $j \neq k$ , and each of the remaining $N - 1$ databases is used to download $W_{k}^{r} + W_{j}^{ℓ}$ for a distinct $r \in \{1, \dots, N - 1\}$ . All cyclic shifts of these queries are also considered as separate possible sets of queries.
Sums of three/Two blocks: One database is used to download $W_{j_{1}}^{ℓ_{1}} + W_{j_{2}}^{ℓ_{2}}$ , $ℓ_{1}, ℓ_{2} \in \{1, \dots, N - 1\}$ , where $j_{1}$ , $j_{2}$ and k are distinct. Each of the remaining $N - 1$ databases is used to download $W_{j_{1}}^{ℓ_{1}} + W_{j_{2}}^{ℓ_{2}} + W_{k}^{r}$ for a distinct $r \in \{1, \dots, N - 1\}$ . All cyclic shifts of these queries are also considered as separate possible sets of queries.
Sums of $K$ and $K - 1$ blocks: The above process is repeated for sums of increasing size, up to sums of K blocks paired with sums of $K - 1$ blocks.
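
The construction described in the list above can be sketched in Python as follows. The encoding is an assumption made for illustration: a query is a K-tuple of segment indices, where 0 means that the file does not appear in the sum, so the all-zero tuple is the null query $ϕ$ . The names `real_query_table` and `row_probability` are hypothetical.

```python
from itertools import product

def real_query_table(N, K, k):
    """Rows of the real query table for required file k (0-based index).

    Each row is an N-tuple of queries, one per database.
    """
    rows = []
    # Interference: a segment index (or 0 for "absent") for every file except k.
    for interference in product(range(N), repeat=K - 1):
        base = []
        # N - 1 databases get segment r of W_k plus the interference;
        # the last database gets the interference alone.
        for r in list(range(1, N)) + [0]:
            base.append(interference[:k] + (r,) + interference[k:])
        for shift in range(N):                  # all cyclic shifts of the base row
            rows.append(tuple(base[shift:] + base[:shift]))
    return rows

def row_probability(row, k, p, exp_eps):
    """p for the single-block rows, p * e^eps for all other rows."""
    interference_free = all(
        all(seg == 0 for i, seg in enumerate(q) if i != k) for q in row
    )
    return p if interference_free else p * exp_eps

rows = real_query_table(3, 3, 1)                # table for W_2 with N = K = 3
```

For $N = K = 3$ this produces 27 rows, the first three of which are the single-block rows of Table 8, and every database receives each of the 27 possible queries in exactly one row.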

Out of the $N^{K}$ different sets of queries described above in the real query table, all queries except $ϕ$ in single blocks, i.e., queries of the form $W_{k}^{ℓ}$ , $ℓ \in {1, \dots, N - 1}$ , are chosen as $ϵ$ -deceptive ones with respect to file k, for each $k \in {1, \dots, K}$ , and are included in the set of dummy queries sent to databases when the user-required file index is k. The $N - 1$ $ϵ$ -deceptive queries $W_{k}^{r}$ , $r \in {1, \dots, N - 1}$ , corresponding to the kth file requirement, must guarantee the condition in (11). For that, we assign

\begin{matrix} P (Q_{n} = W_{k}^{r} | θ = k, R = 1) = p, r \in {1, \dots, N - 1} \end{matrix}

(71)

and

\begin{matrix} P (Q_{n} = W_{k}^{r} | θ = j, R = 1) = p e^{ϵ}, r \in {1, \dots, N - 1}, j \neq k, \end{matrix}

(72)

for each database n, $n \in {1, \dots, N}$ . The rest of the queries, i.e., $ϕ$ and sums of ℓ blocks where $ℓ \in {2, \dots, K}$ , are PIR queries in the proposed scheme. Note that the query $ϕ$ is always coupled with the $ϵ$ -deceptive queries with respect to file index k (required file) for correctness (see Table 6, Table 8 and Table 10). Thus, $ϕ$ is assigned the corresponding probability given by

\begin{matrix} P (Q_{n} = ϕ | θ = m, R = 1) = p, m \in {1, \dots, K}, n \in {1, \dots, N} . \end{matrix}

(73)

Similarly, as the rest of the PIR queries are coupled with $ϵ$ -deceptive queries with respect to file indices j, $j \neq k$ , or with other PIR queries, they are assigned the corresponding probability given by

\begin{matrix} P (Q_{n} = \hat{Q} | θ = m, R = 1) = p e^{ϵ}, m \in {1, \dots, K}, n \in {1, \dots, N}, \end{matrix}

(74)

where $\hat{Q}$ is any PIR query in the form of ℓ-sums with $ℓ \in {2, \dots, K}$ . Since the probabilities of the real queries sent for each file requirement must add up to one, i.e., $\sum_{\tilde{Q} \in Q} P (Q_{n} = \tilde{Q} | θ = m, R = 1) = 1$ for each $m \in {1, \dots, K}$ , p is given by

\begin{matrix} p = \frac{1}{N + (N^{K} - N) e^{ϵ}}, \end{matrix}

(75)

as there are N query sets in the real query table with probability p, and $N^{K} - N$ sets with probability $p e^{ϵ}$ . Each $ϵ$ -deceptive query with respect to file index k is chosen with equal probability to be sent to the databases as dummy queries at times $t_{i, j}$ when the file requirement at the corresponding time $T_{i}$ is $W_{k}$ . Since there are $N - 1$ deceptive queries,

\begin{matrix} P (Q_{n} = W_{k}^{r} | θ = k, R = 0) = \frac{1}{N - 1}, r \in {1, \dots, N - 1} \end{matrix}

(76)

and

\begin{matrix} P (Q_{n} = W_{k}^{r} | θ = j, R = 0) = 0, r \in {1, \dots, N - 1}, j \neq k \end{matrix}

(77)

for each database n, $n \in \{1, \dots, N\}$ . Therefore, for all $ϵ$ -deceptive queries with respect to file index k of the form $W_{k}^{r}$ , the condition in (12) can be written as

\begin{matrix} \frac{α}{α + \frac{1}{p (N - 1)} (1 - α)} & = e^{- 2 ϵ} \end{matrix}

(78)

thus,

\begin{matrix} α & = \frac{1}{p (N - 1) (e^{2 ϵ} - 1) + 1} = \frac{N + (N^{K} - N) e^{ϵ}}{(N - 1) e^{2 ϵ} + (N^{K} - N) e^{ϵ} + 1}, \end{matrix}

(79)

which characterizes $α = E [\frac{1}{M + 1}]$ . The information available to database n, $n \in {1, \dots, N}$ , is the overall probability of receiving each query for each file requirement of the user $P (Q_{n} = \tilde{Q} | θ = k)$ , $k \in {1, \dots, K}$ , given by

\begin{matrix} P (Q_{n} = \tilde{Q} | θ = k) & = P (Q_{n} = \tilde{Q} | θ = k, R = 1) P (R = 1 | θ = k) \\ + P (Q_{n} = \tilde{Q} | θ = k, R = 0) P (R = 0 | θ = k) . \end{matrix}

(80)

For $ϵ$ -deceptive queries with respect to file index k, i.e., $W_{k}^{j}$ , $j \in {1, \dots, N - 1}$ , the overall probability in (80) from the perspective of database n, $n \in {1, \dots, N}$ , is given by

\begin{matrix} P (Q_{n} = W_{k}^{j} | θ = ℓ) & = \begin{cases} α p + \frac{1 - α}{N - 1} = \frac{e^{2 ϵ}}{(N - 1) (e^{2 ϵ} - 1) + N + (N^{K} - N) e^{ϵ}}, & ℓ = k \\ α p e^{ϵ} = \frac{e^{ϵ}}{(N - 1) (e^{2 ϵ} - 1) + N + (N^{K} - N) e^{ϵ}}, & ℓ \neq k . \end{cases} \end{matrix}

(81)

The probability of sending the null query $ϕ$ to database n, $n \in {1, \dots, N}$ , for each file requirement k, $k \in {1, \dots, K}$ , is

\begin{matrix} P (Q_{n} = ϕ | θ = k) = α p = \frac{1}{(N - 1) (e^{2 ϵ} - 1) + N + (N^{K} - N) e^{ϵ}} . \end{matrix}

(82)

For the rest of the PIR queries denoted by $\hat{Q}$ , i.e., queries of the form $\sum_{s = 1}^{ℓ} W_{i_{s}}^{j_{s}}$ for $ℓ \in {2, \dots, K}$ , the overall probability in (80), known by each database n, $n \in {1, \dots, N}$ for each file requirement k, $k \in {1, \dots, K}$ , is given by

\begin{matrix} P (Q_{n} = \hat{Q} | θ = k) = α p e^{ϵ} = \frac{e^{ϵ}}{(N - 1) (e^{2 ϵ} - 1) + N + (N^{K} - N) e^{ϵ}} . \end{matrix}

(83)
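
As a sanity check on (81)–(83), the overall probabilities seen by a database should sum to one over all $N^{K}$ possible queries for any fixed file requirement. The Python sketch below evaluates them from (75) and (79); the function names and dictionary keys are illustrative assumptions.

```python
import math

def overall_probabilities(N, K, eps):
    """Overall query probabilities seen by one database, from (75), (79), (81)-(83)."""
    x = math.exp(eps)
    p = 1.0 / (N + (N**K - N) * x)                        # (75)
    alpha = 1.0 / (p * (N - 1) * (x * x - 1.0) + 1.0)     # (79)
    return {
        "own_deceptive": alpha * p + (1 - alpha) / (N - 1),  # W_k^j when theta = k
        "other_deceptive": alpha * p * x,                    # W_k^j when theta != k
        "null": alpha * p,                                   # null query
        "sum": alpha * p * x,                                # each sum of 2..K blocks
    }

def total_probability(N, K, eps):
    """Sum of the overall probabilities over all N^K queries for a fixed theta."""
    q = overall_probabilities(N, K, eps)
    n_sums = N**K - 1 - K * (N - 1)                       # number of sums of 2..K blocks
    return ((N - 1) * q["own_deceptive"]
            + (K - 1) * (N - 1) * q["other_deceptive"]
            + q["null"] + n_sums * q["sum"])
```

The ratio of the `own_deceptive` entry to the `other_deceptive` entry equals $e^{ϵ}$ , which is the likelihood ratio that the deceptive queries are designed to produce.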

Based on the query received at a given time t, each database n, $n \in {1, \dots, N}$ , calculates the a posteriori probability of the user-required file index being k, $k \in {1, \dots, K}$ , using

\begin{matrix} P (θ = k | Q_{n} = \tilde{Q}) & = \frac{P (Q_{n} = \tilde{Q} | θ = k) P (θ = k)}{P (Q_{n} = \tilde{Q})} . \end{matrix}

(84)

Since we assume uniform priors, i.e., $P (θ = k) = \frac{1}{K}$ for all $k \in \{1, \dots, K\}$ , the posteriors are directly proportional to $P (Q_{n} = \tilde{Q} | θ = k)$ for each $\tilde{Q}$ . Therefore, the databases predict the user-required file index for each query received using (5) and (81)–(83). For example, when the query $W_{1}^{1}$ is received, it is clear that the maximum $P (θ = k | Q_{n} = W_{1}^{1})$ in (5) is obtained for $k = 1$ from (81) and (84). The prediction corresponding to any query received is given in Table 14 along with the corresponding probability of choosing the given prediction (the superscript j in the first column of Table 14 corresponds to any index in the set $\{1, \dots, N - 1\}$ ).

Table 14.

Probabilities of each database predicting the user-required file.

Query $\tilde{Q}$	$P ({\hat{θ}}_{\tilde{Q}} = 1)$	$P ({\hat{θ}}_{\tilde{Q}} = 2)$	$P ({\hat{θ}}_{\tilde{Q}} = 3)$	…	$P ({\hat{θ}}_{\tilde{Q}} = K)$
$W_{1}^{j}$	1	0	0	…	0
$W_{2}^{j}$	0	1	0	…	0
$W_{3}^{j}$	0	0	1	…	0
⋮	⋮	⋮	⋮	⋮	⋮
$W_{K}^{j}$	0	0	0	…	1
other queries	$\frac{1}{K}$	$\frac{1}{K}$	$\frac{1}{K}$	…	$\frac{1}{K}$
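
The entries of Table 14 follow from a maximum a posteriori rule. The following Python sketch reproduces them under two assumptions that are not restated here: the prior is uniform, so the posterior is proportional to the likelihood, and ties are broken uniformly at random. The function name and the numerical likelihoods are hypothetical.

```python
from fractions import Fraction

def prediction_distribution(likelihoods):
    """Distribution of the database's guess given P(Q = q | theta = k), k = 1..K."""
    best = max(likelihoods)
    winners = [k for k, value in enumerate(likelihoods) if value == best]
    return [Fraction(1, len(winners)) if k in winners else Fraction(0)
            for k in range(len(likelihoods))]

# Hypothetical likelihoods for a query W_1^j with K = 3: file 1 is most likely.
deceptive_query = prediction_distribution(
    [Fraction(3, 10), Fraction(1, 10), Fraction(1, 10)])

# A PIR query has equal likelihood under every file requirement.
pir_query = prediction_distribution([Fraction(1, 10)] * 3)
```

The first call yields the row for $W_{1}^{j}$ and the second yields the “other queries” row.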


Based on the information in Table 14, the probability of error when a database n, $n \in {1, \dots, N}$ , receives the query $W_{k}^{ℓ}$ at some time $T_{i}$ is given by

\begin{matrix} P ({\hat{θ}}_{W_{k}^{ℓ}}^{[T_{i}]} \neq θ^{[T_{i}]}) & = P (θ^{[T_{i}]} \neq k | Q_{n}^{[T_{i}]} = W_{k}^{ℓ}) \end{matrix}

(85)

\begin{matrix} = \sum_{j = 1, j \neq k}^{K} P (θ^{[T_{i}]} = j | Q_{n}^{[T_{i}]} = W_{k}^{ℓ}) \end{matrix}

(86)

\begin{matrix} = \frac{\sum_{j = 1, j \neq k}^{K} P (Q_{n}^{[T_{i}]} = W_{k}^{ℓ} | θ^{[T_{i}]} = j) P (θ^{[T_{i}]} = j)}{P (Q_{n}^{[T_{i}]} = W_{k}^{ℓ})} \end{matrix}

(87)

\begin{matrix} = \frac{\frac{1}{K} p e^{ϵ} (K - 1)}{P (Q_{n}^{[T_{i}]} = W_{k}^{ℓ})}, \end{matrix}

(88)

where (88) follows from the fact that the user sends real queries based on the probabilities $P (Q_{n} = \tilde{Q} | θ = k, R = 1)$ at time $T_{i}$ .

For all other queries $\tilde{Q}$ , the corresponding probability of error is given by

\begin{matrix} P ({\hat{θ}}_{\tilde{Q}}^{[T_{i}]} \neq θ^{[T_{i}]}) & = P ({\hat{θ}}^{[T_{i}]} \neq θ^{[T_{i}]} | Q_{n}^{[T_{i}]} = \tilde{Q}) \end{matrix}

(89)

\begin{matrix} = \frac{\sum_{j = 1}^{K} \sum_{m = 1, m \neq j}^{K} P ({\hat{θ}}^{[T_{i}]} = m, θ^{[T_{i}]} = j, Q_{n}^{[T_{i}]} = \tilde{Q})}{P (Q_{n}^{[T_{i}]} = \tilde{Q})} \end{matrix}

(90)

\begin{matrix} = \frac{\sum_{j = 1}^{K} \sum_{m = 1, m \neq j}^{K} P ({\hat{θ}}^{[T_{i}]} = m | θ^{[T_{i}]} = j, Q_{n}^{[T_{i}]} = \tilde{Q}) P (Q_{n}^{[T_{i}]} = \tilde{Q} | θ^{[T_{i}]} = j) P (θ^{[T_{i}]} = j)}{P (Q_{n}^{[T_{i}]} = \tilde{Q})} \end{matrix}

(91)

\begin{matrix} = \frac{1}{P (Q_{n}^{[T_{i}]} = \tilde{Q})} \begin{cases} \frac{(K - 1) p}{K}, & \text{if } \tilde{Q} = ϕ \\ \frac{(K - 1) p e^{ϵ}}{K}, & \text{if } \tilde{Q} \text{ is of the form } \sum_{s = 1}^{ℓ} W_{i_{s}}^{j_{s}}, \ ℓ \in \{2, \dots, K\}, \end{cases} \end{matrix}

(92)

where (92) follows from the fact that ${\hat{θ}}^{[T_{i}]}$ is conditionally independent of $θ^{[T_{i}]}$ given $Q_{n}$ , from (5). The probability of error of each database’s prediction is given by

\begin{matrix} P_{e} & = \sum_{\tilde{Q}} P (Q_{n}^{[T_{i}]} = \tilde{Q}) P ({\hat{θ}}^{[T_{i}]} \neq θ^{[T_{i}]} | Q_{n}^{[T_{i}]} = \tilde{Q}) \end{matrix}

(93)

\begin{matrix} = \sum_{k = 1}^{K} \sum_{ℓ = 1}^{N - 1} P (Q_{n}^{[T_{i}]} = W_{k}^{ℓ}) \frac{\frac{1}{K} p e^{ϵ} (K - 1)}{P (Q_{n}^{[T_{i}]} = W_{k}^{ℓ})} + P (Q_{n}^{[T_{i}]} = ϕ) \frac{\frac{1}{K} (K - 1) p}{P (Q_{n}^{[T_{i}]} = ϕ)} \end{matrix}

\begin{matrix} + (N^{K} - 1 - K (N - 1)) P (Q_{n}^{[T_{i}]} = \hat{Q}) \frac{\frac{1}{K} (K - 1) p e^{ϵ}}{P (Q_{n}^{[T_{i}]} = \hat{Q})} \end{matrix}

(94)

\begin{matrix} = p e^{ϵ} (K - 1) (N - 1) + \frac{(K - 1) p}{K} + \frac{(K - 1) p e^{ϵ} (N^{K} - 1 - K (N - 1))}{K} \end{matrix}

(95)

\begin{matrix} = \frac{(K - 1) (1 + e^{ϵ} (N^{K} - 1))}{K (N + (N^{K} - N) e^{ϵ})}, \end{matrix}

(96)

where $\hat{Q}$ in (94) represents the queries of the form $\sum_{s = 1}^{ℓ} W_{i_{s}}^{j_{s}}$ for $ℓ \in {2, \dots, K}$ . Note that $P (Q_{n}^{[T_{i}]} = \hat{Q})$ is the same for each $\hat{Q}$ as $P (Q_{n}^{[T_{i}]} = \hat{Q} | θ = j) = p e^{ϵ}$ for each $\hat{Q}$ and all $j \in {1, \dots, K}$ from (74). Thus, the amount of deception achieved by this scheme for a given $ϵ$ is given by

\begin{matrix} D = P_{e} - (1 - \frac{1}{K}) = \frac{(K - 1) (N - 1) (e^{ϵ} - 1)}{K (N + (N^{K} - N) e^{ϵ})} . \end{matrix}

(97)

Therefore, for a required amount of deception d, satisfying $d < \frac{(K - 1) (N - 1)}{K (N^{K} - N)}$ , the value of $ϵ$ must be chosen as

\begin{matrix} ϵ = \ln (\frac{d K N + (K - 1) (N - 1)}{d K N + (K - 1) (N - 1) - d K N^{K}}) . \end{matrix}

(98)
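
The relation between $ϵ$ and the achieved deception in (97), and its inverse in (98), can be checked with a short round trip. The Python sketch below is illustrative only; the function names are assumptions, and the range check mirrors the bound on d stated above.

```python
import math

def deception(N, K, eps):
    """Deception D achieved for a given eps, from (97)."""
    x = math.exp(eps)
    return (K - 1) * (N - 1) * (x - 1) / (K * (N + (N**K - N) * x))

def eps_for_deception(N, K, d):
    """Value of eps required for deception d, from (98)."""
    d_max = (K - 1) * (N - 1) / (K * (N**K - N))
    if not 0 <= d < d_max:
        raise ValueError("d must satisfy 0 <= d < (K-1)(N-1) / (K (N^K - N))")
    a = d * K * N + (K - 1) * (N - 1)
    return math.log(a / (a - d * K * N**K))

eps_example = eps_for_deception(3, 3, 0.02)
d_roundtrip = deception(3, 3, eps_example)
```

For $N = K = 3$ the bound evaluates to $\frac{1}{18}$ , in agreement with Example 2, and the deception approaches this bound as $ϵ$ grows.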

The download cost of the general scheme is

\begin{matrix} D_{L} & = \frac{1}{L} (N p L + (N^{K} - N) p e^{ϵ} \frac{N L}{N - 1} + \frac{N L}{N - 1} E [M]) \end{matrix}

(99)

\begin{matrix} D_{L} & = N p + \frac{N (N^{K} - N)}{N - 1} p e^{ϵ} + (\frac{N}{N - 1}) E [M] \end{matrix}

(100)

\begin{matrix} D_{L} & = \frac{N}{N - 1} (1 - \frac{1}{N + (N^{K} - N) e^{ϵ}} + E [M]) . \end{matrix}

(101)
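
The step from (100) to (101) uses the normalization $N p + (N^{K} - N) p e^{ϵ} = 1$ . The Python sketch below verifies the identity with exact rational arithmetic and checks that, with $ϵ = 0$ and $E [M] = 0$ , the reciprocal of the cost equals the PIR capacity $(1 + \frac{1}{N} + \dots + \frac{1}{N^{K - 1}})^{- 1}$ , a standard expression that is not restated in this section. The function names are hypothetical, and the argument `x` stands for $e^{ϵ}$ .

```python
from fractions import Fraction

def download_cost(N, K, x, expected_M):
    """Normalized download cost in the compact form (101); x stands for e^eps."""
    p = Fraction(1) / (N + (N**K - N) * x)
    return Fraction(N, N - 1) * (1 - p + expected_M)

def download_cost_expanded(N, K, x, expected_M):
    """The same cost written term by term as in (100)."""
    p = Fraction(1) / (N + (N**K - N) * x)
    return (N * p + Fraction(N * (N**K - N), N - 1) * p * x
            + Fraction(N, N - 1) * expected_M)

def pir_capacity(N, K):
    """PIR capacity (1 + 1/N + ... + 1/N^(K-1))^(-1)."""
    return 1 / sum(Fraction(1, N**i) for i in range(K))

rate_without_deception = 1 / download_cost(3, 3, 1, 0)
```

Passing `Fraction` or integer arguments keeps every intermediate value exact, so the two forms can be compared with `==` rather than with a tolerance.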

The following optimization problem needs to be solved to minimize the download cost while satisfying $α = \frac{N + (N^{K} - N) e^{ϵ}}{(N - 1) e^{2 ϵ} + (N^{K} - N) e^{ϵ} + 1}$ , from (79):

\begin{matrix} \min & E [M] \\ \text{s.t.} & E [\frac{1}{M + 1}] = \frac{N + (N^{K} - N) e^{ϵ}}{(N - 1) e^{2 ϵ} + (N^{K} - N) e^{ϵ} + 1} = α . \end{matrix}

(102)

Lemma 1.

The solution to the optimization problem in (102) is given by

\begin{matrix} E [M] = 2 u - u (u + 1) α, \end{matrix}

(103)

where $u = ⌊ \frac{1}{α} ⌋$ for a given value of α, which is specified by the required level of deception d.

The proof of Lemma 1 is given in Appendix A. The minimum download cost for the general case with N databases, K files, and a deception requirement d is obtained by (101) and (103). The corresponding maximum achievable rate is given in (9).
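
To illustrate Lemma 1, the sketch below builds a two-point PMF for M supported on $\{u - 1, u\}$ that meets the constraint $E [\frac{1}{M + 1}] = α$ and whose mean coincides with (103). The two-point support is an assumption made here for illustration, obtained by solving the two linear constraints on that support; it is not quoted from the proof. The function name is hypothetical.

```python
import math
from fractions import Fraction

def optimal_dummy_pmf(alpha):
    """Two-point PMF of M on {u - 1, u} with E[1/(M+1)] = alpha, 0 < alpha <= 1."""
    u = math.floor(1 / alpha)
    p_low = u * (u + 1) * alpha - u             # P(M = u - 1)
    p_high = 1 - p_low                          # P(M = u)
    expected_M = 2 * u - u * (u + 1) * alpha    # value in (103)
    return u, p_low, p_high, expected_M

u, p_low, p_high, expected_M = optimal_dummy_pmf(Fraction(2, 5))

# Check the constraint and the mean for this PMF with exact arithmetic.
constraint = p_low / u + p_high / (u + 1)
mean = (u - 1) * p_low + u * p_high
```

When $\frac{1}{α}$ is an integer, all the mass falls on $M = u - 1$ ; in particular, $α = 1$ gives $M = 0$ , i.e., no dummy queries.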

5. Discussion and Conclusions

We introduced the problem of deceptive information retrieval (DIR), in which a user retrieves a file from a set of independent files stored in multiple databases, while revealing fake information about the required file to the databases, which makes the probability of error of the databases’ prediction on the user-required file index high. The proposed scheme achieves rates lower than the PIR capacity when the required level of deception is positive, as it sends dummy queries at distinct time instances to deceive the databases. When the required level of deception is zero, the achievable DIR rate is the same as the PIR capacity.

The probability of error of the databases’ prediction on the user-required file index is calculated at the time of the user’s requirement, as defined in Section 2. In the proposed scheme, the user also sends dummy queries at other (future) time instances. As the databases are unaware of being deceived and cannot distinguish between the times corresponding to real and dummy queries, they make a prediction on the user-required file index every time a query is received. Note that whenever a query of the form $W_{k}^{ℓ}$ is received, the databases’ prediction is $\hat{θ} = k$ , from Table 14. Although this prediction is incorrect with high probability at times corresponding to the user’s real requirements, it is correct when $W_{k}^{ℓ}$ is used as a dummy query, as $W_{k}^{ℓ}$ is only sent as a dummy query when the user needs to download file k. However, the databases can only obtain these correct predictions at future time instances, by which point the user has already downloaded the required file while also deceiving the databases.

The time dimension is required for the following reason. An alternative to using the time dimension is to send the dummy queries to a subset of the databases and the real queries to the rest. As explained above, whenever a database receives a query of the form $W_{k}^{ℓ}$ as a dummy query, the database predicts the user-required file correctly. Therefore, this approach leaks information about the required file to a subset of databases right at the time of the retrieval, while deceiving the rest. Hence, to deceive all databases at the time of retrieval, we exploit the time dimension that is naturally present in time-sensitive information retrieval applications.

A potential future direction of this work is an analysis on the time dimension. Note that, in this work, we assume that the databases do not keep track of the previous queries and only store the information corresponding to the current time instance. Therefore, as long as the dummy queries are sent at distinct time instances that are also different from the time of the user’s requirement, the calculations presented in this paper are valid. An extension of basic DIR can be formulated by assuming that the databases keep track of all queries received and their time stamps. This imposes additional constraints on the problem, as the databases now have extra information along the time dimension, which requires the scheme to choose the time instances at which the dummy queries are sent, in such a way that they do not leak any information about the existence of the two types (real and dummy) of queries. Another direction is to incorporate the freshness and age of information into DIR, where the user may trade the age of the required file for a reduced download cost, by making use of the previous dummy downloads present in DIR.

Appendix A. Proof of Lemma 1

The solution to the optimization problem in (102) for the general case with N databases and K files is as follows. The optimization problem in (102), for a required amount of deception d and the corresponding $ϵ$ with $α = \frac{N + (N^{K} - N) e^{ϵ}}{(N - 1) e^{2 ϵ} + (N^{K} - N) e^{ϵ} + 1}$ , is given by

\begin{matrix} \min & E [M] = \sum_{m = 0}^{\infty} m p_{m} \\ \text{s.t.} & E [\frac{1}{M + 1}] = \sum_{m = 0}^{\infty} (\frac{1}{m + 1}) p_{m} = α \\ & \sum_{m = 0}^{\infty} p_{m} = 1 \\ & p_{m} \geq 0, m \in \{0, 1, \dots\} . \end{matrix}

(A1)
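As a numerical aid that is not part of the original proof, the value of $α$ that enters the constraint in (A1) can be evaluated directly from N, K and $ϵ$ using the expression stated above. The following Python sketch does only that; the function name `alpha_from_epsilon` and its argument names are our own choices for illustration.

```python
import math


def alpha_from_epsilon(n_databases, n_files, epsilon):
    """Evaluate alpha = (N + (N^K - N) e^eps) / ((N - 1) e^(2 eps) + (N^K - N) e^eps + 1).

    The function and argument names are illustrative assumptions;
    only the formula itself is taken from the text.
    """
    n, k = n_databases, n_files
    exp_eps = math.exp(epsilon)
    middle = (n ** k - n) * exp_eps  # term shared by numerator and denominator
    return (n + middle) / ((n - 1) * exp_eps ** 2 + middle + 1)


if __name__ == "__main__":
    for eps in (0.0, 0.5, 1.0, 2.0):
        print(eps, alpha_from_epsilon(2, 2, eps))
```

At $ϵ = 0$ the numerator and denominator coincide, so the expression evaluates to $α = 1$ . For $N = K = 2$ the expression simplifies algebraically to $\frac{2}{1 + e^{ϵ}}$ , which gives a quick sanity check on the code.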

We need to determine the optimum PMF of M that minimizes $E [M]$ while satisfying the given condition. The Lagrangian L of this optimization problem is given by

\begin{matrix} L = \sum_{m = 0}^{\infty} m p_{m} + λ_{1} (\sum_{m = 0}^{\infty} (\frac{1}{m + 1}) p_{m} - α) + λ_{2} (\sum_{m = 0}^{\infty} p_{m} - 1) - \sum_{m = 0}^{\infty} μ_{m} p_{m} . \end{matrix}

(A2)

Then, the following set of conditions needs to be solved to find the minimum $E [M]$ :

\begin{matrix} \frac{\partial L}{\partial p_{m}} = m + λ_{1} (\frac{1}{m + 1}) + λ_{2} - μ_{m} & = 0, m \in {0, 1, \dots} \end{matrix}

(A3)

\begin{matrix} \sum_{m = 0}^{\infty} (\frac{1}{m + 1}) p_{m} & = α \end{matrix}

(A4)

\begin{matrix} \sum_{m = 0}^{\infty} p_{m} & = 1 \end{matrix}

(A5)

\begin{matrix} μ_{m} p_{m} & = 0, m \in {0, 1, \dots} \end{matrix}

(A6)

\begin{matrix} μ_{m}, p_{m} & \geq 0, m \in {0, 1, \dots} . \end{matrix}

(A7)

Case 1: Assume that the PMF of M contains at most two non-zero probabilities, i.e., $p_{0}, p_{1} \geq 0$ and $p_{i} = 0$ , $i \in {2, 3, \dots}$ . Then, the conditions in (A3)–(A7) are simplified as

\begin{matrix} \frac{\partial L}{\partial p_{0}} = λ_{1} + λ_{2} - μ_{0} & = 0 \end{matrix}

(A8)

\begin{matrix} \frac{\partial L}{\partial p_{1}} = 1 + \frac{1}{2} λ_{1} + λ_{2} - μ_{1} & = 0 \end{matrix}

(A9)

\begin{matrix} p_{0} + \frac{1}{2} p_{1} & = α \end{matrix}

(A10)

\begin{matrix} p_{0} + p_{1} & = 1 \end{matrix}

(A11)

\begin{matrix} μ_{0} p_{0} & = 0 \end{matrix}

(A12)

\begin{matrix} μ_{1} p_{1} & = 0 \end{matrix}

(A13)

\begin{matrix} μ_{0}, μ_{1}, p_{0}, p_{1} & \geq 0 . \end{matrix}

(A14)

From (A10) and (A11), we obtain

\begin{matrix} p_{0} + \frac{1}{2} (1 - p_{0}) & = α \end{matrix}

(A15)

and thus,

\begin{matrix} p_{0} = 2 α - 1, p_{1} = 2 - 2 α, \end{matrix}

(A16)

which along with (A14) implies that this solution is only valid for $\frac{1}{2} \leq α \leq 1$ . The corresponding optimum value of $E [M]$ is given by

\begin{matrix} E [M] = 1 - p_{0} = 2 - 2 α, \frac{1}{2} \leq α \leq 1 . \end{matrix}

(A17)

Case 2: Now consider the case where at most three probabilities of the PMF of M are allowed to be non-zero, i.e., $p_{0}, p_{1}, p_{2} \geq 0$ and $p_{i} = 0$ , $i \in {3, 4, \dots}$ . The set of conditions in (A3)–(A7) for this case is

\begin{matrix} \frac{\partial L}{\partial p_{m}} = m + λ_{1} (\frac{1}{m + 1}) + λ_{2} - μ_{m} & = 0, m \in {0, 1, 2} \end{matrix}

(A18)

\begin{matrix} \sum_{m = 0}^{2} (\frac{1}{m + 1}) p_{m} & = α \end{matrix}

(A19)

\begin{matrix} \sum_{m = 0}^{2} p_{m} & = 1 \end{matrix}

(A20)

\begin{matrix} μ_{m} p_{m} & = 0, m \in {0, 1, 2} \end{matrix}

(A21)

\begin{matrix} μ_{m}, p_{m} & \geq 0, m \in {0, 1, 2} . \end{matrix}

(A22)

The set of conditions in (A18)–(A22) can be written in a matrix form as

\begin{matrix} [\begin{matrix} 1 & 1 & - 1 & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{2} & 1 & 0 & - 1 & 0 & 0 & 0 & 0 \\ \frac{1}{3} & 1 & 0 & 0 & - 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & \frac{1}{2} & \frac{1}{3} \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{matrix}] [\begin{matrix} λ_{1} \\ λ_{2} \\ μ_{0} \\ μ_{1} \\ μ_{2} \\ p_{0} \\ p_{1} \\ p_{2} \end{matrix}] = [\begin{matrix} 0 \\ - 1 \\ - 2 \\ α \\ 1 \end{matrix}] . \end{matrix}

(A23)

Three of the above eight variables, i.e., either $μ_{i}$ or $p_{i}$ for each i, are always zero according to (A21). We consider all choices of ${μ_{i}, p_{i}}$ pairs such that one element of the pair is equal to zero and the other one is a positive variable, and solve the system for the non-zero variables. Then we calculate the resulting $E [M]$ , along with the corresponding regions of $α$ for which the solutions are applicable. For each region of $α$ , we find the solution to (A23) that results in the minimum $E [M]$ . Based on this process, the optimum values of $p_{i}$ , $i \in {0, 1, 2}$ , the corresponding ranges of $α$ , and the minimum values of $E [M]$ are given in Table A1.

Table A1.

Solution to Case 2: Optimum PMF of M, valid ranges of $α$ , and minimum $E [M]$ .

Range of $α$	$p_{0}$	$p_{1}$	$p_{2}$	$E [M]$
$\frac{1}{3} \leq α \leq \frac{1}{2}$	0	$6 α - 2$	$3 - 6 α$	$4 - 6 α$
$\frac{1}{2} \leq α \leq 1$	$2 α - 1$	$2 - 2 α$	0	$2 - 2 α$


As an example, consider the calculations corresponding to the case where $μ_{0} > 0$ , $μ_{1} = μ_{2} = 0$ , which implies $p_{0} = 0$ , $p_{1}, p_{2} > 0$ . Note that for this case, (A23) simplifies to

\begin{matrix} [\begin{matrix} 1 & 1 & - 1 & 0 & 0 \\ \frac{1}{2} & 1 & 0 & 0 & 0 \\ \frac{1}{3} & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{2} & \frac{1}{3} \\ 0 & 0 & 0 & 1 & 1 \end{matrix}] [\begin{matrix} λ_{1} \\ λ_{2} \\ μ_{0} \\ p_{1} \\ p_{2} \end{matrix}] = [\begin{matrix} 0 \\ - 1 \\ - 2 \\ α \\ 1 \end{matrix}] . \end{matrix}

(A24)

The values of $p_{1}$ and $p_{2}$ , from the solution of the above system, and the corresponding range of $α$ , from (A22), along with the resulting $E [M]$ , are given by

\begin{matrix} p_{1} = 6 α - 2, p_{2} = 3 - 6 α, \frac{1}{3} \leq α \leq \frac{1}{2}, E [M] = 4 - 6 α . \end{matrix}

(A25)
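For completeness, the solution of (A24) can be written out step by step. The following is our own short calculation from (A24), not reproduced from the original text; it also confirms that the multiplier $μ_{0}$ is positive, as assumed for this case:

\begin{matrix} \frac{1}{2} λ_{1} + λ_{2} = - 1, \frac{1}{3} λ_{1} + λ_{2} = - 2 & \Rightarrow λ_{1} = 6, λ_{2} = - 4, μ_{0} = λ_{1} + λ_{2} = 2 > 0 \\ \frac{1}{2} p_{1} + \frac{1}{3} p_{2} = α, p_{1} + p_{2} = 1 & \Rightarrow p_{1} = 6 α - 2, p_{2} = 3 - 6 α \\ E [M] = p_{1} + 2 p_{2} & = 4 - 6 α . \end{matrix}

The non-negativity of $p_{1}$ and $p_{2}$ then yields $α \geq \frac{1}{3}$ and $α \leq \frac{1}{2}$ , respectively, which is the range stated in (A25).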

Case 3: At most four non-zero elements of the PMF of M are considered in this case, i.e., $p_{0}, p_{1}, p_{2}, p_{3} \geq 0$ and $p_{i} = 0$ , $i \in {4, 5, \dots}$ . The conditions in (A3)–(A7) can be written in a matrix form as

\begin{matrix} [\begin{matrix} 1 & 1 & - 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{2} & 1 & 0 & - 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{3} & 1 & 0 & 0 & - 1 & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{4} & 1 & 0 & 0 & 0 & - 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{matrix}] [\begin{matrix} λ_{1} \\ λ_{2} \\ μ_{0} \\ μ_{1} \\ μ_{2} \\ μ_{3} \\ p_{0} \\ p_{1} \\ p_{2} \\ p_{3} \end{matrix}] = [\begin{matrix} 0 \\ - 1 \\ - 2 \\ - 3 \\ α \\ 1 \end{matrix}] . \end{matrix}

(A26)

Using the same method described in Case 2, the optimum values of $p_{i}$ , $i \in {0, 1, 2, 3}$ , corresponding ranges of $α$ , and the resulting minimum $E [M]$ for Case 3 are given in Table A2.

Table A2.

Solution to Case 3: Optimum PMF of M, valid ranges of $α$ and minimum $E [M]$ .

Range of $α$	$p_{0}$	$p_{1}$	$p_{2}$	$p_{3}$	$E [M]$
$\frac{1}{4} \leq α \leq \frac{1}{3}$	0	0	$12 α - 3$	$4 - 12 α$	$6 - 12 α$
$\frac{1}{3} \leq α \leq \frac{1}{2}$	0	$6 α - 2$	$3 - 6 α$	0	$4 - 6 α$
$\frac{1}{2} \leq α \leq 1$	$2 α - 1$	$2 - 2 α$	0	0	$2 - 2 α$


Case 4: At most five non-zero elements of the PMF of M are considered in this case, i.e., $p_{0}, p_{1}, p_{2}, p_{3}, p_{4} \geq 0$ and $p_{i} = 0$ , $i \in {5, 6, \dots}$ . The conditions in (A3)–(A7) can be written in a matrix form as

\begin{matrix} [\begin{matrix} 1 & 1 & - 1 & 0 & 0 & 0 & 0 & 0 & \dots & 0 \\ \frac{1}{2} & 1 & 0 & - 1 & 0 & 0 & 0 & 0 & \dots & 0 \\ \frac{1}{3} & 1 & 0 & 0 & - 1 & 0 & 0 & 0 & \dots & 0 \\ \frac{1}{4} & 1 & 0 & 0 & 0 & - 1 & 0 & 0 & \dots & 0 \\ \frac{1}{5} & 1 & 0 & 0 & 0 & 0 & - 1 & 0 & \dots & 0 \\ 0 & \dots & 0 & 0 & 0 & 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ 0 & \dots & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \end{matrix}] [\begin{matrix} λ_{1} \\ λ_{2} \\ μ_{0} \\ μ_{1} \\ μ_{2} \\ μ_{3} \\ μ_{4} \\ p_{0} \\ p_{1} \\ p_{2} \\ p_{3} \\ p_{4} \end{matrix}] = [\begin{matrix} 0 \\ - 1 \\ - 2 \\ - 3 \\ - 4 \\ α \\ 1 \end{matrix}] . \end{matrix}

(A27)

Using the same method as before, the optimum values of $p_{i}$ , $i \in {0, 1, 2, 3, 4}$ , the corresponding ranges of $α$ , and the resulting minimum $E [M]$ for Case 4 are given in Table A3.

Table A3.

Solution to Case 4: Optimum PMF of M, valid ranges of $α$ , and minimum $E [M]$ .

Range of $α$	$p_{0}$	$p_{1}$	$p_{2}$	$p_{3}$	$p_{4}$	$E [M]$
$\frac{1}{5} \leq α \leq \frac{1}{4}$	0	0	0	$20 α - 4$	$5 - 20 α$	$8 - 20 α$
$\frac{1}{4} \leq α \leq \frac{1}{3}$	0	0	$12 α - 3$	$4 - 12 α$	0	$6 - 12 α$
$\frac{1}{3} \leq α \leq \frac{1}{2}$	0	$6 α - 2$	$3 - 6 α$	0	0	$4 - 6 α$
$\frac{1}{2} \leq α \leq 1$	$2 α - 1$	$2 - 2 α$	0	0	0	$2 - 2 α$


Note that the PMF of M and the resulting $E [M]$ are the same for a given $α$ in all cases (see Table A1, Table A2 and Table A3) irrespective of the support of the PMF of M considered. Therefore, we observe from the above cases that, for a given $α$ in the range $\frac{1}{ℓ + 1} \leq α \leq \frac{1}{ℓ}$ , $E [M]$ is minimized when the PMF of M is such that

\begin{matrix} p_{ℓ}, p_{ℓ - 1} > 0, \text{ and } p_{i} = 0 \text{ for } i \in \{0, 1, \dots\} \setminus \{ℓ, ℓ - 1\}, \end{matrix}

(A28)

which requires $p_{ℓ}$ and $p_{ℓ - 1}$ to satisfy

\begin{matrix} p_{ℓ} + p_{ℓ - 1} & = 1 \end{matrix}

(A29)

\begin{matrix} E [\frac{1}{M + 1}] = p_{ℓ} \frac{1}{ℓ + 1} + p_{ℓ - 1} \frac{1}{ℓ} & = α . \end{matrix}

(A30)

Therefore, for a given $α$ in the range $\frac{1}{ℓ + 1} \leq α \leq \frac{1}{ℓ}$ , the optimum PMF of M and the resulting minimum $E [M]$ are given by

\begin{matrix} p_{ℓ} = (ℓ + 1) (1 - ℓ α), p_{ℓ - 1} = ℓ ((ℓ + 1) α - 1), E [M] = 2 ℓ - α ℓ (ℓ + 1) . \end{matrix}

(A31)
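The closed form in (A31) can be cross-checked numerically. The sketch below is our own illustration and not part of the original proof: `closed_form` evaluates (A31) with exact rational arithmetic, and `brute_force_min` enumerates every two-point support on a truncated range $\{0, \dots, m_{\max}\}$ , solves the two equality constraints of (A1) on that support, and keeps the smallest feasible $E [M]$ . Restricting the search to two-point supports relies on the standard fact that a linear program with two equality constraints attains its optimum at a point with at most two non-zero variables. The function names and the truncation parameter `max_m` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations
from math import floor


def closed_form(alpha):
    """Return (l, p_l, p_{l-1}, E[M]) according to (A31), for 0 < alpha <= 1."""
    alpha = Fraction(alpha)
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    l = floor(1 / alpha)  # guarantees 1/(l+1) < alpha <= 1/l
    p_l = (l + 1) * (1 - l * alpha)
    p_l_minus_1 = l * ((l + 1) * alpha - 1)
    expected_m = 2 * l - alpha * l * (l + 1)
    return l, p_l, p_l_minus_1, expected_m


def brute_force_min(alpha, max_m=12):
    """Smallest E[M] over all feasible PMFs supported on two points i < j <= max_m."""
    alpha = Fraction(alpha)
    best = None
    for i, j in combinations(range(max_m + 1), 2):
        a_i, a_j = Fraction(1, i + 1), Fraction(1, j + 1)
        # Solve p_i + p_j = 1 and p_i/(i+1) + p_j/(j+1) = alpha.
        p_i = (alpha - a_j) / (a_i - a_j)
        p_j = 1 - p_i
        if p_i < 0 or p_j < 0:
            continue  # this support cannot meet the constraints
        cost = i * p_i + j * p_j
        if best is None or cost < best:
            best = cost
    return best


if __name__ == "__main__":
    for alpha in (Fraction(9, 10), Fraction(2, 5), Fraction(3, 10), Fraction(1, 4)):
        print(alpha, closed_form(alpha), brute_force_min(alpha))
```

For instance, $α = \frac{2}{5}$ lies in the range $\frac{1}{3} \leq α \leq \frac{1}{2}$ , so $ℓ = 2$ and (A31) gives $p_{2} = \frac{3}{5}$ , $p_{1} = \frac{2}{5}$ and $E [M] = \frac{8}{5}$ , in agreement with Table A1. The truncation `max_m` must be at least $ℓ$ for the comparison to be meaningful.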

Author Contributions

Formal analysis, S.V.; Investigation, S.V.; Writing—original draft, S.V.; Writing—review & editing, S.U.; Supervision, S.U.; Project administration, S.U. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding Statement

This research received no external funding.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1. Chor B., Kushilevitz E., Goldreich O., Sudan M. Private Information Retrieval. J. ACM. 1998;45:965–981. doi: 10.1145/293347.293350.
2. Sun H., Jafar S.A. The Capacity of Private Information Retrieval. IEEE Trans. Inf. Theory. 2017;63:4075–4088. doi: 10.1109/TIT.2017.2689028.
3. Tian C., Sun H., Chen J. Capacity-Achieving Private Information Retrieval Codes with Optimal Message Size and Upload Cost. IEEE Trans. Inf. Theory. 2019;65:7613–7627. doi: 10.1109/TIT.2019.2918207.
4. Banawan K., Ulukus S. The Capacity of Private Information Retrieval from Coded Databases. IEEE Trans. Inf. Theory. 2018;64:1945–1956. doi: 10.1109/TIT.2018.2791994.
5. Sun H., Jafar S.A. The Capacity of Robust Private Information Retrieval with Colluding Databases. IEEE Trans. Inf. Theory. 2018;64:2361–2370. doi: 10.1109/TIT.2017.2777490.
6. Kadhe S., Garcia B., Heidarzadeh A., El Rouayheb S., Sprintson A. Private Information Retrieval With Side Information. IEEE Trans. Inf. Theory. 2020;66:2032–2043. doi: 10.1109/TIT.2019.2948845.
7. Li S., Gastpar M. Single-server Multi-message Private Information Retrieval with Side Information: The General Cases; Proceedings of the IEEE ISIT; Los Angeles, CA, USA. 21–26 June 2020.
8. Yang H., Shin W., Lee J. Private information retrieval for secure distributed storage systems. IEEE Trans. Inf. Forensics Secur. 2018;13:2953–2964. doi: 10.1109/TIFS.2018.2833050.
9. Jia Z., Jafar S.A. X-Secure T-Private Information Retrieval from MDS Coded Storage with Byzantine and Unresponsive Servers. IEEE Trans. Inf. Theory. 2020;66:7427–7438. doi: 10.1109/TIT.2020.3013152.
10. Banawan K., Ulukus S. Multi-Message Private Information Retrieval: Capacity Results and Near-Optimal Schemes. IEEE Trans. Inf. Theory. 2018;64:6842–6862. doi: 10.1109/TIT.2018.2828310.
11. Wang Q., Sun H., Skoglund M. The Capacity of Private Information Retrieval with Eavesdroppers. IEEE Trans. Inf. Theory. 2019;65:3198–3214. doi: 10.1109/TIT.2018.2884891.
12. Kumar S., Lin H.-Y., Rosnes E., Amat A.G.i. Achieving Maximum Distance Separable Private Information Retrieval Capacity with Linear Codes. IEEE Trans. Inf. Theory. 2019;65:4243–4273. doi: 10.1109/TIT.2019.2900313.
13. Sun H., Jafar S.A. The Capacity of Symmetric Private Information Retrieval. IEEE Trans. Inf. Theory. 2019;65:322–329. doi: 10.1109/TIT.2018.2848977.
14. Woolsey N., Chen R., Ji M. Uncoded Placement with Linear Sub-Messages for Private Information Retrieval from Storage Constrained Databases. IEEE Trans. Commun. 2020;68:6039–6053. doi: 10.1109/TCOMM.2020.3010988.
15. Fanti G., Ramchandran K. Efficient Private Information Retrieval over Unsynchronized Databases. IEEE J. Sel. Top. Signal Process. 2015;9:1229–1239. doi: 10.1109/JSTSP.2015.2432740.
16. Samy I., Attia M., Tandon R., Lazos L. Asymmetric Leaky Private Information Retrieval. IEEE Trans. Inf. Theory. 2021;67:5352–5369. doi: 10.1109/TIT.2021.3085363.
17. Guo T., Zhou R., Tian C. On the Information Leakage in Private Information Retrieval Systems. IEEE Trans. Inf. Forensics Secur. 2020;15:2999–3012. doi: 10.1109/TIFS.2020.2981282.
18. Liebowitz D., Nepal S., Moore K., Christopher C., Kanhere S., Nguyen D., Timmer R., Longland M., Rathakumar K. Deception for Cyber Defence: Challenges and Opportunities; Proceedings of the TPS-ISA; Atlanta, GA, USA. 13–15 December 2021.
19. Yarali A., Sahawneh F. Deception: Technologies and Strategy for Cybersecurity; Proceedings of the SmartCloud; Tokyo, Japan. 10–12 December 2019.
20. Faveri C., Moreira A. Designing Adaptive Deception Strategies; Proceedings of the QRS-C; Vienna, Austria. 1–3 August 2016.
21. Tounsi W. Cyber Deception, the Ultimate Piece of a Defensive Strategy—Proof of Concept; Proceedings of the CSNet; Rio de Janeiro, Brazil. 24–26 October 2022.
22. Sarr A., Anwar A., Kamhoua C., Leslie N., Acosta J. Software Diversity for Cyber Deception; Proceedings of the IEEE Globecom; Taipei, Taiwan. 7–11 December 2020.
23. Samy I., Tandon R., Lazos L. On the Capacity of Leaky Private Information Retrieval; Proceedings of the IEEE ISIT; Paris, France. 7–12 July 2019.
24. Vithana S., Banawan K., Ulukus S. Semantic Private Information Retrieval. IEEE Trans. Inf. Theory. 2022;68:2635–2652. doi: 10.1109/TIT.2021.3136583.

