Abstract
The infection rate of COVID-19 and the rapid mutation ability of the virus have forced governments and health authorities to adopt lockdowns, increased testing, and contact tracing to reduce the virus’s spread. Digital contact tracing has become a supplement to the traditional manual contact tracing process. However, although several digital contact tracing apps have been proposed and deployed, these have not been widely adopted due to apprehensions surrounding privacy and security. In this paper, we present a blockchain-based privacy-preserving contact tracing protocol, “Did I Meet You” (DIMY). The protocol provides full-lifecycle data privacy protection on the devices as well as the back-end servers to address most of the privacy concerns associated with existing protocols. We have employed Bloom filters to provide efficient privacy-preserving storage and have used the Diffie–Hellman key exchange for secret sharing among the participants. We show that DIMY provides resilience against many well-known attacks while introducing negligible overheads. DIMY’s footprint on the storage space of clients’ devices and back-end servers is also significantly lower than that of other similar state-of-the-art apps.
Keywords: COVID-19, Contact tracing, Bloom filter, Blockchain, Privacy, Security
1. Introduction
The outbreak of the COVID-19 pandemic has changed many aspects of people’s daily lives. One of the characteristics of COVID-19 is its airborne transmission, which makes it highly contagious. Moreover, a person infected with COVID-19 can be asymptomatic, thus spreading the virus without showing any symptoms. Anyone who comes into close contact with an infected person is at an elevated risk of contracting the coronavirus. The rise in the number of infected cases due to recent strains such as the Delta variant has led governments to speed up vaccination, enforce lockdowns and quarantines, and recommend social distancing, aiming to prevent the spread of COVID-19. However, despite these precautionary measures, the rate of spread of COVID-19 is putting the public health systems of many countries under strain.
Contact tracing is an established case investigation technique that has proven successful in dealing with other outbreaks such as Ebola and SARS. Contact tracing aims to establish the close contacts of an infected person so that they may be tested or isolated to break the chain of infection. Traditionally, the contact tracing process is performed manually in a reactive manner, triggered when a person tests positive for the virus. This is achieved by conducting a face-to-face interview to establish contacts made by the person while infectious. This approach, although useful, has some limitations: (i) It requires a large, trained workforce to cope with the caseload. (ii) It is hard for people to remember everyone they have met while infected in the last 2–3 weeks. (iii) A person may have met people who are strangers. Proactive contact tracing (Bay et al., 2020, Rivest et al., 2020, Apple, 2020, Google, 2020) has recently been proposed to mitigate these issues by maintaining a record of all close contacts made by a person and utilising these records if that person tests positive.
One way of implementing proactive contact tracing is to mandate record-keeping of people’s attendance at public venues such as offices, restaurants, etc. This can be done manually, for example, through QR codes that direct attendees to register their details. However, this increases the risk to individuals’ data privacy and allows for tracking the user’s behaviour. A more popular option is to employ smartphone-based digital contact tracing apps that can exchange Bluetooth Low Energy (BLE) messages with each other to record this contact. The digital contact tracing app is typically composed of two main entities, the smartphones acting as clients and a back-end server. In this model, the smartphones of two individuals with tracing apps installed would exchange some random identification code (this identification code does not reveal any sensitive information about their actual identities) when they were in close proximity. The backend is typically maintained by health organisations (or the government), and once a person is diagnosed with COVID-19, they can opt to share the local list of contacts stored on their smartphone with the back-end server to identify at-risk users.
The popularity of digital contact tracing apps can be gauged by the fact that more than 45 such apps have been proposed or are being used in different countries (O’Neill et al., 2020). These COVID-19 digital contact tracing apps are based on different architectures distinguishable in several aspects, including anonymous ID generation and exchange, risk analysis and notifications, etc. For details, readers are referred to a recent survey on digital contact tracing apps by Ahmed et al. (2020), in which the architectures are classified as centralised, decentralised and hybrid, according to the distribution of functionalities among the clients and the back-end server.
However, recent security and privacy analyses of these apps have revealed several risks and issues (Ahmed et al., 2020, Vaudenay, 2020a, Vaudenay, 2020b). These apps operate on different trust models. Apps based on the centralised architecture (such as OpenTrace, 2020, COVIDSafe, 2020, etc.) generally collect sensitive data at a trusted server and only provide privacy protection against malicious users. This trust model makes these apps vulnerable to server-side breaches and malicious actions by the server. On the other hand, apps based on decentralised and hybrid architectures assume an honest-but-curious server model whereby the server will try to harvest sensitive information, if available. Apps such as Troncoso et al. (2020) and Rivest et al. (2020) that are based on the decentralised architecture share the anonymous identifiers of the positive cases with all users for matching, making these apps vulnerable to linkage attacks, whereby malicious users can discover the real identities of persons diagnosed with COVID-19 (Vaudenay, 2020b). Apps based on hybrid systems perform the risk analysis and notification process at the server instead of revealing the anonymous IDs of positive cases to other users for matching, as proposed in the decentralised architecture. However, these apps suffer from high communication and processing costs. For example, the DESIRE protocol (Castelluccia et al., 2020) uses three BLE messages to advertise a single anonymous ID from a device (Troncoso et al., 2020), while the ContraCorona app (Beskorovajnov et al., 2020) employs three non-colluding servers (submission, matching and notification servers) to manage the contact tracing process.
This paper proposes a new hybrid privacy-preserving digital contact tracing protocol called “Did I Meet You” (DIMY). This protocol is designed to provide full life cycle data privacy protection that prevents contact tracing data from being used arbitrarily. We take a holistic view of the security, privacy, and operational requirements for digital contact tracing (details appear in Table 3 and Section 6). We employ Elliptic Curve Diffie–Hellman key exchange, Shamir secret sharing, Bloom filters and the blockchain technologies in an integrated manner to address most of the concerns associated with existing contact tracing protocols.
We make the following specific contributions:
• We take a security by design approach and provide location privacy, the confidentiality of sensitive information (health, personal and social contacts), and protection against common security attacks on the system. This is achieved by taking a systems-oriented approach to employ a combination of techniques such as ephemeral identifiers, Diffie–Hellman key exchange, Shamir secret sharing mechanism, Bloom filters and the blockchain.
• We propose the use of Bloom filters to store close contact information at the individual device level as well as the backend. The use of Bloom filters serves three important purposes: (i) It prevents information leakage not only at the client level (for example, because of device theft or coercion attacks) but also from authorities operating the backend and governments that can issue subpoenas to access information stored on the backend. (ii) It considerably reduces client device and backend storage requirements. (iii) Bloom filter storage hides the details of the encounter to attenuate the privacy threat at the backend.
• As opposed to traditional apps that employ centralised servers at the backend, we use a blockchain-based backend design. This provides scalability, transparency and trust in backend operations, and ensures integrity, non-repudiation and verification capabilities for the management of the data uploaded from positively identified cases once appended as blockchain transactions. Although it is well known that blockchain suffers from performance overheads compared with a centralised solution, the properties of a blockchain-based solution are necessary for the design of a comprehensive privacy-preserving contact tracing protocol. We also evaluate the performance of our implementation based on Hyperledger Fabric and show that DIMY provides low latency and consumes fewer resources while supporting high throughput under moderate loads.
• We consider a comprehensive threat model and provide safeguards against several types of adversaries, including malicious users, external actors, backend (administrators) and government (discussed in detail in Section 6.1). We also provide comprehensive security and privacy analysis of our proposed solution and show that DIMY provides resilience against common attacks such as linkage, enumeration, social graph construction and replay (discussed in detail in Section 6.3).
This paper is organised as follows. We discuss related work in Section 2. Section 3 introduces the background information necessary to understand the building blocks of our proposed solution. We detail the design of our DIMY protocol in Section 4. In Section 5, we compare the salient features of DIMY with other existing protocols. Section 6 provides a security and privacy analysis of the DIMY protocol, while Section 7 details the performance analysis of our proposed solution. Section 8 concludes this paper.
2. Related work
A number of digital contact tracing apps have been proposed, developed and deployed. The MIT Technology Review summarised the salient features of 47 such apps (O’Neill et al., 2020). These apps follow different development approaches and address multiple aspects in terms of privacy, security, performance, reliability, etc. In an earlier work (Ahmed et al., 2020), we classified these tracing apps into centralised, decentralised and hybrid categories according to the underlying application architecture and the functionalities delegated to client devices and the server.
2.1. Centralised approach
The BlueTrace protocol (Bay et al., 2020) is one of the first proposed digital contact tracing protocols based on a centralised architecture. This protocol was used in the Singaporean TraceTogether (OpenTrace, 2020) and the Australian COVIDSafe (COVIDSafe, 2020) apps. ROBERT (Robert, 2020), proposed by Inria and Fraunhofer AISEC, is another protocol based on the centralised architecture.
In the centralised architecture, a central server is responsible for handling major tasks such as ID generation, risk analysis and notification, etc. Typically in this architecture, a user enrols with the central authority, which periodically (typically every 10–15 min) generates a unique temporary ID for each client. This temporary ID is sent to the user and is used in his/her advertisement messages. The user records the received temporary IDs locally when in the proximity of other contacts running the same app. If a user gets diagnosed with COVID-19, a health officer authorises the user to upload (share) the list of all captured IDs to the centralised server for risk analysis and notification of the close contacts.
The central server in the BlueTrace protocol can access the personally identifiable information collected at the registration stage and map each client to their temporary IDs. This raises privacy issues as this sensitive data can be used for other purposes besides digital contact tracing. In contrast to BlueTrace, the ROBERT protocol does not store any user identifiable information on the server. Temporary IDs are still created at the server but without being linked to the devices used by the clients. ROBERT’s notification process requires a device to upload the IDs it has used in order to check whether it has come in contact with a COVID-19 positive case. This is in contrast with BlueTrace, where the server can identify at-risk users and contact them proactively.
The ROBERT protocol, however, like other protocols based on centralised architectures, has a high potential for function creep, in which it can be re-purposed into a mass surveillance system (Troncoso et al., 2020). Another potential issue associated with centralised architectures is the construction of partial social graphs (discussed in detail in Section 6) that enable linkability of infected cases and their contacts. A server breach can also result in the loss of sensitive data stored at the server.
2.2. Decentralised approach
The decentralised architecture differs from the centralised version by pushing more functionalities to the user’s devices. There is still a server involved, however, the role played by the server is more in terms of orchestrating the clients. This approach claims to improve user privacy by generating temporary IDs in the user’s devices. Additionally, exposure risk processing is also performed at the device level.
Generally, devices generate random seeds for forming their temporary IDs. These IDs are exchanged with other users who they come in contact with. Once a user is diagnosed positive with COVID-19, all the device’s seeds (some of the apps upload IDs instead of seeds) are uploaded to the server. Any user who wishes to check whether they are at-risk can download the seeds (or IDs) uploaded by the diagnosed users. The device can then perform matching locally, with the user notified of the result. The server is neither involved in the ID generation nor the at-risk analysis and notification process.
There are a number of protocols that follow the decentralised architecture, such as DP-3T (Troncoso et al., 2020), PACT-East Coast (Rivest et al., 2020), Google/Apple Exposure Notification (GAEN) (Apple, 2020, Google, 2020) and TCN (2020). They have minor differences in the implementation of sub-components, with the basic design following the general functionality described in this section.
Apps based on decentralised architectures provide enhanced privacy protection against server-based attacks as devices generate their own anonymous IDs. However, decentralised apps are known to be vulnerable to linkability attacks, whereby a user who has received the IDs generated by an infected user can link the IDs with the real user’s identity (Vaudenay, 2020b). These apps are also subject to enumeration attacks, which enable any user to count all positive cases.
2.3. Hybrid approach
Hybrid architectures balance the tasks between the client and the server. The server is responsible for performing the risk analysis and notification process, while the client manages the generation of temporary IDs. Desire (Castelluccia et al., 2020) is one of the example protocols that follow the hybrid architecture. Devices using the hybrid protocol cryptographically generate and exchange IDs with other devices. A contact between two devices is represented by a unique encounter ID, which the app generates by combining its own and the received temporal IDs. A user who tests positive can optionally upload the generated encounter IDs to the server. Any user who wants to check the risk of exposure sends their collected encounter IDs to the server for matching. The server performs risk analysis and notifies any user who is deemed to be at risk.
The Desire protocol uses 256-bit IDs that are broadcast for generating encounter IDs (or tokens) using the Diffie–Hellman key exchange. This design choice requires at least three BLE message exchanges (Advertisement, Scan Request and Scan Response), resulting in an increase in energy consumption for devices (The DP3T Consortium, 2020).
2.4. Discussion
We have listed the modalities involved in the design of the three types of architectures commonly used for digital contact tracing. We have also highlighted some common issues related to privacy and security associated with apps based on these architectures. Our proposed solution, DIMY, can be broadly classified as a hybrid architecture that generates the IDs on the devices and performs risk-analysis and notification tasks at the server.
For DIMY, we utilise Bloom filters to encode the encounter ID generated by the devices and to store the encounters at the backend. The DP-3T protocol also suggests the use of Cuckoo filters but in a different context. In their proposal, the server has access to all the seeds uploaded by users who have tested positive and can generate the IDs used by positive cases. They proposed encoding these IDs in a Cuckoo filter to hide them from other users who are performing local risk-analysis. In comparison, our use of Bloom filters provides better privacy protection as these are used to hide the encounter information both at the device level and at the backend.
We also employ blockchain technology to manage the back-end processing. BeepTrace (Xu et al., 2021) is another framework that has proposed using two blockchains: ‘tracing chain’ to manage tracing/contact matching with anonymised user data, and the other ‘notification chain’ to manage notifications at the backend. In contrast, we propose using a single blockchain (Hyperledger Fabric) and enhancing its privacy protection further by using Bloom filter-encoded data storage. Additionally, we rely on the smart contract functionality to perform the exposure risk-analysis and matching in a privacy-preserving manner. Lv et al. (2020) propose Bychain, a new blockchain that stores contact information securely. Bychain protects user identities using Zero-Knowledge protocols. Though the contact information is stored on the blockchain, the authors do not discuss how data is retrieved and used when individuals test positive for COVID-19. The proposed protocols rely on support from GPS-equipped provers and witnesses to record contact information using LTE, WiFi, and BLE. Our proposed design, in contrast, is an end-to-end BLE based solution for contact tracing relying on widely available BLE modules on smartphones.
In the context of digital contact tracing, we also note that the use of cryptographically generated IDs and the Diffie–Hellman key exchange mechanism has been proposed in Beskorovajnov et al. (2020), Avitabile et al. (2021) and Castelluccia et al. (2020). Similarly, the $k$-out-of-$n$ secret sharing mechanism for ID distribution has been proposed as an extension to the standard protocols in Troncoso et al. (2020) and Beskorovajnov et al. (2020). In our proposed protocol, the secret sharing mechanism is coupled with the Diffie–Hellman key exchange. We integrate these security and privacy-preserving techniques with efficient set membership using Bloom filters and additionally employ blockchain technology at the backend. Table 1 highlights the key technologies used in DIMY and places our proposal in the context of existing protocols.
Table 1.
Comparison of key technologies (C: Centralised, D: Decentralised, H: Hybrid; DH: Diffie–Hellman, BF: Bloom filter, BC: blockchain). “ext.” denotes an extension to the base protocol.
Protocols | DH | Secret sharing | BF | BC | Architecture |
---|---|---|---|---|---|
BlueTrace (Bay et al., 2020) | × | × | × | × | C |
CovidSafe (COVIDSafe, 2020) | × | × | × | × | C |
ROBERT (Robert, 2020) | × | × | × | × | C |
DP-3T (Troncoso et al., 2020) | × | ext. | ✓ | × | D |
PACT-East coast (Rivest et al., 2020) | × | × | × | × | D |
GAEN (Apple, 2020, Google, 2020) | × | × | × | × | D |
Desire (Castelluccia et al., 2020) | ✓ | × | × | × | H |
Contra corona (Beskorovajnov et al., 2020) | ✓ | ext. | × | × | H |
BeepTrace (Xu et al., 2021) | × | × | × | ✓ | H |
ByChain (Lv et al., 2020) | × | × | × | ✓ | H |
DIMY | ✓ | ✓ | ✓ | ✓ | H |
3. Background information
This section introduces key technologies that form the building blocks of our proposed solution, including Diffie–Hellman key exchange, Shamir secret sharing, Bloom filters, and blockchain.
3.1. Diffie–Hellman key exchange
Diffie–Hellman (DH) (Diffie and Hellman, 1976) is a public key distribution system that addresses secret key distribution over an insecure channel. It enables two users to communicate with each other to arrive at a common symmetric secret key to encrypt/decrypt their future communications. This secret key is computed in such a manner that an eavesdropper cannot reconstruct the shared secret key, in a computationally feasible context, even if they have heard all the messages exchanged.
This key distribution mechanism is based on the discrete logarithm problem. Let $G$ be a multiplicative group of prime order $p$. Let $g \in G$ be a generator. Party $A$ chooses a random $a \in \mathbb{Z}_p$, computes $g^a$ and sends it to party $B$. $B$ chooses a random $b \in \mathbb{Z}_p$, computes $g^b$ and sends it to party $A$. On receiving $g^b$, $A$ computes $(g^b)^a = g^{ab}$; similarly, on receiving $g^a$, $B$ computes $(g^a)^b = g^{ab}$. Due to the hardness of the discrete logarithm problem, an adversary cannot compute $a$ or $b$, given $g^a$ and $g^b$. Hence, it cannot construct the common key. In our contact tracing protocol, $G$ is an elliptic curve group.
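The exchange described above can be sketched in a few lines of Python. This is a toy illustration only: the modulus `P`, the base `G` and the helper names `dh_keypair` and `dh_shared` are assumptions made for the example, and a multiplicative group modulo a prime stands in for the elliptic curve group that the protocol actually uses.

```python
import secrets

# Toy parameters (assumption): a Mersenne prime modulus and a small base.
# The protocol itself works in an elliptic curve group; this is illustrative only.
P = 2**127 - 1
G = 3

def dh_keypair():
    """Return (private exponent x, public value G^x mod P)."""
    x = secrets.randbelow(P - 3) + 2  # private exponent in [2, P-2]
    return x, pow(G, x, P)

def dh_shared(private, peer_public):
    """Raise the peer's public value to our private exponent."""
    return pow(peer_public, private, P)

a, g_a = dh_keypair()        # party A keeps a, sends g_a
b, g_b = dh_keypair()        # party B keeps b, sends g_b
key_a = dh_shared(a, g_b)    # A computes (g^b)^a
key_b = dh_shared(b, g_a)    # B computes (g^a)^b
```

Both parties arrive at the same value although only `g_a` and `g_b` ever cross the channel.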
3.2. Shamir secret sharing
In our proposed protocol, we use a secret sharing scheme (Shamir, 1979) to provide information privacy and secure communication between the devices participating in contact sharing. The basic idea revolves around making shares of a secret that can be securely distributed over many devices by a threshold secret sharing mechanism.
A secret sharing scheme consists of two phases, called sharing and reconstruction. In a $k$-out-of-$n$ secret sharing scheme (also referred to as a $(k, n)$-secret sharing scheme), there is a unique player called the dealer who wants to share parts of a secret $s$ among $n$ players, $P_1, \dots, P_n$. The dealer, assumed to be honest, creates $n$ shares of the secret ($s_1, \dots, s_n$) and sends every player a share (say $s_i$ to player $P_i$) of the secret in a way that any group of $k$ or more players can reconstruct the secret. All shares are necessary for the reconstruction of the secret if we keep $k = n$.
A $k$-out-of-$n$ secret sharing scheme, in general, has to satisfy the following two properties:
1. Recoverability: The secret can be reconstructed given any $k$ shares.
2. Secrecy: No information can be known about the secret given any number of shares fewer than $k$.
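The two phases can be illustrated with a minimal Shamir-style sketch over a prime field. The field modulus `PRIME` and the function names `make_shares` and `reconstruct` are assumptions chosen for the example, not part of the protocol specification; the modular inverse via `pow(x, -1, p)` needs Python 3.8 or later.

```python
import secrets

PRIME = 2**127 - 1  # field modulus (assumption); must be larger than the secret

def make_shares(secret, k, n):
    """Sharing phase: split `secret` into n shares, any k of which suffice."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Reconstruction phase: Lagrange interpolation evaluated at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

secret = 123456789
shares = make_shares(secret, k=15, n=30)
```

Any 15 of the 30 shares passed to `reconstruct` return the original secret, which is the recoverability property; with fewer shares the interpolated value is unrelated to the secret.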
3.3. Bloom filter
We employ Bloom filters (BFs) for logging contact information on the devices and at the back-end blockchain. A Bloom filter (Bloom, 1970) is a probabilistic data structure used to represent set membership. It supports an efficient mechanism for set membership queries. When queried, the BF returns true if the queried data exists in the filter, and may also return true, with a small false-positive probability, for data that was never inserted. A BF is implemented as a bit array $B[i]$, $0 \le i < m$, of $m$ bits accessed via $k$ independent hash functions $h_1, \dots, h_k$, each of which maps an element $x$ in a set $S$ of $n$ elements to one of the $m$ bits within the bit array. Querying the presence/membership of an element $x$ in the set using a BF requires checking $B[h_1(x)] \wedge \dots \wedge B[h_k(x)]$ (i.e., the query returns 1 only if all corresponding bits are set to 1).
In BFs, false positives (FP) are possible, but false negatives (FN) are not. The FP rate $p_{FP}$ is the probability that a membership test performed for an element not stored in $S$ returns 1, in which the parameter $m$ specifies the size of the bit array (Bloom filter length), $k$ specifies the number of hash functions, and $n$ is the cardinality of the stored set. Even if an exact expression for $p_{FP}$ is available (Mitzenmacher and Upfal, 2005), virtually all work in the field relies on a simple, but tight, approximation:
$$p_{FP} \approx \left(1 - e^{-kn/m}\right)^{k} \qquad (1)$$
Simply, an FP is due to the collision of two different elements being mapped to the same bit position.
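A minimal Bloom filter sketch makes the insert and query operations concrete. The class name `BloomFilter`, the use of prefixed SHA-256 to derive the hash positions, and the helper `fp_rate` (the standard approximation $(1 - e^{-kn/m})^k$) are assumptions for illustration; the text does not specify which hash functions DIMY uses.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k   # m bits, k hash functions
        self.bits = 0           # bit array stored as a Python int

    def _positions(self, item: bytes):
        # Derive k positions by hashing the item with k distinct one-byte prefixes
        # (an assumed construction, valid for k < 256).
        for j in range(self.k):
            digest = hashlib.sha256(bytes([j]) + item).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: bytes):
        # True only if every corresponding bit is set.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

def fp_rate(m, k, n):
    """Approximate false-positive probability after inserting n elements."""
    return (1 - math.exp(-k * n / m)) ** k

bf = BloomFilter(m=1024, k=3)
bf.add(b"encounter-1")
```

An inserted element is always reported as present, so false negatives cannot occur, while `fp_rate` grows as more elements are packed into the same number of bits.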
3.4. Blockchain
Blockchain technology was initially introduced in 2008 to maintain a public ledger of Bitcoin transactions (Nakamoto, 2019). This technology allows network participants to create chronologically sequential immutable blocks ensuring integrity, trust and transparency and providing digital evidence (Tian et al., 2019). It enables creating solutions that do not rely on a central authority; rather, the chain is spread over several nodes in a distributed manner. It also ensures information integrity by linking the blocks in the chain by hashing the previous blocks.
There are three main types of blockchain: public, private and permissioned. The public instances are blockchains that allow any peer to participate in the network. Some examples are Bitcoin (Nakamoto, 2019) and Ethereum (Ethereum, 2020). Private blockchains are restricted networks that allow only some nodes to participate while relying on a central authority to manage the nodes. For example, Ethereum can also run as a private instance; however, when executed privately, there is no connection/interaction with the public instance. In the permissioned blockchain, a group of participants performs the node access control. The main example is Hyperledger, in which organisations are responsible for managing the network. The Hyperledger Fabric (Hyperledger, 2020) blockchain instance supports the deployment of chaincode, a small piece of source code developed and embedded in the blockchain. This code is executed once a node sends a transaction to its address. It uses a consensus based on the Byzantine generals problem, known as Redundant Byzantine Fault Tolerance (RBFT), for generating an agreement on the order and correctness of the set of transactions in a block. RBFT defines a leader to conduct an election with the existing nodes connected to a given organisation, providing speedy agreement. The validation of the transactions takes place in the smart contract. This consensus protocol makes the Hyperledger Fabric blockchain a good choice to support our solution, where the organisations are modelled as health authorities (see Section 4). Using this blockchain technology in our solution ensures data integrity, transparency of operations and decentralised data storage.
4. DIMY protocol description
We first provide an overview of the DIMY protocol with a detailed description of the building blocks appearing in subsequent sections. Fig. 1 shows the overall architecture of our proposed solution. Consistent with other decentralised and hybrid architecture-based contact tracing approaches, devices participating in DIMY periodically generate random ephemeral identifiers. These identifiers are used in the Diffie–Hellman key exchange (refer to Section 3.1) to establish a secret key representing the encounter between two devices that come in contact with each other. For example, Alice generates a random number $x_A$ at time $t$ and calculates her ephemeral identifier $EphID_A = g^{x_A}$ ($g$ is a generator and $G$ is an elliptic curve group of order $p$). After generating their $EphID$, devices employ the $k$-out-of-$n$ secret sharing scheme to produce $n$ secret shares of the $EphID$. Devices now broadcast these secret shares, at the rate of one share per minute, through BLE advertisement messages. A device can reconstruct the $EphID$ advertised by another device if it has stayed in the communication range of this device for at least $k$ minutes, enabling it to collect $k$ secret shares of the $EphID$. Assume that Alice is able to reconstruct the $EphID_B = g^{x_B}$ advertised by Bob, where $x_B$ is a random number generated by Bob at time $t'$. Alice now computes the secret encounter identifier $EncID_{AB} = (EphID_B)^{x_A} = g^{x_A x_B}$. Bob also computes the same encounter identifier having received advertisements from Alice.
Fig. 1.
DIMY System Architecture.
A novel aspect of our proposed solution is the use of Bloom filters for storing contact information. Each device maintains a Daily Bloom Filter (DBF) and inserts all the constructed encounter identifiers in the DBF created for that day. The encounter identifier is deleted as soon as it has been inserted in the Bloom filter. Devices maintain DBFs on a 21-day rotation basis, with 21 days identified as the incubation period for COVID-19. DBFs older than 21 days are deleted automatically.
Our solution employs blockchain at the backend. Once a user is diagnosed with COVID-19, they can volunteer to upload their encounter information to the blockchain. Health Authorities (HA) then generate an authorisation access token from the blockchain that is passed on to the device. The user’s device combines its 21 DBFs into one Contact Bloom Filter (CBF) and uploads this filter to the blockchain. The blockchain stores the uploaded CBF as a transaction inside a block (in-chain storage) and appends the block to the chain.
A user who wants to check whether they have come in close contact with any user who was diagnosed positive can query the blockchain on a daily basis. A device combines all of the locally stored DBFs (the maximum number is limited to 21) in a single Bloom filter called the Query Bloom Filter (QBF). The QBF is part of the query that gets uploaded to the blockchain. The blockchain matches the QBF with CBF stored as a transaction in the blockchain and returns “matched” or “not matched” as a response. If the response from the blockchain is negative, the device deletes its QBF. Conversely, if the user is at-risk, the QBF is stored separately for further verification by HA in the contact tracing process.
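The combination and matching steps can be sketched as follows. This is a hedged illustration under several assumptions that this section does not specify: filters are merged with a bitwise OR, the filter length `M_BITS` and hash count `K_HASHES` are hypothetical values, positions are derived with prefixed SHA-256, and `may_match` uses a simple count of common bits as the matching rule. A shared encounter identifier sets the same bits in both filters, so a non-empty intersection is a necessary, but not sufficient, sign of contact.

```python
import hashlib

M_BITS = 4096    # filter length in bits (hypothetical value)
K_HASHES = 3     # number of hash functions (hypothetical value)

def positions(enc_id: bytes):
    """Bit positions for one encounter identifier (assumed construction)."""
    return [
        int.from_bytes(hashlib.sha256(bytes([j]) + enc_id).digest(), "big") % M_BITS
        for j in range(K_HASHES)
    ]

def insert(bloom: int, enc_id: bytes) -> int:
    """Return the filter with the encounter identifier inserted."""
    for pos in positions(enc_id):
        bloom |= 1 << pos
    return bloom

def combine(daily_filters):
    """Merge daily filters (DBFs) into one CBF or QBF by bitwise OR."""
    merged = 0
    for dbf in daily_filters:
        merged |= dbf
    return merged

def common_bits(qbf: int, cbf: int) -> int:
    """Number of bit positions set in both filters."""
    return bin(qbf & cbf).count("1")

def may_match(qbf: int, cbf: int, min_common_bits: int = K_HASHES) -> bool:
    """Assumed matching rule: enough bits in common to be consistent with a shared ID."""
    return common_bits(qbf, cbf) >= min_common_bits

shared = b"enc-alice-bob"   # encounter identifier known to both devices
alice_dbfs = [insert(0, shared), insert(0, b"enc-alice-carol")]
bob_dbfs = [insert(0, b"enc-bob-dave"), insert(0, shared)]
cbf = combine(alice_dbfs)   # uploaded by the diagnosed user
qbf = combine(bob_dbfs)     # sent by the querying user
```

Neither filter reveals the encounter identifiers themselves, which is the property the design relies on; the threshold in `may_match` trades false positives against missed contacts and would need tuning in practice.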
All communications between the HAs and the blockchain are assumed to be secure and authenticated using standard techniques such as digital certificates and TLS. Communications between the client’s devices and the backend are also encrypted. As an extension to our base protocol, we also assume an anonymous communication channel (through the use of proxies or anonymous networks) (Beskorovajnov et al., 2020) between the users and the backend in an attempt to hide the linking of IP addresses with the real user’s identities.
We now explain each component of the proposed solution in more detail.
4.1. Close contact representation
In this section, we briefly discuss the notion of encounter representation in the context of contact tracing apps. One simple way to achieve contact representation involves using device IDs. In this scheme, which we refer to as an ID-based scheme, each device is assigned a temporal ID, either by a central authority server or computed locally at the device. The devices advertise and exchange these IDs. The presence of an ID in the local storage thus represents an encounter with that user (device). In an alternate scheme that we refer to as the shared secret-based scheme, encounters can be represented by a shared secret between two participants. Both participants exchange specific messages to arrive at a shared secret only known to the parties in communication.
In ID-based schemes, all devices in the vicinity of the device A store the same ID advertised by A. In contrast, in the shared secret-based scheme, each device pair computes a different shared secret among them. Concretely, if three devices A, B and C meet each other and advertise $ID_A$, $ID_B$ and $ID_C$ respectively, according to the ID-based scheme, A would store $\{ID_B, ID_C\}$, B would store $\{ID_A, ID_C\}$ and C would store $\{ID_A, ID_B\}$. If these devices are instead using the shared secret-based scheme, they will end up storing secrets as A: $\{S_{AB}, S_{AC}\}$, B: $\{S_{AB}, S_{BC}\}$ and C: $\{S_{AC}, S_{BC}\}$. We have used the shared secret-based representation for recording the encounter between neighbouring devices as it provides more resilience against replay attacks, discussed in Section 6.
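The difference between the two representations can be seen in a small sketch with three devices. The toy group parameters `P` and `G` and the dictionary names are assumptions for illustration; public values stand in for the advertised IDs, and the pairwise secret is derived with a toy Diffie–Hellman exchange rather than the elliptic curve variant.

```python
import secrets

P, G = 2**127 - 1, 3   # toy group parameters (assumption)

devices = ["A", "B", "C"]
private = {d: secrets.randbelow(P - 3) + 2 for d in devices}
public = {d: pow(G, private[d], P) for d in devices}   # advertised values

# ID-based scheme: each device stores the IDs it heard, so B and C
# both hold the very same value for A.
id_store = {d: {public[o] for o in devices if o != d} for d in devices}

# Shared secret-based scheme: each device stores one secret per peer,
# known only to that pair.
secret_store = {
    d: {pow(public[o], private[d], P) for o in devices if o != d}
    for d in devices
}

s_ab = pow(public["B"], private["A"], P)   # the A-B pair secret
```

In the ID-based store the value advertised by A appears on every neighbour, whereas `s_ab` appears only in the stores of A and B, which is why an overheard advertisement is of less use to a third party.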
4.2. Generating identifiers
This component pertains to generating anonymous device IDs that are used as device advertisements. We consider two common design options: (i) Each device generates its pseudo-anonymous random identifier. This is the approach taken by most decentralised and hybrid contact tracing apps such as PACT-East Coast, DP-3T and GAEN. (ii) A centralised server generates these identifiers for the registered devices that are then periodically transferred to the devices. This approach is used in apps based on a centralised architecture, such as TraceTogether and COVIDSafe (AU).
In our solution, each device generates its own ephemeral IDs (EphIDs), each of which is valid for 30 min. This provides privacy protection against exposing a user’s contact details (mapping of IDs to real identities) to the backend. We have kept the size of the EphID at 16 Bytes (128 bits), as BLE advertisement messages can only carry a limited payload of data.3
4.3. Advertising and receiving identifiers
Once devices generate the EphID, the advertisement phase can commence. For this phase, instead of directly advertising the EphID, we use a k-out-of-n secret sharing (Shamir’s secret sharing) (Shamir, 1979) mechanism (explained in Section 3.2). The device computes n secret shares of the EphID and broadcasts them at a rate of one share advertisement per minute. Fig. 2 illustrates the flow of information in the DH scheme used over an insecure BLE communication channel. A receiver can reconstruct the EphID if it has successfully received any k out of the n shares. For this work, we set k and n to 15 and 30 respectively, based on the minimum duration of close contact, which the CDC defines as 15 min.
Fig. 2.
Information flow in DIMY.
There are multiple advantages of using this k-out-of-n secret sharing. First, the devices need to be in contact for at least k minutes to receive at least k shares of the secret. Setting k to 15 therefore automatically enforces the 15 min duration of close contact. Second, using secret sharing eliminates the possibility of replay attacks (discussed in Section 6.3). Finally, the combination of Diffie–Hellman key exchange and Shamir secret sharing provides extra protection against an adversary trying to capture the EphIDs for malicious use. A computationally bounded adversary, Eve, who is listening for BLE advertisements from Alice and Bob, first has to collect at least k shares of each of Alice’s and Bob’s advertisements. Even then, knowing both reconstructed EphIDs, Eve still cannot compute the resulting shared secret because of the hardness of the computational Diffie–Hellman problem (Boneh, 1998).
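The k-out-of-n property can be illustrated with a minimal Python sketch of Shamir's scheme. This is only an illustration under stated assumptions: it treats the 16-Byte EphID as a single integer and works over a large prime field (the Mersenne prime 2^521 - 1), whereas a deployment constrained by the BLE payload would more likely share the secret byte-wise over a small field. The names `split_secret` and `recover_secret` are hypothetical.

```python
import secrets

# Assumption: a prime field large enough to hold a 128-bit EphID.
PRIME = 2**521 - 1

def split_secret(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any k of them recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# One share per minute for 30 minutes; any 15 shares suffice.
eph_id = int.from_bytes(secrets.token_bytes(16), "big")
shares = split_secret(eph_id, k=15, n=30)
assert recover_secret(shares[10:25]) == eph_id
```

A receiver holding fewer than 15 shares interpolates a different polynomial and, except with negligible probability, obtains a value unrelated to the EphID.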
Additionally, we advertise part of the hash of the EphID in each share. Fig. 3 shows the BLE advertisement message format with the 3-Byte hash of the EphID included in the advertisement.4 The hash value helps to identify shares belonging to the same EphID and serves as an integrity check for the reconstructed EphID. In our scheme, a device simply discards the shares, without attempting reconstruction, if it has received fewer than k shares or if the hash values fail the integrity check.
Fig. 3.
BLE advertisement message format.
An additional issue associated with using k-out-of-n secret sharing is the address carryover mechanism due to the rotation of the EphID after each Epoch (30 min).5 Suppose a receiver device B comes into contact with A when the 21st share of a particular EphID is being broadcast by A and moves away when it has received the 10th share of the next EphID generated by A. Device B has thus maintained contact for 20 min; however, the logging mechanism would fail to register this contact, as B has received only 10 shares each of two different EphIDs.
To address this issue, we use the simultaneous advertisement of multiple EphIDs with overlapping intervals, as proposed in Beskorovajnov et al. (2020). A device always broadcasts two EphIDs, rotating one identifier at a time such that the start of each identifier is staggered by 15 advertisement intervals. In Fig. 4, a device broadcasts two EphIDs alternately, every 30 s. A receiver device that comes into contact with it and later leaves, as marked in Fig. 4, is able to register this contact because it has received enough shares of one of the EphIDs while in contact.
Fig. 4.
BLE advertisements with random MACs and EphIDs.
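The effect of staggering can be checked with a small simulation. The sketch below is a simplification under stated assumptions: time is counted in whole minutes, one share of each active EphID is heard per minute, and no advertisement is lost. The function name `shares_received` is hypothetical.

```python
N_SHARES = 30   # shares per EphID, one per minute
K_SHARES = 15   # shares needed to reconstruct an EphID

def shares_received(arrive: int, leave: int, stagger: int) -> int:
    """Largest number of shares of any single EphID heard in [arrive, leave).

    A new EphID starts every `stagger` minutes and is advertised for
    N_SHARES minutes. stagger=30 models a single EphID per epoch,
    stagger=15 models two overlapping EphIDs.
    """
    best = 0
    start = (arrive // stagger) * stagger - N_SHARES
    while start < leave:
        overlap = min(leave, start + N_SHARES) - max(arrive, start)
        best = max(best, overlap)
        start += stagger
    return best

# A 20-minute contact that straddles an EphID rotation (minutes 20 to 40).
print(shares_received(20, 40, stagger=30))  # single EphID per epoch
print(shares_received(20, 40, stagger=15))  # two staggered EphIDs
```

With a single EphID per epoch the receiver collects at most 10 shares of either identifier, which is below the threshold; with two staggered EphIDs it collects 20 shares of one identifier. Under the same assumptions, any contact of at least 15 consecutive minutes yields at least 15 shares of some EphID.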
Fig. 3 shows the message format of the BLE advertisement messages employed in our solution. We have used the ADV_NON_CONN_IND message format for connectionless advertisements for the chunks of EphID.
4.4. Storing encounter information
After sufficient shares of the EphID are exchanged with neighbouring devices, each pair of devices can compute a secret symmetric encounter identifier (referred to as EncID). Each device inserts the encounter identifier into its local DBF and then discards it. We use a Bloom filter to preserve data privacy, reduce the storage requirement and improve query efficiency compared with conventional data storage structures.
4.4.1. Design of bloom filter
The design of the filter involves several parameters: the number of entries to be stored in the filter, the size of the filter in bits, the number of hashes and the false positive rate (FPR).
Fig. 5 shows the false-positive rates for increasing numbers of encounters inserted in the DBF and CBF with different filter sizes and numbers of hashes. Considering that DIMY uses the secret-sharing mechanism (requiring at least 15 min to register a contact), we assume 1000 entries for the DBF as a worst case representing the maximum number of close contacts per day. As the CBF can hold a maximum of 21 DBFs, the worst case for the CBF is 21 000 entries. The FPR analysis shows that the worst FPR is given by the combination of a 50 kB filter with 2 hashes, and the best FPR by a 100 kB filter with 4 hashes. As the hashing is performed on client devices, we choose a filter size of 100 kB with 3 hashes to reduce computation and battery consumption. In this setting, the DBF and CBF have an FPR of 1 in 19 million and 1 in 2303, respectively.
Fig. 5.
False positive rate vs number of encounters — DBF and CBF.
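The FPR figures quoted in this section can be reproduced with the standard Bloom filter approximation (1 - e^(-hn/m))^h, where n is the number of entries, m the filter size in bits and h the number of hashes. The following sketch assumes that 1 kB means 1000 Bytes (so a 100 kB filter has 800 000 bits), since that interpretation matches the quoted numbers; the function name `bloom_fpr` is hypothetical.

```python
import math

def bloom_fpr(n_entries: int, m_bits: int, n_hashes: int) -> float:
    """Standard approximation of the Bloom filter false positive rate."""
    return (1.0 - math.exp(-n_hashes * n_entries / m_bits)) ** n_hashes

M_BITS = 100 * 1000 * 8  # assumption: 100 kB = 100 000 Bytes

dbf_fpr = bloom_fpr(1_000, M_BITS, 3)    # worst-case daily filter
cbf_fpr = bloom_fpr(21_000, M_BITS, 3)   # worst case for 21 merged days

print(f"DBF: 1 in {1 / dbf_fpr:,.0f}")
print(f"CBF: 1 in {1 / cbf_fpr:,.0f}")
```

Under these assumptions the DBF value comes out at roughly 1 in 19 million and the CBF value at roughly 1 in 2303, in line with the text.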
4.5. Uploading encounter identifiers to the blockchain
Once a user is diagnosed with COVID-19, they can get an authorisation code from the health authorities to upload their locally stored contact data to the back-end blockchain. Fig. 2 shows the information exchange (CBF and QBF to the backend and response from the backend). All communications between the HAs and the blockchain are assumed to be secure and authenticated using standard techniques such as digital certificates and TLS. Communications between clients’ devices and the backend are also encrypted. As an extension to our base protocol, we also assume an anonymous communication channel (through the use of proxies/anonymous networks) (Beskorovajnov et al., 2020) between the users and the backend, in an attempt to hide the link between IP addresses and users’ real identities.
The device combines its DBFs covering the last 21 days into a single CBF of size 100 kB (equal in size to a DBF). The set union function is used to combine the DBFs into a CBF: all ‘1’ bits in the DBFs are accumulated into one CBF by bit-wise OR merging (Papapetrou et al., 2010). The merged CBF is theoretically equivalent to a Bloom filter built over the union of the sets encoded in the individual DBFs, and its false positive probability is similar to that of a standard Bloom filter. The CBF is then sent to the back-end blockchain for logging as a transaction. The system supports querying through the upload of a QBF (which encodes the DBFs from the last 21 days). The user’s device uploads this query to the blockchain to check whether someone in close contact has tested positive.
The DBF, CBF, and QBF all have the same size of 100 kB, which serves three distinct purposes. First, it reduces the amount of data transferred to the backend, i.e., instead of uploading 21 DBFs of 100 kB each, we upload only one 100 kB CBF or QBF. Second, the aggregation of multiple DBFs into a single CBF or QBF hides the day and time of each encounter, attenuating the privacy threat at the backend. Third, equal-sized CBFs and QBFs support efficient Bloom filter matching through a set intersection operation at the backend (explained further in Section 4.6).
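The bit-wise OR merging can be sketched in a few lines of Python. This is an illustrative model only: the filter is held as a Python integer used as a bit set, and bit positions are derived from SHA-256 rather than the hash functions used in the actual implementation. All function names are hypothetical.

```python
import hashlib

M_BITS = 800_000   # assumption: 100 kB filter, 1 kB = 1000 Bytes
N_HASHES = 3

def bit_positions(enc_id: bytes) -> list[int]:
    """Derive N_HASHES bit positions for one encounter identifier."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + enc_id).digest()[:8], "big") % M_BITS
        for i in range(N_HASHES)
    ]

def make_filter(enc_ids: list[bytes]) -> int:
    """Encode a list of encounter identifiers into a Bloom filter bit set."""
    bits = 0
    for enc_id in enc_ids:
        for pos in bit_positions(enc_id):
            bits |= 1 << pos
    return bits

def combine(dbfs: list[int]) -> int:
    """Merge daily filters into one filter of the same size by bit-wise OR."""
    merged = 0
    for dbf in dbfs:
        merged |= dbf
    return merged

# Two days of (made-up) encounter identifiers.
day1 = [b"enc-1", b"enc-2"]
day2 = [b"enc-3"]
cbf = combine([make_filter(day1), make_filter(day2)])

# The merged filter equals a filter built over the union of all entries.
assert cbf == make_filter(day1 + day2)
```

Because the merged filter carries no per-day structure, the backend cannot tell on which day a given encounter was recorded.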
4.5.1. Design of blockchain
We use Hyperledger Fabric (Androulaki et al., 2018) for the blockchain’s implementation, as it provides a modular permissioned blockchain platform, which allows flexibility in modelling the Bloom filter on transactions. The Hyperledger Fabric network is designed to be maintained by a consortium of Health Authorities (HAs), which comprises stakeholders in the healthcare sector, e.g., relevant government agencies and hospitals. Each HA maintains a set of peer nodes to host the ledgers, execute smart contracts, and maintain a set of orderer nodes for consensus protocol. HAs and their corresponding peers are identifiable by cryptographic primitives that comply with the X.509 standard for public-key certificates.
The HAs interact with the blockchain through multiple smart contracts. We have designed a smart contract that is capable of performing the following functionalities:
• Issuing access tokens: A user who tests positive requires authorisation by the HAs to upload their CBF to the blockchain. The HA transacts with the blockchain to issue a temporary access token by providing the corresponding HA credentials to the smart contract. Upon successful credential validation, the smart contract records the token on the blockchain. The HA then provides this access token, which is valid for only 24 h, to the user.
• Processing CBF: The smart contract validates the temporary access token provided by the user who uploads their CBF. Upon successful validation, the smart contract records the CBF permanently on the blockchain and updates the ledger’s state.
• Processing QBF: The smart contract handles queries from users concerning contacts with positive cases by checking the user’s QBF against the CBFs stored in the ledger. The smart contract then returns the matching result, which is either matched or not matched.
CBFs stored on the blockchain are managed and queried by the Hyperledger. Given that the on-chain data storage capacity of a single transaction is 4 MB (Gorenflo et al., 2019), the Hyperledger can add a minimum of 1 and a maximum of 40 CBFs (40 × 100 kB = 4 MB) in a single transaction.
Interaction with the blockchain is provided only through REST APIs. The query mechanism does not require any access authentication. The REST APIs are provided by the HAs, which implies that multiple identical APIs are available.
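The three smart contract functions can be modelled with a toy in-memory ledger. The sketch below is not Hyperledger Fabric chaincode; it is a plain Python illustration, and the class name, method names and credential handling are all hypothetical. The text does not state whether a token is single-use, so the sketch only enforces the 24 h validity window.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 24 * 60 * 60   # access tokens are valid for 24 h
MIN_MATCHING_BITS = 3              # three hashes per encounter

class ToyLedger:
    """In-memory stand-in for the ledger state (illustrative only)."""

    def __init__(self, ha_credentials):
        self.ha_credentials = set(ha_credentials)
        self.tokens = {}   # token -> expiry timestamp
        self.cbfs = []     # uploaded CBFs, stored as integer bit sets

    def issue_token(self, ha_credential, now=None):
        """Issue a temporary access token after validating HA credentials."""
        if ha_credential not in self.ha_credentials:
            raise PermissionError("unknown health authority")
        now = time.time() if now is None else now
        token = secrets.token_hex(16)
        self.tokens[token] = now + TOKEN_TTL_SECONDS
        return token

    def process_cbf(self, token, cbf, now=None):
        """Record a CBF only if the access token is known and unexpired."""
        now = time.time() if now is None else now
        expiry = self.tokens.get(token)
        if expiry is None or now > expiry:
            return False
        self.cbfs.append(cbf)
        return True

    def process_qbf(self, qbf):
        """Return True (matched) if any stored CBF shares enough set bits."""
        return any(
            bin(cbf & qbf).count("1") >= MIN_MATCHING_BITS for cbf in self.cbfs
        )

ledger = ToyLedger({"ha-credential"})
token = ledger.issue_token("ha-credential", now=0)
print(ledger.process_cbf(token, 0b111, now=60))   # accepted within 24 h
print(ledger.process_qbf(0b111))                  # matched
```

The query path takes no token, mirroring the statement that the query mechanism requires no access authentication.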
4.6. Contact verification process
The contact verification process is performed through a smart contract at the blockchain. Each day6 the app combines all the DBFs of the last 21 days (or the available DBFs if the app has been in operation for less than 21 days) into a single QBF of size 100 kB by performing a bit-wise OR operation on the DBFs. The user also appends to the query the date of the oldest DBF that has been combined to form the QBF. This QBF is uploaded by the app to the back-end blockchain for executing the query. The blockchain takes this query and runs a search through the stored blocks, trying to match any entry in the QBF with existing entries in stored CBF transactions. Note that the search is restricted to transactions following that date.7
This search equates to finding the intersection of the two equal-sized Bloom filters, CBF and QBF, by performing a bitwise-AND operation on them to approximate their set intersection. As we have used three hashes for constructing the DBFs, at least three bits must match in the CBF and QBF to indicate a possible close contact between the person who has uploaded the QBF and a positive case encoded in the CBF. The backend returns “No match” if fewer than three bits are set in the intersection set. If three or more bits are set, a match is declared and returned to the app in the form of a warning. We can approximate the FPR for the intersection set by the fraction of set bits in the intersection raised to the power of the number of hashes. Since the number of set bits in the intersection is always less than or equal to the number of set bits in either the CBF or the QBF, the FPR for the intersection set is at most the FPR for the CBF and at most the FPR for the QBF (Papapetrou et al., 2010). Note that we have calculated the FPR for the CBF and QBF to be 1 in 2303 for the worst-case analysis assuming 21 000 close contact entries (Section 4.4.1).
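The matching rule can be sketched as follows. As before, this is an illustrative model: filters are Python integers used as bit sets, bit positions come from SHA-256 rather than the hash functions of the real implementation, and the names `encode` and `is_match` are hypothetical.

```python
import hashlib

M_BITS = 800_000        # assumption: 100 kB filter, 1 kB = 1000 Bytes
N_HASHES = 3
MIN_MATCHING_BITS = 3   # at least one full encounter must overlap

def encode(enc_ids: list[bytes]) -> int:
    """Encode encounter identifiers into a Bloom filter bit set."""
    bits = 0
    for enc_id in enc_ids:
        for i in range(N_HASHES):
            digest = hashlib.sha256(bytes([i]) + enc_id).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % M_BITS)
    return bits

def is_match(cbf: int, qbf: int) -> bool:
    """Bit-wise AND approximates the set intersection of the two filters."""
    return bin(cbf & qbf).count("1") >= MIN_MATCHING_BITS

# A positive case and a querying user share one (made-up) encounter identifier.
cbf = encode([b"enc-alice-bob", b"enc-alice-carol"])
qbf_contact = encode([b"enc-alice-bob", b"enc-bob-dave"])
qbf_stranger = encode([b"enc-erin-frank"])

print(is_match(cbf, qbf_contact))    # shared encounter
print(is_match(cbf, qbf_stranger))   # no shared encounter
```

The three-bit threshold is a necessary condition only: in densely populated filters, three coincidental bit overlaps can also trigger a match, which is the source of the false positives quantified above.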
5. Comparison
We introduced three architectures commonly used for designing digital contact tracing apps in Section 2, and discussed the design of our proposed solution in the previous section. This section compares the salient features of our proposed solution with representative apps from the three architectures.
Table 2 highlights the salient features and their equivalent in selected apps. Our proposed solution, DIMY, generates a temporal ID on the client’s device in line with other decentralised and hybrid apps. DIMY is also optimised for storage, both on the client’s device as well as the backend. The design involves storing contact information in fixed-size DBFs. The back-end blockchain only stores a single Bloom filter (CBF, size 100 kB) per positive case that has encoded information of DBFs for up to the last 21 days. This reduces the storage requirement at the backend considerably compared with other apps listed in Table 2.
Table 2.
Comparison of DIMY with other protocols.
Salient features | DIMY | Centralised (BlueTrace) | Decentralised (PACT-East, DP-3T) | Hybrid (Desire) |
---|---|---|---|---|
ID generation | Client devices | Server | Client devices | Client devices |
Storage on devices | Encounter encoded in Bloom filters | Received IDs from close contacts | Received IDs (chirps) from close contacts | EphIDs and two PETs tables |
Storage on server/backend | Encounter encoded in Bloom filters for positive cases | Mapping of IDs, complete list of contact IDs for positive cases | Hourly seeds and time for positive cases | PETs for positive cases |
Processing on devices | ID generation, Diffie–Hellman key generation, k-out-of-n secret sharing, Bloom filter encoding | Minimal processing | Hourly seed and chirp generation, chirp matching and risk analysis | ID generation, Diffie–Hellman exchange |
Processing on server/backend | Blockchain matching for at-risk users | Risk analysis and ID matching | Minimal processing | Risk analysis, PETs matching |
Data upload | Bloom filter for positive cases; Query Bloom filter for other users | All contact IDs captured for a positive case | Seeds, timing information for a positive case | PETs table for positive case |
Data download | Result (matched/not matched) from blockchain | Periodic download of new IDs | Seeds, timing information for all positive cases | Result of risk analysis |
Risk analysis & notification | Performed on Blockchain | Performed on server | Performed on devices | Performed on server |
Another salient design option is where to perform the risk analysis and notification. Apps based on centralised and hybrid architectures perform this step at the centralised server, while apps based on a decentralised architecture perform this locally, on the device. DIMY performs the matching of contact information on the back-end blockchain. However, the blockchain cannot infer any extra information as the matching is performed on contact information encoded in Bloom filters.
Our proposed solution is device-centric because it performs most of the privacy-preserving operations on the device. This includes EphID generation, computing k-out-of-n shares and broadcasting these shares using BLE messages, and encoding received contact information in the DBF after enough shares have been received to construct a shared secret. Desire also uses the Diffie–Hellman key exchange and the generation of local IDs on the devices. In contrast, Desire uploads the shared secrets collected by a user who has been diagnosed with COVID-19 to a server for matching, to be performed at a later stage. DIMY requires uploading the least amount of data when compared with other apps. A single 100 kB CBF is uploaded from a COVID-positive client to the blockchain. This is in contrast with uploading all contact IDs, as required in apps that follow the centralised architecture, and uploading multiple seeds or the PETs table in apps that use decentralised and hybrid architectures. DIMY also requires client devices to upload a QBF, a Bloom filter for matching against transactions stored on the blockchain. DIMY client devices only download the risk analysis results in the form of a binary notification, similar to centralised and hybrid apps. Apps based on the decentralised architecture involve downloading either seeds or chirps from the server in order for matching to be performed on the devices.
Table 3.
Requirements for contact tracing protocols.
Requirements | Properties | Details | How achieved in DIMY |
---|---|---|---|
Security | Minimise false negatives (Completeness). | A user not being warned despite being in close contact of an infected person. | Use of Bloom filter that provides guarantees against false negatives during the matching process. |
 | Minimise false positives (Soundness). | A user being warned without a valid close contact with any infected person. | Use of Shamir secret sharing and Diffie–Hellman key exchange to mitigate false positives due to replay attacks. False positives are still possible with a low probability due to relay attacks and Bloom filter matching. |
 | Ensure system’s integrity and availability. | Data maintained at the backend is trustworthy and the matching service accessible. | Use of blockchain as the backend to provide integrity, availability, and trust. |
Privacy | Confidentiality of health status (infected or warned). | Only the health authorities can learn about the status of an infected person. | Health authorities are involved only in the authorisation stage. Use of Bloom filters and smart contracts ensures no one learns about close contacts of an infected person. |
 | Privacy for meeting/contact history. | No entity can learn about the contact history of a user. | Use of Bloom filters to hide the time/date of contacts. The back-end server cannot construct a social graph. |
 | Hide user’s identities. | No one can link the anonymous IDs with real identities. Health authorities learn this when an infected or at-risk user contacts them. | Use of ephemeral identifiers and storage of contact information in Bloom filters. |
 | Location privacy. | An adversary cannot track movement of a device. | No location information is captured by the system. Limited local device tracking is possible. |
Operational | Minimise storage costs. | Reducing the amount of contact tracing data stored on mobile devices as well as the backend. | Use of space-efficient Bloom filters for storage at the client’s devices as well as the backend. |
 | Minimise bandwidth usage. | Reducing bandwidth utilisation directly helps in prolonging the battery life of mobile devices. | Use of BLE advertisement messages reduces the number of messages exchanged between the devices. Uploads from client’s devices consist of short, fixed-size Bloom filters. |
 | Minimise computational cost. | Computational cost directly affects battery consumption for devices. | Contact matching and risk analysis process is only performed at the backend. The cryptographic operations such as DH key generation and exchange involve group exponentiation, which is not as computation intensive. |
6. Security and privacy analysis
This section is dedicated to an analysis of security and privacy guarantees provided by the design of DIMY.
6.1. Threat model
In this section, we describe the adversaries considered in the design of the DIMY protocol, their capabilities and the risks that they pose. We categorise the adversaries into four groups: users, external actors, the back-end server (administrators), and the government. (i) Users have access to in-app information as well as passive information captured through eavesdropping. App users are also assumed to have access to the open-source app code. Furthermore, we assume users can only gain access to data stored on other smartphones through theft or coercion. (ii) External actors are devices active on the Bluetooth network that are able either to passively eavesdrop on messages exchanged between app users or to actively insert their own messages. (iii) The back-end server is assumed to be honest-but-curious (it follows protocols honestly and is curious to know the data but does not alter it), and its administrators have access to all data received and stored at the back-end server. (iv) The government can access any information stored on individual smartphones or the back-end server through subpoenas to investigate a group or individual user of the app. HAs have a special status in our threat model. They are assumed to be semi-trusted in that any app user can voluntarily contact them for upload authorisation, advice or manual contact tracing, thus revealing their true identity. However, without this voluntary contact, HAs cannot learn anything about the contact tracing data captured by the system.
6.2. Security and privacy analysis
We list the requirements associated with digital contact tracing protocols in Table 3. We classify these requirements into three categories: security, privacy and operational. We also highlight which of these properties are achievable in DIMY. In this section, we provide additional details about these properties.
6.2.1. Minimise false negatives (completeness)
A QBF is sent to the back-end blockchain for matching with stored CBFs. Since the check is performed on Bloom filters by a smart contract run by peer nodes, it returns a match whenever a match exists. This is ensured by the fact that false negatives are not possible in Bloom filter matching and by the assumption that the smart contract implementation is correct. We note that both CBF and QBF are sent through a secure channel. Thus DIMY satisfies the completeness property.
6.2.2. Minimise false positives (soundness)
A user who was not in close proximity to a COVID-positive patient will receive a match (false positive) with low probability, for the following reasons. (i) The Bloom filter matching process produces false positives (as computed by Eq. (1)). (ii) False positives can arise from replay/relay attacks. Our design does not permit replay attacks; however, relay attacks are still possible due to the inherent characteristics of radio-based communication. An attacker who was not in proximity to a COVID-positive patient cannot construct an encounter identifier for a QBF that will match the CBF. Moreover, an attacker needs to be in close proximity to a user to collect sufficient shares before trying to break the encounter identifier established through the Diffie–Hellman process. Even then, the attacker will not be able to compute the encounter identifier because of the hardness of the computational DH problem.
6.2.3. System’s integrity and availability
In our proposed protocol, the blockchain is adopted at the backend, which provides transparency on the integrity, trustworthiness, and availability of the data stored on the chain. This is assuming correct implementation of smart contracts for the blockchain.
6.2.4. Confidentiality of health status
The blockchain stores only CBFs. A user who receives a match from the blockchain in response to the QBF, cannot be certain which COVID positive user they have met as the risk-analysis is performed at the backend without sharing the CBFs uploaded by infected users. There is additional protection as the match response could be a false positive due to probabilistic nature of Bloom filters. Here we ignore the case in which a person has been in contact with only one person during the last 21 days or uploads only one-entry QBF and receives a “match”. In such a case, the identity of the COVID positive patient is known. Similarly, identities of users who have been warned are also kept private assuming risk-analysis is performed by a correctly implemented smart contract without any human involvement, and use of secure communication channel between users and the blockchain.
6.2.5. Hiding contact history
Assume that a device is compromised. Since the EncID and EphIDs are not known, and the secrets corresponding to the EphIDs of the device are not known, the attacker cannot learn the identity of users who were in close proximity to the device, as all contacts are encoded in Bloom filters using one-way hashing. Bloom filters also provide implicit privacy protection against possible back-end breaches, significantly reducing the backend’s ability to construct social graphs based on the diagnosed user’s contacts. The only information the backend can infer is an estimate of the number of contacts encoded in the CBF uploaded by a positive case, without identifying who these contacts are. As users do not have direct access to the CBFs stored in the blockchain, they are unable to extract any sensitive information.
6.2.6. Hiding user’s identities
The devices only broadcast shares of the anonymous identifiers via BLE advertisements. These are first used to construct encounter IDs that are then encoded in the Bloom filter, which stops underlying linkages from being created between these pseudo-identifiers and concrete IDs in the real world. This binary data encoded in a Bloom filter becomes semantically meaningless to any other user, and even the backend cannot associate the reported data with an infected person or any specific individual.
Moreover, all encounter information constructed through the Diffie–Hellman exchange is deleted once it is encoded in the DBF, hence protecting data in case a device gets physically stolen or the user is forced to reveal app data under coercion.
6.2.7. Location privacy
No location information is collected by the protocol. Thus, it is hard to extract any extra information that could assist a compromised backend to use the stored data for any other purpose. However, the DIMY protocol is also susceptible to privacy attacks launched by malicious users/external actors who may use a modified application to collect other contextual information regarding the contacts. Multiple malicious users may work together, combining their information, to collect a large number of recorded broadcasts with metadata on time and location, etc.
Another potential privacy concern around contact tracing apps is known as function creep (Troncoso et al., 2020). Function creep refers to the evolution of the app to include functionalities other than the original ones, i.e., the app has the potential to be turned into an instrument of mass surveillance, violating human rights. For our proposed protocol, data storage in Bloom filters and the use of blockchain provide trust in the intended use of the data.
The operational requirements listed in Table 3 have already been discussed in Section 4.
6.3. Resilience against attacks
In this section, we will explore the resilience of our proposed design against common attacks launched against digital contact tracing apps.
6.3.1. Replay attacks
In this type of attack, the goal of an adversary (an external actor) is to inject malicious contact entries such that these result in false positives during the contact verification stage. An adversary can capture the messages advertised by a user’s device and later replay them at another location. Our proposed solution provides a safeguard against such attacks by using the secret sharing scheme. An adversary must capture at least k shares of a message before taking these shares to another location for rebroadcasting. However, in order to be counted as false positives, the originator of the messages must also have matching entries in their logs. The only way this attack would be possible is if the adversary moves back and forth between two different locations, collecting shares and rebroadcasting these to ensure the existence of symmetric contact information.
6.3.2. Relay attacks
An adversary’s objective during a relay attack is the same as it is in a replay attack. An adversary can capture a user’s advertised shares and immediately relay the captured message at the same location, extending the range of the message.
Our proposed solution is susceptible to relay attacks, which are inherent to all schemes using BLE messages. It is possible to rebroadcast shares such that two users, Alice and Bob, have symmetrical contact information even though they were not in direct contact with each other. We point out that the adversary has to be in direct communication range of both Alice and Bob for at least k minutes to force the symmetric contact information. As a consequence, if either Alice or Bob tests positive, the other user would be informed of a ‘false positive’ close contact.
6.3.3. Device tracking
The adversary’s goal in this type of attack is to link the BLE information broadcasts with a device identifier to track a particular device. A passive actor can listen for BLE messages and transfer these to a central tracking server. The server can fuse information from multiple tracking devices to estimate the position and movement pattern of the device being tracked. In most cases, the Bluetooth MAC address is randomised for a short period to limit this tracking. In our proposed solution, we use chunks of the EphID that are different from each other and use the hash of the EphID to link all these shares together. An adversary can use this hash in combination with the randomised MAC address to perform limited local tracking.
Table 4.
Possible attacks on digital contact tracing (C = Centralised, D = Decentralised, H = Hybrid).
Attacks | DIMY | C (BlueTrace) | D (PACT-East, DP-3T) | H (Desire) |
---|---|---|---|---|
Replay | × | ✓ | ✓ | × |
Relay | ✓ | ✓ | ✓ | ✓ |
Device tracking | ✓ | ✓ | ✓ | ✓ |
Carryover | ✓ | ✓ | ✓ | × |
Location confirmation | × | ✓ | ✓ | × |
Enumeration | × | ✓ | × | × |
Denial of service | ✓ | ✓ | ✓ | ✓ |
Linkage | × | ✓ | ✓ | × |
Social graph | × | ✓ | ✓ | ✓ |
6.3.4. Location confirmation
This attack can be launched by an app user or an external actor with the goal to discover the presence of a user at a known location unless some location privacy preservation mechanism is employed (Tian et al., 2020). This is accomplished by linking contextual information, such as the mobile phone model advertised in some of the apps that are based on centralised architectures. This type of attack is not possible in our proposed protocol, due to the use of ephemeral identifiers and the suppression of other information that links the device with a particular user.
6.3.5. Enumeration attack
An adversary (external actor or malicious app user) may want to estimate the number of users who have uploaded their contact tracing data after testing positive with COVID-19. In our proposed protocol, the encounter data is first encoded in Bloom filters before being stored on the blockchain. A user is allowed to query the blockchain for matching any encounter record, without revealing the records stored on the blockchain. An adversary is thus unable to launch an enumeration attack. Note that as the HAs authorise all uploads of CBFs to the blockchain, and there are multiple HAs that exist in the system, they can collude with each other to arrive at the total number of COVID-19 cases that have uploaded CBFs to the blockchain.
6.3.6. Denial of service
In this type of attack, an adversary generates fake advertisements to consume the storage and battery resources of other devices. Digital contact tracing apps are prone to this attack irrespective of their underlying architecture. In our proposed protocol, a malicious user can force other devices to process BLE advertisements containing shares of fake EphIDs.
6.3.7. Deanonymisation/linkage
The aim of this attack is to deanonymise a user based on information collected either through the system or by using a side-channel. The system has to be resilient against this attack when launched by any of the adversaries described in Section 6.1. For malicious users and external actors, our system provides a safeguard by not sharing information regarding close contacts and positive cases with other users. Additionally, all encounter data is stored in Bloom filters on the devices as well as at the backend. This provides protection from anyone who can gain access (legal or illegal) to the devices or the backend.
6.3.8. Carryover attack
An address carryover attack is possible for contact tracing apps when the changeover times of the randomised Bluetooth MAC address and the temporary identifier are not synchronised. A listener can thus easily link the multiple Bluetooth MAC addresses advertised within the same identifier’s lifetime, or vice versa. Our proposed solution relies on the simultaneous advertising of two identifiers to enable the correct contact information to be captured (discussed in Section 4.3). This mechanism may result in a carryover attack for tracking purposes, whereby an adversary can associate multiple advertised identifiers with the BT MAC addresses used by a device. An adversary can associate the hash that is advertised along with the random MAC and chunks of the raw EphID to track a user’s device, as long as that device is within communication range.
6.3.9. Social graph analysis
Social graph construction enables the identification of a person’s close contacts. This is an imperative part of manual contact tracing, in which a health official conducts an interview with a positive case to identify their at-risk close contacts. In our proposed solution, we have employed two mechanisms that prevent the construction of social graphs. First, we generate ephemeral IDs on the devices as opposed to on the server. This means that the backend cannot link an ephemeral ID with a user. Second, we have employed Bloom filters to hide the contact information of positive cases from the backend. The backend is thus unable to construct a social graph without access to the contact history.
Table 4 summarises this section with details of the attacks that could be launched against various architectures, including against our proposed design.
7. Performance evaluation
In this section, we present a quantitative evaluation of our proposed blockchain-based backend solution in terms of throughput, latency and resource consumption. This section is divided into two parts. We first describe the proof-of-concept implementation using a local back-end server hosting the blockchain. We note that for these experiments, we generated synthetic data at the device level and supplied this to the blockchain. In the second part, we elaborate on a real-world deployment scenario, whereby we implemented the front-end apps on both Android and iOS, and hosted the blockchain on the AWS cloud (Ahmed et al., 2021).
7.1. Proof-of-concept implementation
We implemented a proof-of-concept of our proposed framework using Hyperledger Fabric v2.0, as it allows flexibility in modelling Bloom filters in a permissioned blockchain environment. We opted to use a permissioned blockchain so that regulatory organisations, such as the health authority, can control app users’ access. We consider the standard configuration of a solo orderer node with one communication channel as the consensus mechanism. Experiments were conducted on a GPU-equipped server with 12 CPU cores and 64 GB of memory, running Ubuntu Linux 18.04 LTS. We deployed different numbers of Hyperledger nodes as Docker containers on the server. We implemented the core functions of DIMY transactions as chaincodes written in the Go programming language, using a native Go implementation of the Bloom filter (v2.0.3) and a non-cryptographic murmur hashing function for Go in Hyperledger Fabric. We benchmarked our proof-of-concept implementation using Caliper v0.3.2, an official tool from the Hyperledger foundation that allows blockchain designers to measure the performance of a specific blockchain implementation. We measured the performance of our implementation in terms of throughput, latency, CPU and memory consumption, repeating the measurements for 30 s.
7.1.1. Throughput and latency for blockchain operations
In this experiment, we examined the throughput and latency of different DIMY blockchain transactions of uploading CBF, token issue and querying through QBF, when a load of 50 tx/s was sent to the blockchain. We define throughput as the rate at which transactions are successfully executed and latency as the time required to complete the transactions. Please note that although a complete processing cycle of our architecture includes the serial execution of multiple DIMY transactions, we examined the throughput and latency only for each individual transaction. For instance, we assume that the CBF is already uploaded to the blockchain and we only measure the throughput and latency for executing query QBF. This definition does not consider the network latency, which could be impacted by different external factors.
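The two metrics defined above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not Caliper’s implementation: the `TxRecord` type, the `summarise` helper and the choice of measurement window (first submission to last successful completion) are hypothetical, and the benchmarking tool may define its window differently.

```python
from dataclasses import dataclass


@dataclass
class TxRecord:
    submitted_at: float  # seconds, when the client sent the transaction
    completed_at: float  # seconds, when the transaction finished
    succeeded: bool


def summarise(records):
    """Return (throughput in tx/s, average latency in s, maximum latency in s)."""
    ok = [r for r in records if r.succeeded]
    if not ok:
        return 0.0, 0.0, 0.0
    # Assumed window: first submission to last successful completion.
    start = min(r.submitted_at for r in records)
    end = max(r.completed_at for r in ok)
    duration = end - start
    latencies = [r.completed_at - r.submitted_at for r in ok]
    throughput = len(ok) / duration if duration > 0 else 0.0
    return throughput, sum(latencies) / len(latencies), max(latencies)


# Synthetic example: three successful transactions and one failure.
records = [
    TxRecord(0.0, 0.5, True),
    TxRecord(0.5, 1.5, True),
    TxRecord(1.0, 2.0, True),
    TxRecord(1.5, 1.8, False),
]
throughput, avg_latency, max_latency = summarise(records)
```

Only successfully executed transactions count towards throughput in this sketch, which is why an operation can fall short of the offered load even when every transaction is eventually submitted.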
We plot the results in Fig. 6 with a constant transaction rate of 50 tx/s. Among the three DIMY blockchain transactions, QBF upload and matching show the lowest latency, while uploading CBF has the highest latency, at around 4700 ms. Additionally, issue tokens and upload CBF operations failed to achieve the throughput of 50 tx/s with peaks at about 45 and 39 transactions per second, respectively. The latency for uploading 50 simultaneous CBFs is the highest compared to other operations, as this operation involves transaction insertion and consensus mechanisms at the blockchain. QBF matching, on the other hand, performs very well in terms of latency and throughput. We note that upload CBF and issue token transactions are only performed once for each identified COVID-19 positive case. On the other hand, devices upload QBF for matching once in a 24-h cycle, making this the most frequently executed operation. The blockchain thus performs well to support low latency and high throughput for the upload QBF operation.
Fig. 6.
Comparison of the average latency and the throughput for blockchain operations in Caliper, using a load of 50 transactions per second.
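Because each device uploads a QBF once in a 24-h cycle, the sustained query rate can be related to the number of devices with simple arithmetic. The Python sketch below is a back-of-the-envelope illustration only; it assumes that queries are spread evenly over the day and ignores bursts, and the figure of one million devices is a hypothetical input, not a value from our experiments.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_query_rate(num_devices: int) -> float:
    """Average QBF queries per second if each device queries once per 24 h."""
    return num_devices / SECONDS_PER_DAY


def devices_supported(sustained_tx_per_s: float) -> int:
    """Devices whose once-daily queries fit within a sustained rate, ignoring bursts."""
    return int(sustained_tx_per_s * SECONDS_PER_DAY)


# Hypothetical population of one million devices.
rate_for_one_million = average_query_rate(1_000_000)
# Devices that a sustained 50 tx/s query rate could serve under the even-spread assumption.
capacity_at_50 = devices_supported(50)
```

Under these assumptions, one million devices generate roughly 11.6 queries per second on average, and a sustained rate of 50 tx/s corresponds to 4,320,000 once-daily queries.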
7.1.2. CPU and memory consumption for blockchain operations
We compare the CPU and memory consumption of different operations performed on the blockchain and show the results in Fig. 7. We captured CPU and memory usage of the backend employing a Caliper monitor when we applied a load of 50 tx/s to the Hyperledger Fabric network for 30 s. The results show that the query operation using QBF requires the most resources in terms of average memory consumption and CPU utilisation. Moreover, the CPU utilisation considerably varies between the three test operations. The upload QBF operation provides high throughput with low latency while requiring the most resources. This result highlights that resource requirement for QBF matching operation is a fundamental design factor for provisioning the backend, as QBF matching is the most used operation on the backend.
Fig. 7.
Comparison of CPU and memory consumption for the execution of querying QBF, issuing an access token and uploading CBF.
7.2. Latency of querying QBF with different-sized Bloom filters
The following experiment aims to examine the effect of using different-sized Bloom filters on the resulting latency of QBF matching operations on the blockchain. We executed different transaction rate loads, namely 75, 100 and 125 tx/s, recording both the maximum and the average latency for QBF sizes ranging from 10 kB to 1000 kB. The results shown in Fig. 8 demonstrate that the average latency remains less than 100 ms for all QBF sizes less than 900 kB, while the maximum observed latency remains less than 100 ms for QBF sizes up to 100 kB. The maximum latency starts increasing considerably, especially for higher transaction rates, if the size of the QBF is increased beyond 100 kB. This result supports our chosen value of 100 kB for Bloom filters, which keeps the maximum observed latency below 100 ms across the tested transaction rates.
Fig. 8.
The maximum and average latency of querying QBF with differently sized Bloom filters.
7.2.1. Throughput and latency analysis
Lastly, we investigate the throughput and latency for querying QBF with different numbers of Hyperledger peer nodes (2, 4, and 8 nodes) and plot the results in Fig. 9. During this experiment, we started from 50 QBF tx/s and gradually increased the load to 500 tx/s. The blockchain was able to deliver increasing throughput in line with the transaction workload. In addition, the maximum latency remains below 100 ms for transaction rates of up to 500 tx/s in all explored cases except the 2-node configuration, which shows an increase in latency beyond 250 tx/s. However, the average latency stays low (less than 50 ms) in all cases. We note that this observed performance is achieved with the hardware resources used for this proof of concept, described in Section 7.1.
Fig. 9.
Throughput and latency of querying QBF with a size of 100 kB at different transaction send rates and numbers of Hyperledger peer nodes.
7.3. Real world deployment
Next, we implemented the DIMY protocol on both Android and iOS platforms and deployed the backend blockchain implementation on the AWS cloud. Fig. 10 shows screenshots of the DIMY app running on an iOS device. For the backend, we instantiated a single AWS EC2 instance (t2.small with a single 2.4 GHz CPU and 2 GB RAM) and ran two Hyperledger nodes and one orderer node as Docker containers. Note that this AWS instance has considerably fewer resources than our local GPU-based server. We ran the scalability analysis on the AWS setup, measuring the throughput and the latency.
Fig. 10.
DIMY iOS app user interface.
Fig. 11 shows the throughput and average latency analysis for this setup when a load of 50 tx/s is applied for three different DIMY operations, i.e., query QBF, token issue and upload CBF. The backend can provide a throughput that matches the applied load for both the query QBF and token issue operations. However, the resource-constrained AWS instance could not cope with the 50 tx/s load for the upload CBF operation, resulting in very low throughput and high average latency.
Fig. 11.
Average latency and the throughput for blockchain operations in Caliper, AWS backend using a load of 50 transactions per second.
Fig. 12 shows the throughput and latency analysis for the query QBF operation when the rate of transactions is varied from 10 to 90 tx/s. The backend is able to match the throughput with the applied load up to 60 tx/s, giving less than 100 ms latency. The average and maximum latencies increase, with a drop in throughput, for any further increase in applied load beyond 60 tx/s.
Fig. 12.
Throughput and latency for query QBF operations. AWS backend with varying number of transactions.
In summary, results from this small-scale deployment of the DIMY protocol with the back-end blockchain hosted on AWS indicate the importance of adequate resource provisioning on the cloud to achieve effective performance, especially for the upload CBF operation.
8. Conclusion
In this paper, we have presented the design of DIMY, a new privacy-preserving digital contact tracing protocol, together with its security and privacy evaluation. Our protocol design integrates several privacy-preserving techniques, assuming a threat model that includes malicious users and an honest-but-curious backend. We employed Bloom filters to enhance the privacy protection as well as to considerably cut down storage requirements both on the client’s device and the backend.
Our protocol is resilient against most of the security and privacy attacks commonly launched against digital contact tracing apps. The proposed protocol incurs negligible overheads and supports low latency operations on the backend side, as demonstrated in our performance evaluations.
As future work, we plan to deploy DIMY on our university campus, for which we have already acquired ethical approval for the pilot deployment. We also aim to perform experiments to evaluate the performance of the frontend implementations for iOS and Android in terms of battery and storage consumption. Finally, we plan to conduct further scalability analysis of the blockchain-based backend operating on the AWS cloud in a real-world setting.
CRediT authorship contribution statement
Nadeem Ahmed: Conceptualisation, Methodology, Investigation, Writing – original draft. Regio A. Michelin: Methodology, Investigation, Writing – original draft. Wanli Xue: Methodology, Investigation, Writing – original draft. Guntur Dharma Putra: Software, Validation, Investigation. Sushmita Ruj: Writing – original draft, Writing – review & editing. Salil S. Kanhere: Writing – review & editing. Sanjay Jha: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work has been supported by the Cyber Security Cooperative Research Centre Limited (CSCRC), whose activities are partially funded by the Australian Government’s Cooperative Research Centres Programme. The authors would like to thank Wei Song for help in the implementation of the DIMY front-end mobile apps.
Biographies
Nadeem Ahmed received M.S. and Ph.D. degrees in computer science from the University of New South Wales, Sydney, Australia, in 2000 and 2007, respectively. He is currently working as a senior research fellow at the Cyber Security Cooperative Research Centre (CSCRC), Australia. Earlier, he worked as head of the Computing Department at the School of Electrical Engineering and Computer Science, NUST, Pakistan. His research interests include cyber security, the IoT, wireless sensor networks, and software-defined networking.
Regio A. Michelin received M.S. and Ph.D. degrees in computer science from the Pontifical Catholic University of Rio Grande do Sul, Brazil, in 2014 and 2019, respectively. He is currently working as a research fellow at the Cyber Security Cooperative Research Centre (CSCRC), Australia. His research interests include blockchain, cybersecurity, and IoT.
Wanli Xue received a Ph.D. degree from the School of Computer Science and Engineering, UNSW, Australia. He is currently a research fellow at the Cyber Security Cooperative Research Centre (CSCRC) and UNSW, Australia. His research interests include security and privacy issues in cyber physical systems and IoT, including highly efficient privacy-preserving techniques for IoT as well as IoT-related sensing systems and data analytic services.
Guntur Dharma Putra received his bachelor’s degree in Electrical Engineering from Universitas Gadjah Mada, Indonesia, in 2014. He received his master’s degree in Computing Science from the University of Groningen, the Netherlands, in 2017. He is currently a Ph.D. candidate at UNSW, Sydney, Australia. His research interests cover distributed systems and the IoT. He also looks into blockchain applications for securing IoT. Guntur is a student member of the IEEE.
Sushmita Ruj received her Masters and Ph.D. in Computer Science from the Indian Statistical Institute. She is currently a Senior Research Scientist at CSIRO Data61, Australia. She is also an Associate Professor at Indian Statistical Institute, Kolkata. Her research interests include blockchains, applied cryptography, and data privacy. She serves as a reviewer of Mathematical Reviews, and an Associate Editor of Elsevier Journal, Information Security and Applications. She is a recipient of the Samsung GRO award, NetApp Faculty Fellowship, Cisco Academic Grant and IBM OCSP grant. She is a Senior Member of the ACM and IEEE.
Salil S. Kanhere received his M.S. and Ph.D. degrees from Drexel University in Philadelphia. He is a Professor of Computer Science and Engineering at UNSW Sydney, Australia. His research interests include the IoT, blockchain, pervasive computing, cybersecurity and applied machine learning. He is a Senior Member of the IEEE and ACM, a Humboldt Research Fellow and an ACM Distinguished Speaker. He serves as the Editor in Chief of the Ad Hoc Networks journal and as Associate Editor of IEEE TNSM, COMCOM and PMC. He has served on the organising committee of several IEEE/ACM international conferences.
Sanjay K. Jha is a Full Professor at the School of Computer Science and Engineering, University of New South Wales (UNSW), Australia and leads UNSW in the Cybersecurity Cooperative Research Centre. He is also the Deputy Director of UNSW Institute for Cybersecurity (IFCYBER). His research interests include network security, wireless mesh and sensor networks and Internet of Things. His editorial affiliations include the IEEE Transactions on Mobile Computing (TMC) and the IEEE Transactions on Dependable and Secure Computing (TDSC). He is the Principal Author of the book Engineering Internet QoS and a Co-Editor of the book Wireless Sensor Networks: A Systems Perspective.
Footnotes
According to the Centers for Disease Control and Prevention (CDC, https://www.cdc.gov/coronavirus/2019-ncov/downloads/2019-ncov-factsheet.pdf), close contact with an infected person is defined as a contact within a range of 6 ft for approximately 15 min.
The infectious period for a COVID-19 positive case is considered as 2–3 weeks including the asymptomatic period.
There is a trade-off between selecting the key size for the DH exchange and minimising the number of BLE messages exchanged. In our scheme, the DH exchange is based on an elliptic curve group and is used in conjunction with Shamir secret sharing, making it hard for an adversary to arrive at the contact information. The use of a 16-byte ephemeral ID does not affect false negatives or the privacy of the users.
We take the 128-bit randomly generated ephemeral ID, pass it through SHA-256 to get a 256-bit hash value, and then truncate it to retain the first 3 bytes.
The Epoch is loosely aligned with the randomised MAC address periods that happen at approximately half of the Epoch duration.
For scalability, the query is performed in a 24-h cycle from the time of app installation.
date can be a maximum of 21 days old, thus any CBF stored at the blockchain that is older than 21 days is not matched. This automatically takes care of CBFs pertaining to COVID-19 cases that are no longer infectious.
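The hash truncation described in the footnotes above (a 128-bit random value passed through SHA-256 and cut to its first 3 bytes) can be sketched in Python as follows. This is an illustration using the standard library only; the function names are hypothetical, and the sketch does not reproduce how the value is split into shares or placed into BLE advertisements.

```python
import hashlib
import secrets


def new_identifier() -> bytes:
    """Return a fresh 128-bit (16-byte) random value."""
    return secrets.token_bytes(16)


def truncated_hash(identifier: bytes) -> bytes:
    """SHA-256 the identifier and keep the first 3 bytes of the 32-byte digest."""
    return hashlib.sha256(identifier).digest()[:3]


identifier = new_identifier()
tag = truncated_hash(identifier)
```

A 3-byte tag has only 2^24 possible values, so distinct identifiers can occasionally share the same tag; the tag is a compact label, not a unique key.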
References
- Ahmed N., Michelin R.A., Xue W., Putra G.D., Song W., Ruj S., Kanhere S.S., Jha S. 2021 IEEE International Conference on Blockchain and Cryptocurrency (ICBC) 2021. Towards privacy-preserving digital contact tracing; pp. 1–3.
- Ahmed N., Michelin R.A., Xue W., Ruj S., Malaney R., Kanhere S.S., Seneviratne A., Hu W., Janicke H., Jha S.K. A survey of COVID-19 contact tracing apps. IEEE Access. 2020;8:134577–134601.
- Androulaki, E., Barger, A., Bortnikov, V., Cachin, C., Christidis, K., De Caro, A., Enyeart, D., Ferris, C., Laventman, G., Manevich, Y., et al., 2018. Hyperledger fabric: a distributed operating system for permissioned blockchains. In: Proceedings of the Thirteenth EuroSys Conference. pp. 1–15.
- Apple. 2020. Privacy preserving contact tracing. https://www.apple.com/covid19/contacttracing.
- Avitabile G., Botta V., Iovino V., Visconti I. Workshop on Secure IT Technologies Against COVID-19 (Coronadef), NDSS. 2021. Towards defeating mass surveillance and SARS-CoV-2: The pronto-C2 fully decentralized automatic contact tracing system.
- Bay J., Kek J., Tan A., Hau C.S., Yongquan L., Tan J., Quy T.A. 2020. BlueTrace: A privacy-preserving protocol for community-driven contact tracing across borders. https://bluetrace.io/static/bluetrace_whitepaper-938063656596c104632def383eb33b3c.pdf.
- Beskorovajnov, W., Dörre, F., Hartung, G., et al., 2020. ConTra Corona: Contact Tracing against the Coronavirus by Bridging the Centralized–Decentralized Divide for Stronger Privacy. Cryptology ePrint Archive, Report 2020/505.
- Bloom B.H. Space/time trade-offs in hash coding with allowable errors. Commun. ACM. 1970;13(7):422–426.
- Boneh D. In: Algorithmic Number Theory. Buhler J.P., editor. Springer Berlin Heidelberg; Berlin, Heidelberg: 1998. The decision Diffie-Hellman problem; pp. 48–63.
- Castelluccia, C., Bielova, N., Boutet, A., et al., 2020. DESIRE: A Third Way for a European Exposure Notification System Leveraging the Best of Centralized and Decentralized Systems. Technical Report hal-02570382.
- COVIDSafe, 2020. COVIDSafe. https://github.com/AU-COVIDSafe.
- Diffie W., Hellman M.E. New directions in cryptography. IEEE Trans. Inform. Theory. 1976;IT-22:644–654.
- Ethereum W. 2020. Ethereum.org. https://ethereum.org/
- Google. 2020. Exposure notification API. https://www.google.com/covid19/exposurenotifications/
- Gorenflo C., Lee S., Golab L., Keshav S. 2019. FastFabric: Scaling hyperledger fabric to 20,000 transactions per second. CoRR arXiv:1901.00910.
- Hyperledger C. 2020. Hyperledger – open source blockchain technologies. https://www.hyperledger.org/
- Lv W., Wu S., Jiang C., Cui Y., Qiu X., Zhang Y. 2020. Decentralized blockchain for privacy-preserving large-scale contact tracing. CoRR arXiv:2007.00894. URL https://arxiv.org/abs/2007.00894.
- Mitzenmacher M., Upfal E. Cambridge University Press; 2005. Probability and Computing: Randomized Algorithms and Probabilistic Analysis.
- Nakamoto, S., 2019. Bitcoin: A Peer-to-Peer Electronic Cash System. Tech. rep.
- O’Neill P.H., Ryan-Mosley T., Johnson B. 2020. A flood of coronavirus apps are tracking us. Now it’s time to keep track of them. https://www.technologyreview.com/2020/05/07/1000961/laun-ching-mittr-covid-tracing-tracker/
- OpenTrace, 2020. OpenTrace. https://github.com/opentrace-community.
- Papapetrou O., Siberski W., Nejdl W. Cardinality estimation and dynamic length adaptation for bloom filters. Distrib. Parallel Databases. 2010;28:119–156.
- Rivest, R.L., Callas, J., Canetti, R., et al., 2020. The PACT Protocol Specifications. Technical Report 0.1, URL.
- 2020. Robert. https://github.com/ROBERT-proximity-tracing/document.
- Shamir A. How to share a secret. Commun. ACM. 1979;22(11):612–613.
- TCN Coalition, 2020. TCN Protocol for decentralized, privacy-preserving contact tracing. https://github.com/TCNCoalition/TCN.
- The DP3T Consortium, 2020. DESIRE: A Practical Assessment. https://github.com/DP-3T/documents/blob/master/Security%20analysis/DESIRE%20-%20A%20Practical%20Assessment.pdf.
- Tian Z., Li M., Qiu M., Sun Y., Su S. Block-DEF: A secure digital evidence framework using blockchain. Inform. Sci. 2019;491:151–165.
- Tian Z., Wang Y., Sun Y., Qiu J. Location privacy challenges in mobile edge computing: Classification and exploration. Network. 2020;34(2):52–56.
- Troncoso, C., et al., 2020. DP-3T. https://github.com/DP-3T.
- Vaudenay S. Analysis of DP3T: Between scylla and charybdis. IACR Cryptol. EPrint Arch. 2020;2020:399.
- Vaudenay S. Centralized or decentralized? The contact tracing dilemma. IACR Cryptol. EPrint Arch. 2020;2020:531.
- Xu H., Zhang L., Onireti O., Fang Y., Buchanan W.J., Imran M.A. BeepTrace: Blockchain-enabled privacy-preserving contact tracing for COVID-19 pandemic and beyond. IEEE Internet Things J. 2021;8(5):3915–3929. doi: 10.1109/JIOT.2020.3025953.