2020 Jun 15;12138:482–495. doi: 10.1007/978-3-030-50417-5_36

Microservice Disaster Crash Recovery: A Weak Global Referential Integrity Management

Maude Manouvrier, Cesare Pautasso, Marta Rukoz
Editors: Valeria V Krzhizhanovskaya, Gábor Závodszky, Michael H Lees, Jack J Dongarra, Peter M A Sloot, Sérgio Brissos, João Teixeira
PMCID: PMC7302836

Abstract

Microservices which use polyglot persistence (using multiple data storage techniques) cannot be recovered in a consistent state from backups taken independently. As a consequence, references across microservice boundaries may break after disaster recovery. In this paper, we give a weak global consistency definition for microservice architectures and present a recovery protocol which takes advantage of cached referenced data to reduce the amnesia interval for the recovered microservice, i.e., the time interval after the most recent backup, during which state changes may have been lost.

Keywords: Microservices, Referential integrity, Backup, Weak global consistency

Introduction

Microservices are small autonomous services, deployed independently, that implement a single, generally limited, business functionality [6, 14, 21, 23]. Microservices may need to store data, and different data storage patterns exist for them [21]. In the Database per Service pattern, defined in [19], each microservice stores its persistent data in a private database over which it has full control, persistent data being accessible to other services only via an API [24]. The invocation of a service API results in transactions which only involve its own database.

Relationships between related entities of an application based on a microservice architecture are represented by links: the state of a microservice can include links to other entities found on other microservice APIs [18]. Following the hypermedia design principle of the REST architectural style, these links can be expressed with Uniform Resource Identifiers (URIs) which globally address the referenced entities.

Since microservices are autonomous, not only do they use the most appropriate database technology for persistent storage of their state, but they also follow an independent lifecycle, in which their database is periodically backed up. In practice, taking an atomic snapshot of the state of all microservices of an entire architecture is rarely feasible. Thus, if one microservice crashes and needs to be recovered from its backup, the overall state of the microservice architecture may become inconsistent after recovery [18]. Such inconsistency may manifest itself as broken links between different microservices.

This paper presents a solution to ensure that the links between entities managed by different microservices remain valid and intact even in the case of a database crash. The solution assumes that microservices referring to entities managed by other microservices not only store the corresponding link, but also keep a cached representation of the most recent known values. We present a recovery protocol in which the crashed microservice merges its own, possibly stale, backup with the possibly more recent cached representations obtained from other microservices. Thus, we revisit the definition of weak referential integrity across distributed microservice architectures.

Background and Related Work

Database Consistency, Durability, Backup and Disaster Crash Recovery

A database has a state, which is a value for each of its elements. The state of a database is consistent if it satisfies all the constraints [22]. Among the constraints that ensure database consistency, referential integrity [8] ensures that references between data are indeed valid and intact [4]. In a relational database, the referential integrity constraint states that a tuple/row in one relation referring, using a foreign key, to another relation, must refer to an existing tuple/row in that relation [11]. When a reference is defined, i.e. a value is assigned to a foreign key, the validity of the reference is checked, i.e. the referenced tuple should exist. In case of deletion, depending on the foreign key definition, the deletion of a tuple is forbidden if there are dependent foreign-key records, or it causes the deletion of the corresponding foreign-key records, or the corresponding foreign keys are set to null. Referential integrity is in fact a broader notion that applies to databases in general and not only to relational ones [4].
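As an illustration of the three deletion policies just described, the following minimal sketch uses Python's standard sqlite3 module. The table names, column names and the sample row are hypothetical and chosen only for this example; note that SQLite enforces foreign keys only when the corresponding pragma is enabled.

```python
import sqlite3

def make_db(on_delete: str) -> sqlite3.Connection:
    """Create customer/orders tables with the given ON DELETE policy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        f"customer_id INTEGER REFERENCES customer(id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
    conn.execute("INSERT INTO orders VALUES (10, 1)")
    return conn

def delete_customer(on_delete: str):
    """Delete the referenced customer and report what happened to the order."""
    conn = make_db(on_delete)
    try:
        conn.execute("DELETE FROM customer WHERE id = 1")
    except sqlite3.IntegrityError:
        return "forbidden"  # deletion refused: a dependent row exists
    return conn.execute("SELECT id, customer_id FROM orders").fetchall()

restrict_result = delete_customer("RESTRICT")  # deletion is forbidden
cascade_result = delete_customer("CASCADE")    # the order is deleted too
set_null_result = delete_customer("SET NULL")  # the foreign key becomes NULL
```

Within a single database the engine can enforce these policies atomically; the rest of the paper deals with the case where the referencing and referenced rows live in different services.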

Durability means that once a transaction, i.e. a set of update operations on the data, is committed, it cannot be abrogated. In centralized database systems, a checkpoint and a log are normally used to recover the state of the database in case of a system failure (e.g. the contents of main memory are lost due to a power failure, or the content of a broken disk becomes unreadable) [22]. A checkpoint is the point of synchronization between the database and the transaction log file, at which all buffers are force-written to secondary storage [7]. For this kind of failure, the database can be reconstructed only if:

  • the log has been stored on another disk, separate from the failed one(s),

  • the log has been kept after a checkpoint, and

  • the log provides information to redo changes performed by transactions before the failure, and after the latest checkpoint.
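The role of the log after a checkpoint can be sketched as follows. This is a deliberately simplified illustration, not a full recovery algorithm: the record layout (`txn`, `op`, `key`, `value`) is a hypothetical format, and only writes of transactions that have a commit record in the log are redone.

```python
from typing import Dict, List

def redo_after_checkpoint(checkpoint: Dict[str, int], log: List[dict]) -> Dict[str, int]:
    """Rebuild the database state from a checkpoint and the log written after it."""
    # Only transactions with a commit record in the log are redone.
    committed = {rec["txn"] for rec in log if rec["op"] == "commit"}
    state = dict(checkpoint)
    for rec in log:  # records are assumed to be in log order
        if rec["op"] == "write" and rec["txn"] in committed:
            state[rec["key"]] = rec["value"]
    return state

checkpoint = {"x": 1, "y": 1}
log_after_checkpoint = [
    {"txn": "T1", "op": "write", "key": "x", "value": 2},
    {"txn": "T2", "op": "write", "key": "y", "value": 5},  # T2 never commits
    {"txn": "T1", "op": "commit"},
]
recovered = redo_after_checkpoint(checkpoint, log_after_checkpoint)
```

Here the write of T1 is redone while the uncommitted write of T2 is ignored. If the log itself is lost, this reconstruction is no longer possible, which is the disaster crash situation discussed below.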

To protect the database against media failures an up-to-date backup of the database, i.e. a copy of the database separate from the database itself, is used [22]. A backup of the database and its log can be periodically copied onto offline storage media [7]. In case of database corruption or device failure, the database can be restored from one of the backup copies, typically the most recent one [3]. In this case, the recovery is carried out using the backup and the log – see [15], for more details.

A database has a Disaster Crash when the main memory and the log, or a part of the log, are lost. Therefore, to recover the database, an old, maybe obsolete, backup of the database is used. Data which was not part of the backup will be lost. In case of a disaster crash, the system cannot guarantee the durability property. However, in a centralized database, recovery from a backup provides a database which has a consistent state.

Microservices as a Federated Multidatabase

Each database of a microservice can be seen as a centralized database. Seen across an entire microservice architecture, the microservice databases represent a distributed database system. A multidatabase is a distributed database system in which each site maintains complete autonomy. A federated multidatabase is a hybrid between distributed and centralized databases: it is a distributed system for global users and a centralized one for local users [7]. According to the definitions above, stateful microservice architectures can therefore be seen as a federated multidatabase.

A microservice database can store either a snapshot of the current state of the data, containing the most recent value of the data, or an event log, from which the current state of the data can be rebuilt by replaying the log entries, which record the changes to the microservice state in the database transaction log. Consider, as an example, a microservice managing orders. Using the snapshot architecture, the current state of an order can be stored in a row of a relational table Order. When using event sourcing (log) [17], the application persists each order as a sequence of events, e.g. listing the creation of the order, its update with customer details and the addition of each line item.
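The difference between the two storage patterns can be illustrated with a short sketch. The event names (`OrderCreated`, `CustomerDetailsUpdated`, `LineItemAdded`) and the fields of `OrderState` are assumptions made for this example only; they mirror the order events mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class OrderState:
    """Snapshot pattern: one record holding the most recent state of an order."""
    order_id: str
    customer: Optional[str] = None
    items: List[str] = field(default_factory=list)

def replay(events: List[Dict]) -> Optional[OrderState]:
    """Event sourcing: rebuild the current state by applying events in log order."""
    state = None
    for event in events:
        kind = event["type"]
        if kind == "OrderCreated":
            state = OrderState(order_id=event["order_id"])
        elif kind == "CustomerDetailsUpdated":
            state.customer = event["customer"]
        elif kind == "LineItemAdded":
            state.items.append(event["item"])
        else:
            raise ValueError(f"unknown event type: {kind}")
    return state

event_log = [
    {"type": "OrderCreated", "order_id": "O/1"},
    {"type": "CustomerDetailsUpdated", "customer": "C/1"},
    {"type": "LineItemAdded", "item": "book"},
    {"type": "LineItemAdded", "item": "pen"},
]
snapshot = replay(event_log)
```

Replaying the log yields the same record that the snapshot pattern would store directly, at the cost of keeping every intermediate change.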

Each microservice ensures the durability and the consistency of its database, like in centralized databases. In the microservice context, each microservice manages its own database and stores independent backups of it, in order to permit disaster recovery from backup. However, while managing consistent backups is simple in a centralized database, maintaining consistent backups with distributed persistence in a federated multidatabase is challenging, as shown in the survey of [13]. A model providing globally consistent backups is therefore necessary for microservices.

In a microservice architecture, foreign key relationships between the data of different services are broken up [16]. Each microservice can refer to the data of other microservices through loosely coupled references (i.e., URLs or links), which can be dereferenced using the API provided by the microservice managing the referenced data. Microservices are independent, and managing referential integrity between them is challenging. As on the World Wide Web [9, 12], there is no guarantee that a link retrieved from a microservice points to a valid URL [18]. In the following section, we propose a model providing global referential integrity for microservices.

Microservice Disaster Recovery

In [18], the authors have addressed the problem of backing up an entire microservice architecture and recovering it in case of a disaster crash affecting one microservice. They defined the BAC theorem, inspired from the CAP Theorem [5], which states that when backing up an entire microservice architecture, it is not possible to have both availability and consistency.

Let us consider the microservice architecture defined in [18], where each microservice manages its own database and can refer to the data of other microservices through loosely coupled references. Each microservice independently backs up its own database to allow disaster recovery from backup.

Figure 1 presents an example of two microservices with their independent backups, where data of microservice Order refers to data of microservice Customer. The database of each microservice is represented in gray and its data in black. Each database contains three entities. Entities C/i (i ∈ {1, 2, 3}) correspond to customers, described by a name, and are managed by microservice Customer. Entities O/i (i ∈ {1, 2, 3}) correspond to orders and are managed by microservice Order. Each order O/i refers to a customer C/i. Backups of the databases are represented in blue. The backup of microservice Customer only contains a copy of customers C/1 and C/2. The backup and the database of microservice Order are, on the contrary, synchronized.

Fig. 1. An example of microservice architecture with independent backup (Color figure online)

As explained in [18], in case of a disaster crash, independent backups may lead to broken links (see Fig. 2): customer C/3 no longer exists after the recovery of Customer, so O/3 has a broken link.

Fig. 2. The link from the Order microservice to entity C/3 is broken after the recovery of Customer microservice from an old backup
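The situation of Figs. 1 and 2 can be reproduced with a small sketch in which each database is modelled as an in-memory dictionary. The customer names and the `broken_links` helper are hypothetical and serve only to illustrate the scenario.

```python
import copy

# Customer database; its backup is taken before customer C/3 is created.
customer_db = {"C/1": {"name": "Ann"}, "C/2": {"name": "Bob"}}
customer_backup = copy.deepcopy(customer_db)
customer_db["C/3"] = {"name": "Cy"}

# Order database: each order O/i refers to customer C/i.
order_db = {
    "O/1": {"customer": "C/1"},
    "O/2": {"customer": "C/2"},
    "O/3": {"customer": "C/3"},
}

def broken_links(orders, customers):
    """Return the orders whose customer reference cannot be resolved."""
    return sorted(o for o, order in orders.items() if order["customer"] not in customers)

before_crash = broken_links(order_db, customer_db)

# Disaster crash of Customer: the database is restored from the stale backup.
customer_db = copy.deepcopy(customer_backup)
after_recovery = broken_links(order_db, customer_db)
```

Before the crash no link is broken; after the recovery, O/3 points to a customer that the restored database no longer contains.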

One solution to avoid broken links is to synchronize the backups of all microservices, at the cost of limiting the autonomy of microservices and losing data. In Fig. 3, both customer C/3 and order O/3 are lost after the recovery.

Fig. 3. Synchronized backup of an entire microservice architecture

Please note that a broken link can also appear when referenced data is deleted, e.g. when a customer is deleted in the local database of microservice Customer. In this case, referential integrity is violated.

As mentioned above, several works point out that microservice architectures raise challenging problems of data integrity and consistency management [2, 18], as well as the difficulty of managing consistent backups due to distributed persistence [13]. However, as far as we know, no approach proposes a solution to these problems. In the following, we present a solution that avoids such referential integrity violations and broken links.

Our Solution: A Weak Referential Integrity Management

In this work, we focus on referential integrity. We present a solution to help the user in the recovery of the system referential integrity in case of a disaster crash. We define the global consistency as a time-dependent property. We propose a new global consistency definition, called the weak global referential consistency. Our solution provides information about the global state after a disaster crash, so that users can pinpoint exactly the location and the time interval of the missing data which needs to be manually repaired.

In the following, we first present the context and assumptions (Subsect. 3.1), without taking disaster crashes into consideration. Then, we introduce our definition of global consistency (Subsect. 3.2). Based on this definition, we present the method of recovery from a disaster crash affecting one microservice. All symbols used in this article can be found in Table 1.

Table 1. Table of symbols

Symbol | Description
Inline graphic | Microservice
D | Database of microservice Inline graphic
e | Entity of a database D
Inline graphic | Uniform Resource Identifier of an entity e
Inline graphic | Date of the last update of an entity e in D
Inline graphic | Dependency counter associated with entity e
Inline graphic | Epoch identity, k being a timestamp
Inline graphic | ith timestamp related to epoch Inline graphic
B | Backup of the local database of microservice Inline graphic
Inline graphic | Amnesia interval of an entity e

Context and Assumptions

In this article, each microservice follows the Database per Service pattern (defined in [19]), where each microservice has full control of a private database, persistent data being accessible to other services only via an API. Each microservice also uses an event-driven architecture, such as the one defined in [20], consuming and publishing a sequence of state-changing events.

The following are our assumptions:

  • Microservices are part of the same application.

  • All microservices of an application trust each other.

  • Each microservice Inline graphic has a database D storing a set of entities.

  • Each entity Inline graphic can be either a RESTful API resource, a relational tuple, a key-value record, a document or graph database item.

  • Each entity e has a Uniform Resource Identifier, Inline graphic, that identifies the entity.

  • The state of each entity e is read, updated and deleted using standard HTTP protocol primitives (GET, PUT and DELETE). In addition, we introduce two operations: getReference and deleteReference.

  • Each microservice Inline graphic ensures the consistency and the durability of its own database D.

The following assumptions concern the handling of references between microservices and ensure that the system is reliable when no failure occurs:

  • Each microservice has its own clock. The clocks of different microservices are not necessarily synchronized.

  • An entity Inline graphic, managed by a microservice Inline graphic, can refer to another entity e managed by a microservice Inline graphic.

  • The reference from microservice Inline graphic to an entity e, managed by a microservice Inline graphic, is the couple Inline graphic with the timestamp Inline graphic marking the date of the last update of entity e in D as it is known by the microservice Inline graphic, i.e. exactly when Inline graphic queries the microservice Inline graphic, using the clock of microservice Inline graphic.

There are two cases as far as reference storage is concerned:

  1. the minimalist case consists in just storing the reference and the most recent modified timestamp, i.e. couple Inline graphic;

  2. the eager/self-contained backup case consists in storing a copy of the referenced entity state, which can be cached by Inline graphic. When microservice Inline graphic stores a copy of the referenced entity in its cache, this copy is considered detached, like a detached entity in object-relational mapping under the JPA specification [10]. Detached means that the copy is not managed by Inline graphic, microservice Inline graphic being responsible for keeping its cache up-to-date. A cached representation is only a representation of the original entity state, thus it may only contain a projection. For our solution, we assume that it is possible to reconstruct the original entity state from its cached representation.

Global Consistency

In case of no disaster crash, the global consistency can be defined as follows:

Definition 1

The global consistency

A global state is consistent if:

  • (local database consistency) each local database is locally consistent in the traditional sense of a database, i.e. all its integrity constraints are satisfied.

and

  • (global referential integrity) the timestamp value associated with each reference is less than or equal to the timestamp value of the corresponding referenced entity.

    Formally: for each couple Inline graphic associated with an entity Inline graphic referencing another entity e of Inline graphic, Inline graphic, with Inline graphic the most recent update timestamp of e in Inline graphic.
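The global referential integrity condition of Definition 1 can be checked mechanically, as in the following minimal sketch. It assumes that timestamps are plain integers taken from the clock of the referenced microservice and that a reference is a pair (URI, timestamp); the function name `violations` is hypothetical, and treating a reference to a missing entity as a violation is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# (URI of the referenced entity, timestamp known by the referring service)
Reference = Tuple[str, int]

def violations(references: List[Reference], last_update: Dict[str, int]) -> List[Reference]:
    """Return the references that break global referential integrity.

    last_update maps each entity URI of the referenced service to the
    timestamp of its most recent update.
    """
    bad = []
    for uri, known_ts in references:
        current_ts = last_update.get(uri)
        # Violation: the entity is unknown, or the reference claims a
        # version newer than anything the referenced service has stored.
        if current_ts is None or known_ts > current_ts:
            bad.append((uri, known_ts))
    return bad

customer_updates = {"C/1": 4, "C/2": 5}
ok = violations([("C/1", 3), ("C/2", 5)], customer_updates)
too_recent = violations([("C/1", 9)], customer_updates)
dangling = violations([("C/3", 7)], customer_updates)
```

A reference whose timestamp is older than the current one is still valid: the referring service simply holds a stale view, which it can refresh with a new GET.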

Case of Snapshot Data Storage Pattern. When using the snapshot data storage pattern, each local database contains the current state of its microservice. In order to guarantee the referential integrity in the microservice architecture, a microservice Inline graphic cannot delete an entity e if there is an entity Inline graphic managed by another microservice Inline graphic that refers to the entity e. We suggest a referential integrity mechanism based on dependency counters, as follows:

  • Each entity e managed by a microservice Inline graphic is associated with a dependency counter Inline graphic. This counter indicates how many entities managed by other microservices refer to entity e. It is initially set to 0.

  • When a microservice Inline graphic wants to create an entity Inline graphic that refers to an entity e managed by Inline graphic, it sends a getReference message to microservice Inline graphic. The corresponding dependency counter Inline graphic is incremented. Then, Inline graphic sends the couple Inline graphic back to Inline graphic, with Inline graphic the date of the most recent update of entity e.

  • When microservice Inline graphic receives the information about e, it creates the entity Inline graphic.

  • When microservice Inline graphic deletes an entity Inline graphic that refers to e, it sends a deleteReference message to microservice Inline graphic, indicating that the reference to e no longer exists. Inline graphic is therefore decremented.

  • Microservice Inline graphic cannot delete an entity e if its dependency counter is greater than 0. It retains the most recent value of entity e with its most recent update time, Inline graphic, and flags the entity by Inline graphic indicating that e must be deleted when its dependency counter reaches the value of 0.
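The dependency counter mechanism listed above can be sketched as a small in-memory class standing in for the referenced microservice. The class name `CounterStore`, the dictionary layout and the integer logical clock are assumptions made for this illustration; the behaviour of updates or new references on an entity already flagged for deletion is not specified in the text and is left out.

```python
class CounterStore:
    """In-memory stand-in for the database of the referenced microservice."""

    def __init__(self):
        self.entities = {}  # uri -> {"value", "ts", "deps", "to_delete"}
        self.clock = 0      # local logical clock of this microservice

    def put(self, uri, value):
        """Create or update an entity and stamp it with the local clock."""
        self.clock += 1
        entry = self.entities.setdefault(uri, {"deps": 0, "to_delete": False})
        entry["value"], entry["ts"] = value, self.clock

    def get_reference(self, uri):
        """getReference: count one more referrer and return (URI, timestamp)."""
        entry = self.entities[uri]
        entry["deps"] += 1
        return uri, entry["ts"]

    def delete_reference(self, uri):
        """deleteReference: one referrer is gone; finish a pending deletion."""
        entry = self.entities[uri]
        entry["deps"] -= 1
        if entry["deps"] == 0 and entry["to_delete"]:
            del self.entities[uri]

    def delete(self, uri):
        """Delete now if unreferenced, otherwise only flag the entity."""
        entry = self.entities[uri]
        if entry["deps"] > 0:
            entry["to_delete"] = True
            return False
        del self.entities[uri]
        return True

store = CounterStore()
store.put("C/3", {"name": "Cy"})
ref = store.get_reference("C/3")          # an order now refers to C/3
deleted_now = store.delete("C/3")         # refused: counter is 1
kept = "C/3" in store.entities
store.delete_reference("C/3")             # the order is deleted
gone = "C/3" not in store.entities
```

The deletion is deferred rather than rejected outright, so the referenced value stays available for as long as some other microservice depends on it.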

According to Definition 1, a reliable microservice system using the referential integrity mechanism based on dependency counters will always be globally consistent.

Case of Event Sourcing Data Storage Pattern. When choosing event sourcing as data storage pattern [20], each local database contains an event log, which records all changes of the microservice state. Thus, it is possible to rebuild the current state of the data by replaying the event log. In this case, we propose the following referential integrity mechanism:

  • When a microservice Inline graphic wants to create an entity Inline graphic that refers to an entity e managed by Inline graphic, it sends a getReference message to microservice Inline graphic. When Inline graphic receives the information about e, it creates the entity Inline graphic, and a creation event, associated with the corresponding reference Inline graphic, is stored in the log of Inline graphic.

  • When an entity e of microservice Inline graphic must be deleted, instead of deleting it, the microservice Inline graphic flags it by Inline graphic, and a deletion event, associated with the related timestamp, is stored in its log, representing the most recent valid value of entity e.

Thus, it is easy to show that a globally consistent state can be obtained from the logs. For each couple Inline graphic of a referenced entity e, timestamp Inline graphic must appear in the event log of microservice Inline graphic. Moreover, the most recent record associated with entity e in the event log of Inline graphic, corresponding to an update or deletion of e, has a timestamp Inline graphic, with Inline graphic, any timestamp appearing in any reference couple Inline graphic stored in the event log of any other microservice referencing e.

Fault Tolerant Management of Microservice Referential Integrity

As explained in Sect. 2.3, disaster crashes can occur in microservice architectures. In the following, we consider a disaster crash affecting only one microservice.

To protect the local database from media-failure, each microservice stores an up-to-date backup of its database, i.e. a copy of the database separate from the database itself. Each microservice individually manages the backup of its database. The way in which microservices independently manage their backup is out of the scope of this paper.

A disaster crash of a microservice Inline graphic means that its local database and its log are lost and we have to recover the database from a past backup. The backup provides a consistent state of the local database. However, as the database has been recovered from a past backup, data could have been lost. In order to provide a state of the local database as close as possible to the one of the database before the failure, data cached by other microservices can be used. When a microservice Inline graphic refers to an entity managed by another microservice Inline graphic, it can store a detached copy of the referenced entity. Therefore, these detached copies can be used to update the state of the database obtained after recovery from the backup.

In the following, we present the concepts used to manage disaster crash, our recovery protocol, how to optimize it and we define the Weak Global Referential Integrity.

Backup and Recovery, Amnesia Interval and New Epoch. To manage disaster crash, our assumptions are:

  • Each entity of the local database of microservice Inline graphic is associated with an epoch identity Inline graphic. An epoch is a new period after a disaster crash recovery. A new epoch Inline graphic begins at the first access of an entity after recovery. Therefore, a timestamp Inline graphic associated with an entity e represents the ith timestamp related to epoch Inline graphic. Inline graphic when no crash has occurred, Inline graphic otherwise. The value of k always increases, being associated with time.

  • When a backup B of the local database of microservice Inline graphic is made (operation BCK), the backup is associated with the epoch identity Inline graphic and with a creation timestamp Inline graphic, such that all entities e stored in backup B have an update timestamp Inline graphic. Epoch Inline graphic associated with the backup and epoch Inline graphic associated with the local database are such that: Inline graphic.

  • As long as there is no disaster crash, the local database and the backup are associated with the same epoch identity.

  • In case of a disaster crash of microservice Inline graphic, when the local database is recovered from a past, obsolete backup created at time Inline graphic, the local database is known to have an amnesia interval starting from Inline graphic. This amnesia interval is associated with all entities saved in the backup and lasts until such entities are accessed again (see Definition 2).

  • Each entity of the recovered database is associated with a timestamp related to epoch Inline graphic of the backup. This timestamp remains as long as no updates have been carried out. A new epoch Inline graphic begins at the first read or write access to an entity, k containing the current date. A write operation, PUT, overwrites whatever value was recovered. However, the epoch should also be updated after a read operation, GET: any other microservice reading the state of the recovered entity establishes a causal dependency, which would conflict with more recent values recovered later from the previous epoch (see [1] for more details).

Definition 2

Amnesia Interval

An amnesia interval of microservice Inline graphic is a time interval indicating that a disaster crash has occurred for the local database of Inline graphic. This interval is associated with each entity managed by Inline graphic. An amnesia interval Inline graphic of an entity e means that:

  • Epoch Inline graphic is the epoch associated with the backup used for the database recovery.

  • Timestamp Inline graphic corresponds to the time of most recent known update of e. It is either the timestamp associated with the backup used for recovery, or the timestamp of a cached copy of e stored in a microservice referring entity e.

  • Timestamp Inline graphic corresponds to the first read or write operation on e from another microservice Inline graphic, after Inline graphic (Inline graphic).
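The lifecycle of an amnesia interval can be sketched as follows. This is an illustration under stated assumptions, not the paper's formal model: a timestamp is represented as a pair (epoch identity, time on the service's own clock), the class `RecoveredEntity` and its field names are hypothetical, and the current date is passed explicitly as an integer `now`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Timestamp = Tuple[int, int]  # (epoch identity k, time on the service's own clock)

@dataclass
class RecoveredEntity:
    value: str
    ts: Timestamp                            # last update, inherited from the backup epoch
    amnesia_start: Timestamp                 # most recent known update of the entity
    amnesia_end: Optional[Timestamp] = None  # unknown until the first access

    def _first_access(self, now: int) -> None:
        # The first GET or PUT after recovery starts a new epoch, identified
        # by the current date, and thereby closes the amnesia interval.
        if self.amnesia_end is None:
            self.amnesia_end = (now, now)

    def get(self, now: int) -> str:
        self._first_access(now)
        return self.value  # a read does not change the update timestamp

    def put(self, value: str, now: int) -> None:
        self._first_access(now)
        self.value = value
        self.ts = (self.amnesia_end[0], now)  # timestamp in the new epoch

entity = RecoveredEntity("v1", ts=(0, 2), amnesia_start=(0, 2))
open_before_access = entity.amnesia_end is None
read_value = entity.get(now=7)   # first access: epoch 7 begins
entity.put("v2", now=9)          # later update, still in epoch 7
```

After the read at time 7 the interval is closed; the later write only moves the update timestamp of the entity within the new epoch.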

Weak Global Referential Integrity. After a crash recovery, data can be lost, so we define a weak global referential integrity of the microservice architecture. Weak means that either the global referential integrity has been checked and satisfies Definition 1, or an amnesia has been detected: it is known that data has been lost, as well as the time interval during which it was lost. This makes it possible to focus the manual data recovery and reconstruction effort on the amnesia interval.

Definition 3

Weak global consistency

After a disaster crash recovery of a microservice Inline graphic, the system satisfies weak global consistency iff:

  • (local database consistency) each local database is locally consistent in the traditional sense of a database, i.e. all its integrity constraints are satisfied.

and

  • (weak global referential integrity) the timestamp value associated with each reference is either less than or equal to the timestamp value of the corresponding referenced entity or included in an amnesia interval. Formally: for each couple Inline graphic associated with an entity of Inline graphic referencing another entity e of Inline graphic:
    • either Inline graphic, with Inline graphic the most recent update timestamp of e in Inline graphic, epochs Inline graphic and Inline graphic being comparable (Inline graphic);
    • or Inline graphic, with Inline graphic the amnesia interval associated with the referenced entity e, after a disaster crash of Inline graphic that manages e.
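The two alternatives of the weak global referential integrity condition can be expressed as a small classification function. The sketch below assumes that timestamps are pairs (epoch identity, time) compared lexicographically, which is one possible reading of "comparable epochs", and that an amnesia interval whose end is not yet known is represented with `None`; the function name `classify_reference` is hypothetical.

```python
from typing import Optional, Tuple

Timestamp = Tuple[int, int]                       # (epoch identity, time)
Interval = Tuple[Timestamp, Optional[Timestamp]]  # end is None while still open

def classify_reference(
    ref_ts: Timestamp,
    entity_ts: Timestamp,
    amnesia: Optional[Interval] = None,
) -> str:
    """Classify one reference against the referenced entity after a recovery."""
    if ref_ts <= entity_ts:
        return "consistent"  # first alternative: same condition as Definition 1
    if amnesia is not None:
        start, end = amnesia
        if start <= ref_ts and (end is None or ref_ts <= end):
            return "amnesia"  # second alternative: the loss is known and bounded
    return "violation"

# The entity was restored with timestamp (0, 2); the backup opens an amnesia interval.
fresh_enough = classify_reference((0, 1), (0, 2))
in_lost_window = classify_reference((0, 3), (0, 2), amnesia=((0, 2), None))
unexplained = classify_reference((0, 3), (0, 2))
```

A system is weakly globally consistent, in the sense of this sketch, when no reference is classified as a violation; the references classified as amnesia tell the user where manual repair may be needed.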

Consider a scenario with two microservices Inline graphic and Inline graphic, Inline graphic referencing an entity managed by Inline graphic (see Fig. 4) without caching the referenced entity. In the figures, only the timestamps of one entity of microservice Inline graphic are considered, timestamps and epoch identities being represented simply by numbers. In Fig. 4, at time Inline graphic an entity is created by microservice Inline graphic (operation PUT). A backup B is made, storing the entities of microservice Inline graphic created before Inline graphic (operation BCK). Microservice Inline graphic refers to the entity of Inline graphic created at time Inline graphic (operation GET). The entity is then updated by microservice Inline graphic at time Inline graphic (operation PUT). When a disaster crash hits Inline graphic (see red flash), Inline graphic must recover using the backup; the update of time Inline graphic is lost, therefore it has an amnesia that begins at Inline graphic. When the entity is next updated, a new epoch 2 begins and timestamp Inline graphic is associated with the updated value. The amnesia interval is then updated to Inline graphic. If Inline graphic issues another GET to refresh the referenced value, the up-to-date timestamp Inline graphic is sent by Inline graphic.

Fig. 4. Example of a scenario with 2 microservices, without cached data. PUT represents a state change of the referenced entity. BCK indicates when a backup snapshot is taken. LR shows when the microservice is locally recovered from the backup. (Color figure online)

Recovery Protocol. When a disaster crash affects a microservice Inline graphic, Inline graphic informs all other services of its recovery. Moreover, when microservices have stored copies of the entities they refer to in their cache, the amnesia interval associated with each recovered entity of Inline graphic can be reduced using these cached replicas. The steps are as follows:

  • After the recovery of Inline graphic, an event indicating that there is amnesia is sent, or broadcast, to other microservices.

  • When a microservice Inline graphic receives an amnesia event from microservice Inline graphic, which manages an entity e it refers to, and has stored a replica of e in its cache, Inline graphic sends this replica to Inline graphic.

  • When microservice Inline graphic receives replies carrying information from Inline graphic about its entity e, it compares the value of e, associated with timestamp Inline graphic, with the value, associated with timestamp Inline graphic, it stored, if epochs k and Inline graphic are comparable. Then, it retains the more up-to-date value and shrinks the amnesia interval associated with e if necessary.

  • Once a read operation or an update operation is done on e, a new epoch begins, and the first timestamp associated with this new epoch represents the end of the amnesia interval.

  • The beginning of the amnesia interval can still be shifted if more up-to-date values are received from belated replies from other cached replicas.
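The merge step of the protocol above can be sketched as follows, for replies received before the first access to the entity. Timestamps are again assumed to be pairs (epoch identity, time) compared lexicographically, and the function names and dictionary layout are hypothetical. Restoring an entity that is entirely absent from the backup is an additional assumption of this sketch, not a step stated in the text.

```python
from typing import Dict, Tuple

Timestamp = Tuple[int, int]  # (epoch identity, time on the recovered service's clock)

def recover_from_backup(backup: Dict[str, Tuple[str, Timestamp]], backup_ts: Timestamp) -> dict:
    """Local recovery: every entity gets an amnesia interval starting at the backup time."""
    return {
        uri: {"value": value, "ts": ts, "amnesia_start": backup_ts}
        for uri, (value, ts) in backup.items()
    }

def apply_reply(db: dict, uri: str, value: str, ts: Timestamp) -> bool:
    """Merge one cached replica sent in reply to the amnesia event.

    Returns True if the replica was more up-to-date and has been retained.
    """
    entry = db.get(uri)
    if entry is not None and ts <= entry["ts"]:
        return False  # the recovered value is at least as recent
    # Retain the more recent value and shift the start of the amnesia interval.
    # Assumption: an entity missing from the backup is restored from the cache.
    start = ts if entry is None else max(ts, entry["amnesia_start"])
    db[uri] = {"value": value, "ts": ts, "amnesia_start": start}
    return True

db = recover_from_backup({"C/1": ("v1", (1, 1))}, backup_ts=(1, 2))
used_newer = apply_reply(db, "C/1", "v2", (1, 3))   # replica cached after the backup
used_stale = apply_reply(db, "C/1", "v1", (1, 1))   # belated, older replica
restored = apply_reply(db, "C/3", "new", (1, 4))    # entity unknown to the backup
```

Each accepted replica moves the start of the amnesia interval forward, so the window in which updates may have been lost shrinks with every useful reply.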

In Fig. 5, Inline graphic stores an up-to-date value of the referenced entity after the backup of Inline graphic (operation GET). After the recovery from the past backup of time Inline graphic, Inline graphic sends an event about its amnesia, associated with interval Inline graphic. After receiving this amnesia event, Inline graphic sends its up-to-date value, associated with Inline graphic, to Inline graphic (event reply). Inline graphic stores this up-to-date value, associated with timestamp Inline graphic, and updates the amnesia interval to Inline graphic. After an update is made to the entity, a new epoch 2 begins and timestamp Inline graphic is associated with the updated value (operation PUT). The amnesia interval is then updated to Inline graphic.

Fig. 5. Recovery scenario using cached data more recent than the backup.

Availability vs Consistency. After a disaster crash of Inline graphic, either Inline graphic is immediately available after its local recovery, or it waits for information sent by the other microservices that referred to its entities before the disaster crash, in order to provide a more recent database snapshot than the backup used, by updating the values stored in the backup with the copies stored in the caches of the other microservices.

If we are uncertain that all microservices will answer the amnesia event, or if Inline graphic does not know, or only partially knows, which microservices refer to its entities (case 1), Inline graphic can wait for a defined timeout.

If we assume that all microservices are available and will answer the amnesia event (case 2), Inline graphic waits until all microservices have sent their reply to the amnesia event.

After the timeout (case 1) or the reception of all responses (case 2), the recovery is ended and Inline graphic is available.

When dependency counters are used (see Sect. 3.2) and the identity of all microservices that refer to Inline graphic is known after the disaster crash, the recovery process can be optimized. In this case, an amnesia event is sent only to the microservices referring to Inline graphic, instead of being broadcast, and Inline graphic waits until all of them reply. To do so, the address of each microservice referring to Inline graphic should be stored by Inline graphic when the dependency counter is updated. Each referencing microservice can either send the value it stores in its cache, or a message indicating that it is no longer concerned by the amnesia, because it currently does not refer to any entity of Inline graphic.

The choice between the aforementioned steps depends on the focus on availability (Inline graphic is available as soon as possible after its disaster crash, with a large amnesia interval) or on consistency (we prefer to wait in order to provide a more recent snapshot than the one used for the recovery before making Inline graphic available).

Conclusions and Future Work

In this paper, we have focused on preserving referential integrity within a microservice architecture during disaster recovery. We have introduced a definition of weak global referential consistency and a recovery protocol that takes advantage of replicas found in microservice caches. These replicas are merged with the local backup to reduce the amnesia interval of the recovered microservice. The approach has been validated under several assumptions: direct references to simple entities, single crashes and no concurrent recovery of more than one failed microservice.

This paper focused on reliability aspects; as part of future work, we plan to assess the performance implications of our approach in depth. We will also address more complex relationships between microservices, e.g., transitive or circular dependencies, which may span multiple microservices. While microservice architecture is known for its ability to isolate failures, which should not cascade across multiple microservices, it remains an open question how to apply our approach to the concurrent recovery of multiple microservices that may have failed independently over an overlapping period of time.

Acknowledgements

The authors would like to thank Guy Pardon, Eirlys Da Costa Seixas and the referees of the article for their insightful feedback.

Contributor Information

Valeria V. Krzhizhanovskaya, Email: V.Krzhizhanovskaya@uva.nl

Gábor Závodszky, Email: G.Zavodszky@uva.nl

Michael H. Lees, Email: m.h.lees@uva.nl

Jack J. Dongarra, Email: dongarra@icl.utk.edu

Peter M. A. Sloot, Email: p.m.a.sloot@uva.nl

Sérgio Brissos, Email: sergio.brissos@intellegibilis.com

João Teixeira, Email: joao.teixeira@intellegibilis.com

Maude Manouvrier, Email: maude.manouvrier@dauphine.fr

References

  • 1. Ahamad M, Neiger G, Burns JE, Kohli P, Hutto PW. Causal memory: definitions, implementation, and programming. Distrib. Comput. 1995;9(1):37–49. doi: 10.1007/BF01784241.
  • 2. Baresi L, Garriga M. Microservices: the evolution and extinction of web services? In: Microservices. Cham: Springer; 2020. pp. 3–28.
  • 3. Bhattacharya, S., Mohan, C., Brannon, K.W., Narang, I., Hsiao, H.I., Subramanian, M.: Coordinating backup/recovery and data consistency between database and file systems. In: ACM SIGMOD International Conference on Management of data, pp. 500–511. ACM (2002)
  • 4. Blaha, M.: Referential integrity is important for databases. Modelsoft Consulting Corp. (2005)
  • 5. Brewer E. CAP twelve years later: how the “rules” have changed. Computer. 2012;45(2):23–29. doi: 10.1109/MC.2012.37.
  • 6. Bucchiarone A, Dragoni N, Dustdar S, Larsen ST, Mazzara M. From monolithic to microservices: an experience report from the banking domain. IEEE Softw. 2018;35(3):50–55. doi: 10.1109/MS.2018.2141026.
  • 7. Connolly T, Begg C. Database Systems. 3rd ed. England: Addison-Wesley; 1998.
  • 8. Date, C.J.: Referential integrity. In: 7th International Conference on Very Large Data Bases (VLDB), pp. 2–12 (1981)
  • 9. Davis, H.C.: Referential integrity of links in open hypermedia systems. In: 9th ACM Conference on Hypertext and Hypermedia: Links, Objects, Time and Space, pp. 207–216 (1998)
  • 10. DeMichiel, L., Keith, M.: Java persistence API. JSR 220 (2006)
  • 11. Elmasri R, Navathe S. Fundamentals of Database Systems. Boston: Addison-Wesley; 2010.
  • 12. Ingham D, Caughey S, Little M. Fixing the “broken-link” problem: the W3objects approach. Comput. Netw. ISDN Syst. 1996;28(7–11):1255–1268. doi: 10.1016/0169-7552(96)00069-4.
  • 13. Knoche H, Hasselbring W. Drivers and barriers for microservice adoption-a survey among professionals in Germany. Enterp. Model. Inf. Syst. Architect. (EMISAJ) 2019;14:1–1.
  • 14. Lewis, J., Fowler, M.: Microservices: a definition of this new architectural term (2014). http://martinfowler.com/articles/microservices.html
  • 15. Mohan C, Haderle D, Lindsay B, Pirahesh H, Schwarz P. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst. (TODS) 1992;17(1):94–162. doi: 10.1145/128765.128770.
  • 16. Newman S. Building Microservices: Designing Fine-Grained Systems. Newton: O’Reilly; 2015.
  • 17. Overeem, M., Spoor, M., Jansen, S.: The dark side of event sourcing: managing data conversion. In: 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 193–204. IEEE (2017)
  • 18. Pardon G, Pautasso C, Zimmermann O. Consistent disaster recovery for microservices: the BAC theorem. IEEE Cloud Comput. 2018;5(1):49–59. doi: 10.1109/MCC.2018.011791714.
  • 19. Richardson, C.: Pattern: database per service (2018). https://microservices.io/patterns/data/database-per-service.html. Accessed 02 Apr 2020
  • 20. Richardson, C.: Pattern: event sourcing (2018). https://microservices.io/patterns/data/event-sourcing.html. Accessed 01 Apr 2019
  • 21. Taibi, D., Lenarduzzi, V., Pahl, C.: Architectural patterns for microservices: a systematic mapping study. In: CLOSER, pp. 221–232 (2018)
  • 22. Ullman JD, Garcia-Molina H, Widom J. Database Systems: The Complete Book. 1st ed. Upper Saddle River: Prentice Hall; 2001.
  • 23. Zimmermann O. Microservices tenets. Comput. Sci. Res. Dev. 2016;32:301–310. doi: 10.1007/s00450-016-0337-0.
  • 24. Zimmermann, O., Stocker, M., Lübke, D., Pautasso, C., Zdun, U.: Introduction to microservice API patterns (MAP). In: Joint Post-Proceedings of the First and Second International Conference on Microservices (Microservices 2017/2019). OpenAccess Series in Informatics (OASIcs), vol. 78, pp. 4:1–4:17 (2020)

Articles from Computational Science – ICCS 2020 are provided here courtesy of Nature Publishing Group
