2016 Sep 27;50(2):97–139. doi: 10.1007/s10703-016-0256-5

From non-preemptive to preemptive scheduling using synchronization synthesis

Pavol Černý 1, Edmund M Clarke 2, Thomas A Henzinger 3, Arjun Radhakrishna 4, Leonid Ryzhyk 5, Roopsha Samanta 6, Thorsten Tarrach 3

Abstract

We present a computer-aided programming approach to concurrency. The approach allows programmers to program assuming a friendly, non-preemptive scheduler, and our synthesis procedure inserts synchronization to ensure that the final program works even with a preemptive scheduler. The correctness specification is implicit, inferred from the non-preemptive behavior. Let us consider sequences of calls that the program makes to an external interface. The specification requires that any such sequence produced under a preemptive scheduler should be included in the set of sequences produced under a non-preemptive scheduler. We guarantee that our synthesis does not introduce deadlocks and that the synchronization inserted is optimal w.r.t. a given objective function. The solution is based on a finitary abstraction, an algorithm for bounded language inclusion modulo an independence relation, and generation of a set of global constraints over synchronization placements. Each model of the set of global constraints corresponds to a correctness-ensuring synchronization placement. The placement that is optimal w.r.t. the given objective function is chosen as the synchronization solution. We apply the approach to device-driver programming, where the driver threads call the software interface of the device and the API provided by the operating system. Our experiments demonstrate that our synthesis method is precise and efficient. The implicit specification helped us find one concurrency bug previously missed when model-checking using an explicit, user-provided specification. We implemented objective functions for coarse-grained and fine-grained locking and observed that different synchronization placements are produced for our experiments, favoring a minimal number of synchronization operations or maximum concurrency, respectively.

Keywords: Synthesis, Concurrency, NFA language inclusion, MaxSAT

Introduction

Programming for a concurrent shared-memory system, such as most common computing devices today, is notoriously difficult and error-prone. Program synthesis for concurrency aims to mitigate this complexity by synthesizing synchronization code automatically [5, 6, 9, 15]. However, specifying the programmer’s intent may be a challenge in itself. Declarative mechanisms, such as assertions, suffer from the drawback that it is difficult to ensure that the specification is complete and fully captures the programmer’s intent.

We propose a solution where the specification is implicit. We observe that a core difficulty in concurrent programming originates from the fact that the scheduler can preempt the execution of a thread at any time. We therefore give the developer the option to program assuming a friendly, non-preemptive, scheduler. Our tool automatically synthesizes synchronization code to ensure that every behavior of the program under preemptive scheduling is included in the set of behaviors produced under non-preemptive scheduling. Thus, we use the non-preemptive semantics as an implicit correctness specification.

The non-preemptive scheduling model (also known as cooperative scheduling [26]) can simplify the development of concurrent software, including operating system (OS) kernels, network servers, database systems, etc. [21, 22]. In the non-preemptive model, a thread can only be descheduled by voluntarily yielding control, e.g., by invoking a blocking operation. Synchronization primitives may be used for communication between threads, e.g., a producer thread may use a semaphore to notify the consumer about availability of data. However, one does not need to worry about protecting accesses to shared state: a series of memory accesses executes atomically as long as the scheduled thread does not yield.

A user evaluation by Sadowski and Yi [22] demonstrated that this model makes it easier for programmers to reason about and identify defects in concurrent code. There exist alternative implicit correctness specifications for concurrent programs. For example, for functional programs one can specify the final output of the sequential execution as the correct output. The synthesizer must then generate a concurrent program that is guaranteed to produce the same output as the sequential version [3]. This approach does not allow any form of thread coordination, e.g., threads cannot be arranged in a producer–consumer fashion. In addition, it is not applicable to reactive systems, such as device drivers, where threads are not required to terminate.

Another implicit specification technique is based on placing atomic sections in the source code of the program [14]. In the synthesized program the computation performed by an atomic section must appear atomic with respect to the rest of the program. Specifications based on atomic sections and specifications based on the non-preemptive scheduling model, used by our tool, can be easily expressed in terms of each other. For example, one can simulate atomic sections by placing yield statements before and after each atomic section, as well as around every instruction that does not belong to any atomic section.

We believe that, at least for systems code, specifications based on the non-preemptive scheduling model are easier to write and are less error-prone than atomic sections. Atomic sections are subject to syntactic constraints. Each section is marked by a pair of matching opening and closing statements, which in practice means that the section must start and end within the same program block. In contrast, a yield can be placed anywhere in the program.

Moreover, atomic sections restrict the use of thread synchronization primitives such as semaphores. An atomic section either executes in its entirety or not at all. In the former case, all wait conditions along the execution path through the atomic section must be simultaneously satisfied before the atomic section starts executing. In practice, to avoid deadlocks, one can only place a blocking instruction at the start of an atomic section. Combined with syntactic constraints discussed above, this restricts the use of thread coordination with atomic sections—a severe limitation for systems code where thread coordination is common. In contrast, synchronization primitives can be used freely under non-preemptive scheduling. Internally, they are modeled using yields: for instance, a semaphore acquisition instruction is modeled by a yield followed by an assume statement that proceeds when the semaphore becomes available.
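As an illustration of this modeling idea, here is a minimal sketch (ours, using a generator-based cooperative thread; the semaphore encoding and names are our own, not the tool's):

# Sketch (ours): under non-preemptive scheduling, acquiring a semaphore is
# modeled as a yield (a possible context switch) followed by an assume that
# the semaphore is available; the test-and-decrement itself runs atomically,
# since the thread is only descheduled at yield points.
def sem_acquire(sem):
    while True:
        yield                    # voluntary scheduling point
        if sem["count"] > 0:     # models assume(semaphore available)
            sem["count"] -= 1    # acquisition; no interference before next yield
            return

def sem_release(sem):
    sem["count"] += 1            # runs atomically until the next yield

A cooperative scheduler would resume such a thread only at its yield statements, so everything between two yields executes without interference, which is exactly the guarantee the non-preemptive semantics provides.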

Lastly, our specification defaults to the safe choice of assuming everything needs to be atomic unless a yield statement is placed by the programmer. In contrast, code that uses atomic sections can be preempted at any point unless protected by an explicit atomic section.

In defining behavioral equivalence between preemptive and non-preemptive executions, we focus on externally observable program behaviors: two program executions are observationally equivalent if they generate the same sequences of calls to interfaces of interest. This approach facilitates modular synthesis where a module’s behavior is characterized in terms of its interaction with other modules. Given a multi-threaded program C and a synthesized program C′ obtained by adding synchronization to C, C′ is preemption-safe w.r.t. C if for each execution of C′ under a preemptive scheduler, there is an observationally equivalent non-preemptive execution of C. Our synthesis goal is to automatically generate a preemption-safe version of the input program.

We rely on abstraction to achieve efficient synthesis of multi-threaded programs. We propose a simple, data-oblivious abstraction inspired by an analysis of synchronization patterns in OS code, which tend to be independent of data values. The abstraction tracks types of accesses (read or write) to each memory location while ignoring their values. In addition, the abstraction tracks branching choices. Calls to an external interface are modeled as writes to a special memory location, with independent interfaces modeled as separate locations. To the best of our knowledge, our proposed abstraction is yet to be explored in the verification and synthesis literature. The abstract program is denoted as Cabs.

Two abstract program executions are observationally equivalent if they are equal modulo the classical independence relation I on memory accesses. This means that every sequence ω of observable actions is equivalent to the set of sequences of observable actions derived from ω by repeatedly commuting adjacent independent actions. Independent actions are accesses to different locations, and accesses to the same location if they are both read accesses. Using this notion of equivalence, the notion of preemption-safety is extended to abstract programs.

Under abstraction, we model each thread as a nondeterministic finite automaton (NFA) over a finite alphabet, with each symbol corresponding to a read or a write to a particular variable. This enables us to construct NFAs NPabs, representing the abstraction of the original program C under non-preemptive scheduling, and Pabs, representing the abstraction of the synthesized program C′ under preemptive scheduling. We show that preemption-safety of C′ w.r.t. C is implied by preemption-safety of the abstract synthesized program C′abs w.r.t. the abstract original program Cabs, which, in turn, is implied by language inclusion modulo I of NFAs Pabs and NPabs. While the problem of language inclusion modulo an independence relation is undecidable [2], we show that the antichain-based algorithm for standard language inclusion [11] can be adapted to decide a bounded version of language inclusion modulo an independence relation.

Our synthesis works in a counterexample-guided inductive synthesis (CEGIS) loop that accumulates a set of global constraints. The loop starts with a counterexample obtained from the language inclusion check. A counterexample is a sequence of locations in Cabs whose execution produces an observation sequence that is valid under the preemptive semantics, but not under the non-preemptive semantics. From the counterexample we infer mutual exclusion (mutex) constraints which, when enforced in the language inclusion check, prevent the same counterexample from being returned again. We accumulate the mutex constraints from all counterexamples iteratively generated by the language inclusion check. Once the language inclusion check succeeds, we construct a set of global constraints using the accumulated mutex constraints and constraints for enforcing deadlock-freedom. This approach is the key difference from our previous work [4], which employs a greedy approach that immediately places a lock to eliminate each bug. The greedy approach may result in a suboptimal lock placement with unnecessarily overlapping or nested locks.

The global approach allows us to use an objective function f to find an optimal lock placement w.r.t. f once all mutex constraints have been identified. Examples of objective functions include minimizing the number of lock statements (leading to coarse-grained locking) and maximizing concurrency (leading to fine-grained locking). We encode such an objective function, together with the global constraints, into a weighted maximum satisfiability (MaxSAT) problem, which is then solved using an off-the-shelf solver.

Since the synthesized lock placement is guaranteed not to introduce deadlocks, our solution follows good programming practices with respect to locks: no double locking, no double unlocking, and no locks still held at the end of the execution.

We implemented our synthesis procedure in a new prototype tool called Liss (Language Inclusion-based Synchronization Synthesis) and evaluated it on a series of device driver benchmarks, including an Ethernet driver for Linux and the synchronization skeleton of a USB-to-serial controller driver, as well as an in-memory key-value store server. First, Liss was able to detect and eliminate all but two known concurrency bugs in our examples; these included one bug that we previously missed when synthesizing from explicit specifications [6], due to a missing assertion. Second, our abstraction proved highly efficient: Liss runs an order of magnitude faster on the more complicated examples than our previous synthesis tool based on the CBMC model checker. Third, our coarse abstraction proved surprisingly precise for systems code: across all our benchmarks, we only encountered three program locations where manual abstraction refinement was needed to avoid the generation of unnecessary synchronization. Fourth, our tool finds a deadlock-free lock placement for both a fine-grained and a coarse-grained objective function. Overall, our evaluation strongly supports the use of the implicit specification approach based on non-preemptive scheduling semantics as well as the use of the data-oblivious abstraction to achieve practical synthesis for real-world systems code. With the two objective functions we implemented, Liss produces an optimal lock placement w.r.t. the objective.

Contributions First, we propose a new specification-free approach to synchronization synthesis. Given a program written assuming a friendly, non-preemptive scheduler, we automatically generate a preemption-safe version of the program without introducing deadlocks. Second, we introduce a novel abstraction scheme and use it to reduce preemption-safety to language inclusion modulo an independence relation. Third, we present the first language inclusion-based synchronization synthesis procedure and tool for concurrent programs. Our synthesis procedure includes a new algorithm for a bounded version of our inherently undecidable language inclusion problem. Fourth, we synthesize an optimal lock placement w.r.t. an objective function. Finally, we evaluate our synthesis procedure on several examples. To the best of our knowledge, Liss is the first synthesis tool capable of handling realistic (albeit simplified) device driver code, while previous tools were evaluated on small fragments of driver code or on manually extracted synchronization skeletons.

Related work

This work is an extension of our work that appeared in CAV 2015 [4]. We included a proof of Theorem 3, which shows that language inclusion is undecidable for our particular construction of automata and independence relation. Further, we introduced a set of global mutex constraints that replaces the greedy approach of our previous work and enables optimal lock placement according to an objective function.

Synthesis of synchronization is an active research area [3, 5, 6, 8, 12, 15, 17, 23, 24]. Closest to our work is a recent paper by Bloem et al. [3], which uses implicit specifications for synchronization synthesis. While their specification is given by sequential behaviors, ours is given by non-preemptive behaviors. This makes our approach applicable to scenarios where threads need to communicate explicitly. Further, correctness in Bloem et al. [3] is determined by comparing values at the end of the execution. In contrast, we compare sequences of events, which is a more suitable specification for infinitely-looping reactive systems. Finally, Khoshnood et al. developed ConcBugAssist [18], which, similar to our earlier work [15], employs a greedy loop to fix assertion violations in concurrent programs.

Our previous work [5, 6, 15] develops the trace-based synthesis algorithm. The input is a program with assertions in the code, which represent an explicit correctness specification. The algorithm proceeds in a loop where in each iteration a faulty trace is obtained using an external model checker. A trace is faulty if it violates the specification. The trace is subsequently generalized to a partial order [5, 6] or a formula over happens-before relations [15], both representing a set of faulty traces. A formula over happens-before relations is essentially a disjunction of partial orders. In our earlier work [5, 6] the partial order is used to synthesize atomic sections and intra-thread reorderings of independent statements. In our later work [15] the happens-before formula is used to obtain locks, wait-signal statements, and barriers. The quality of the synthesized code heavily depends on how well the generalization step works. Intuitively, the more faulty traces are removed in one synthesis step, the more general the solution is and the closer it is to the solution a human would have implemented.

The drawback of assertions as a specification is that it is hard to determine whether a given set of assertions represents a complete specification. The current work does not rely on an external model checker or an explicit specification. Here we are solving language inclusion, a computationally harder problem than reachability. However, due to our abstraction, our tool performs significantly better than the tools from our previous work [5, 6], which are based on a mature model checker (CBMC [10]). Our abstraction is reminiscent of previously used abstractions that track reads and writes to individual locations (e.g., [1, 25]). However, our abstraction is novel in that it additionally tracks some control-flow information (specifically, the branches taken), giving us higher precision with almost negligible computational cost. For trace generalization and synthesis we use the technique from our previous work [15] to infer locks. Due to our choice of specification no other synchronization primitives are needed.

In Vechev et al. [24] the authors rely on assertions for synchronization synthesis and include iterative abstraction refinement in their framework. This is an interesting extension to pursue for our abstraction. In other related work, CFix [17] can detect and fix concurrency bugs by identifying simple bug patterns in the code.

The concepts of linearizability and serializability are very similar to our implicit specification. Linearizability [16] describes the illusion that every method of an object takes effect instantaneously at some point between the method call and return. A set of transactions is serializable [13, 20] if they produce the same result, whether scheduled in parallel or in sequential order.

There has been a body of work on using a non-preemptive (cooperative) scheduler as an implicit specification. The notion of cooperability was introduced by Yi and Flanagan [26]. They require the user to annotate the program with yield statements to indicate thread interference. Their system then verifies that the yield specification is complete, meaning that every trace is cooperable. A preemptive trace is cooperable if it is equivalent to a trace under the cooperative scheduler.

Illustrative example

Figure 2 contains our running example, a part of a device driver. A driver interfaces the operating system with the hardware device (as illustrated in Fig. 1) and may be used by different threads of the operating system in parallel. An operating system thread wishing to use the device must first call the open_dev procedure and finally the close_dev procedure to indicate that it no longer needs the device. The driver keeps track of the number of threads that interact with the device. The first thread to call open_dev causes the driver to power up the device; the last thread to call close_dev causes the driver to power down the device. The interaction between the driver and the device is represented by the procedure calls in lines 2 and 8. From the device’s perspective, the power-on and power-off signals alternate. In general, we must assume that it is not safe to send the power-on signal twice in a row to the device. If executed with the non-preemptive scheduler, the code in Fig. 2 produces a strictly alternating sequence of power-on and power-off signals.

Fig. 2 Running example

Fig. 1 Interaction of the device driver with the OS and the device

Consider the case where the procedure open_dev is called in parallel by two operating system threads that want to initiate usage of the device. Without additional synchronization, there could be two calls to power_up in a row when executing under a preemptive scheduler. Consider two threads (T1 and T2) running the open_dev procedure. The corresponding trace is T1.1;T2.1;T1.2;T2.2;T2.3;T2.4;T1.3;T1.4. This sequence is not observationally equivalent to any sequence that can be produced when executing with a non-preemptive scheduler.

Figure 3 contains the abstracted versions of the two procedures, open_dev_abs and close_dev_abs. For instance, the instruction open:=open+1 is abstracted to the two instructions labeled 3a and 3b. The calls to the device (power_up and power_down) are abstracted as writes to a hypothetical dev variable. This expresses the fact that interactions with the device are never independent. The abstraction is coarse, but still captures the problem. Consider two threads (T1 and T2) running the open_dev_abs procedure. The following trace is possible under a preemptive scheduler, but not under a non-preemptive scheduler: T1.1a;T1.1b;T2.1a;T2.1b;T1.2;T2.2;T2.3a;T2.3b;T2.4;T1.3a;T1.3b;T1.4. Moreover, the trace cannot be transformed by swapping independent events into any trace possible under a non-preemptive scheduler. This is because instructions 3b:write(open) and 1a:read(open) are not independent. Further, 2:write(dev) is not independent with itself. Hence, the abstract trace exhibits the problem of two successive calls to power_up when executing with a preemptive scheduler. Our synthesis procedure finds this problem and stores it as a mutex constraint: mtx([1a:3b],[2:3b]). Intuitively, this constraint expresses that while one thread is executing any instruction between 1a and 3b, no other thread may execute 2 or 3b.

Fig. 3 Abstraction of the running example

While this constraint ensures that two parallel calls to open_dev behave correctly, two parallel calls to close_dev may still result in the device receiving two power_down signals. This is represented by the concrete trace T1.5;T1.6;T2.5;T2.6;T2.7;T2.8;T2.9;T1.7;T1.8;T1.9. The corresponding abstract trace is T1.5a;T1.5b;T1.6a;T1.6b;T2.5a;T2.5b;T2.6a;T2.6b;T2.7a;T2.7b;T2.8;T2.9;T1.7a;T1.7b;T1.8;T1.9. This trace is not possible under a non-preemptive scheduler and cannot be transformed into one by swapping independent events. This results in a second mutex constraint mtx([5a:8],[6b:8]). With both mutex constraints enforced the program is correct. Our lock placement procedure then encodes these constraints as an SMT formula whose models are exactly the correct lock placements. In Fig. 4 we show open_dev and close_dev with the inserted locks.

Fig. 4 Running example with the synthesized locks

Formal framework and problem statement

We present the syntax and semantics of a concrete concurrent while language W. Our solution strategy relies on an abstraction for efficiency, so we also introduce the syntax and semantics of the abstract concurrent while language Wabs. While W (and our tool) permits non-recursive function call and return statements, we omit these constructs from the formalization below. We conclude the section by formalizing our notion of correctness for concrete concurrent programs.

Concrete concurrent programs

In our work, we assume a read or a write to a single shared variable executes atomically and further assume a sequentially consistent memory model.

Syntax of W (Fig. 5)

Fig. 5 Syntax of W

A concurrent program is a finite collection of threads T1, …, Tn where each thread is a statement written in the syntax of W. Variables in W can be categorized into

  • shared variables ShVari,

  • thread-local variables LoVari,

  • lock variables LkVari,

  • condition variables CondVari for wait-signal statements, and

  • guard variables GrdVari for assumptions.

The LkVari, CondVari and GrdVari variables are also shared between all threads. All variables range over integers, with the exception of guard variables, which range over Booleans (true, false). Each statement is labeled with a unique location identifier ℓ; we denote by stmt(ℓ) the statement labeled by ℓ.

The language W includes standard sequential constructs, such as assignments, loops, conditionals, and goto statements. Additional statements control the interaction between threads, such as lock, wait-notify, and yield statements. In W, we only permit expressions that read from at most one shared variable and assignments that either read from or write to exactly one shared variable.1 The language also includes assume, assume_not statements that operate on guard variables and become relevant later for our abstraction. The yield statement is in a sense an annotation as it has no effect on the actual program running under a preemptive scheduler. We still present it here because it has a semantic meaning under the non-preemptive scheduler.

Language W has two statements that allow communication with an external system: input(ch) reads from and output(ch,ShExp) writes to a communication channel ch. The channel is an interface between the program and an external system. The external system cannot observe the internal state of the program and only observes the information flow on the channel. In practice, we use the channels to model device registers. A device register is a special memory address; reads and writes to it are visible to the device. This is used to exchange information with a device. In our presentation, we assume all channels communicate with the same external system.

Semantics of W

We first define the semantics of a single thread in W, and then extend the definition to concurrent non-preemptive and preemptive semantics.

4.1.2.1 Single-thread semantics (Fig. 6)
Fig. 6 Single-thread semantics of W

Let us fix a thread identifier tid. We use tid interchangeably with the program it represents. A state of a single thread is given by ⟨V, ℓ⟩, where V is a valuation of all program variables, and ℓ is a location identifier, indicating the statement in tid to be executed next. A thread is guaranteed not to read or write thread-local variables of other threads.

We define the flow graph Gtid for thread tid in a manner similar to the control-flow graph of tid. Every node of Gtid represents a single statement (basic blocks are not merged) and the node is labeled with the location of the statement. The flow graph Gtid has a unique entry node and a unique exit node. These two may coincide if the thread has no statements. The entry node is the first labeled statement in tid; we denote its location identifier by firsttid. The exit node is a special node corresponding to a hypothetical statement lasttid:skip placed at the end of tid.

We define successors of locations of tid using Gtid. The location lasttid has no successors. We define succ(ℓ) = ℓ′ if node ℓ:stmt in Gtid has exactly one outgoing edge, to node ℓ′:stmt′. Nodes representing conditionals and loops have two outgoing edges. We define succ1(ℓ) = ℓ1 and succ2(ℓ) = ℓ2 if node ℓ:stmt in Gtid has exactly two outgoing edges, to nodes ℓ1:stmt1 and ℓ2:stmt2. Here succ1 represents the then or the loop branch, whereas succ2 represents the else or the loop-exit branch.
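As a concrete illustration, the successor functions can be read off a small adjacency structure; the following sketch is our own rendering (the dictionary encoding and the locations are ours, not the paper's):

# Flow graph of a thread as an adjacency structure (ours, illustrative):
# straight-line statements have one outgoing edge, conditionals/loops have
# two (succ1 = then/loop branch, succ2 = else/loop-exit branch).
flow = {
    1: {"succ": 2},               # succ(1) = 2
    2: {"succ1": 3, "succ2": 5},  # conditional at 2: then -> 3, else -> 5
    3: {"succ": 2},               # loop back edge
    5: {"succ": "last"},          # edge to the hypothetical last:skip node
}

def succ(loc, edge="succ"):
    if loc == "last":             # the exit node has no successors
        return None
    return flow[loc].get(edge)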

We can now define the single-thread operational semantics. A single execution step ⟨V, ℓ⟩ →α ⟨V′, ℓ′⟩ changes the program state from ⟨V, ℓ⟩ to ⟨V′, ℓ′⟩, while optionally outputting an observable symbol α. The absence of a symbol is denoted using ϵ. In the following, e represents an expression and e[v/V[v]] evaluates an expression by replacing all variables v with their values in V. We use V[v:=k] to denote the valuation in which variable v is set to k and all other variables in V remain unchanged.

In Fig. 6, we present the rules for single execution steps. Each step is atomic; no interference can occur while the expressions in the premise are being evaluated. The only rules with an observable output are:

  1. Havoc: Statement ℓ:ShVar:=havoc assigns shared variable ShVar a non-deterministic value (say k) and outputs the observable (tid,havoc,k,ShVar).

  2. Input, Output: ℓ:ShVar:=input(ch) and ℓ:output(ch,ShExp) read and write values to the channel ch, and output (tid,in,k,ch) and (tid,out,k,ch), where k is the value read or written, respectively.

Intuitively, the observables record the sequence of non-deterministic guesses, as well as the input/output interaction with the tagged channels. The semantics of the synchronization statements shown in Fig. 6 is standard. Locks are not counting (a lock is either held or free), and double locking and double unlocking are not allowed. There are no rules for goto and the sequence statement because they are already taken care of by the flow graph.

Concurrent semantics

A state of a concurrent program is given by ⟨V,ctid,(ℓ1,…,ℓn)⟩ where V is a valuation of all program variables, ctid is the thread identifier of the currently executing thread and ℓ1,…,ℓn are the locations of the statements to be executed next in threads T1 to Tn, respectively. There are two additional states: terminated indicates the program has finished and failed indicates an assumption failed. Initially, all integer program variables and ctid equal 0, all guard variables equal false and for each i ∈ [1,n]: ℓi = firsti. We introduce a non-preemptive and a preemptive semantics. The former is used as a specification of allowed executions, whereas the latter models concurrent sequentially consistent executions of the program.

4.1.3.1 Non-preemptive semantics (Fig. 7) The non-preemptive semantics ensures that a single thread from the program keeps executing using the single-thread semantics (Rule Seq) until one of the following occurs: (a) the thread finishes execution (Rule Thread_end) or (b) it encounters a yield, lock, wait or wait_not statement (Rule Nswitch). In these cases, a context-switch is possible; however, the new thread must not be blocked. We consider a thread blocked if its current instruction acquires an unavailable lock, waits for a condition that has not been signaled, or the thread has reached its last location. Note the difference between wait/wait_not and assume/assume_not: the former allow a context-switch, while the latter transition to the failed state if the assumption is not fulfilled (Rule Assume/Assume_not). A special rule exists for termination (Rule Terminate), which requires that all threads have finished execution and that all locks are unlocked.

Fig. 7 Non-preemptive semantics

4.1.3.2 Preemptive semantics (Figs. 7, 8) The preemptive semantics of a program is obtained from the non-preemptive semantics by relaxing the condition on context-switches and allowing context-switches at all program points. In particular, the preemptive semantics consists of the rules of the non-preemptive semantics and the single additional rule Pswitch in Fig. 8.

Fig. 8 Additional rule for preemptive semantics

Abstract concurrent programs

The state of the concrete semantics contains unbounded integer variables, which may result in an infinite state space. We therefore introduce a simple, data-oblivious abstraction Wabs for concurrent programs written in W communicating with an external system. The abstraction tracks types of accesses (read or write) to each memory location while abstracting away their values. Inputs/outputs to a channel are modeled as writes to a special memory location (dev). Even inputs are modeled as writes because in our applications we cannot assume that reads from the external interface are free of side-effects in the component on the other side of the interface. Havocs become ordinary writes to the variable they are assigned to. Every branch is taken non-deterministically and tracked. Given C written in W, we denote by Cabs the corresponding abstract program written in Wabs.

Abstract syntax (Fig. 9)

Fig. 9 Syntax of Wabs

In the figure, var denotes all shared program variables and the dev variable. The syntax of all synchronization primitives and the assumptions over guard variables remains unchanged. The purpose of the guard variables is to improve the precision of our otherwise coarse abstraction. Currently, they are inferred manually, but can presumably be inferred automatically using an iterative abstraction-refinement loop. In our current benchmarks, guard variables needed to be introduced in only three scenarios.

Abstraction function (Fig. 10)

Fig. 10 Abstraction function from W to Wabs

A thread in W can be translated to Wabs using the abstraction function shown in Fig. 10. The abstraction replaces all global variable accesses with read(var) and write(var) and replaces branching conditions with non-deterministic choice. All synchronization primitives remain unaffected by the abstraction. The abstraction may result in duplicate labels ℓ, which are replaced by fresh labels; goto statements are adjusted accordingly. Our abstraction records branching choices (branch tagging). If one were to remove branch tagging, the abstraction would be unsound; the justification and intuition for this can be found below, around Theorem 1. For example, the abstraction of location 1 of our running example in Fig. 2 results in the two abstract labels 1a and 1b in Fig. 3.
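To make the translation concrete, the following sketch (our own simplified rendering of the rules in Fig. 10, handling only assignments; names are ours) abstracts an assignment into read/write events with fresh labels in the style of the running example:

# Simplified rendering of the abstraction of an assignment (ours, illustrative):
# each shared variable read on the right-hand side becomes a read event, and
# the assignment itself becomes a write event, each under a fresh label.
def abstract_assignment(label, lhs, rhs_vars):
    fresh = iter("abcdefgh")  # fresh label suffixes
    stmts = [(label + next(fresh), ("read", v)) for v in rhs_vars]
    stmts.append((label + next(fresh), ("write", lhs)))
    return stmts

# '1: open := open + 1' becomes 1a:read(open); 1b:write(open), as in Fig. 3:
print(abstract_assignment("1", "open", ["open"]))
# [('1a', ('read', 'open')), ('1b', ('write', 'open'))]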

Abstract semantics

As before, we first define the semantics of Wabs for a single-thread.

4.2.3.1 Single-thread semantics (Fig. 11)
Fig. 11 Partial set of rules for single-thread semantics of Wabs

The abstract state of a single thread tid is given simply by ⟨Vo,ℓ⟩, where Vo is a valuation of all lock, condition and guard variables and ℓ is the location of the statement in tid to be executed next. We define the flow graph and successors for locations in the abstract program tid in the same way as before. An abstract observable symbol is of the form (tid,θ,ℓ), where θ ∈ {(read,ShVar),(write,ShVar),then,else,loop,exitloop}. The symbol θ records the type of access to variables along with the variable name ((read,v),(write,v)) and records non-deterministic branching choices (then,else,loop,exitloop). Fig. 11 presents the rules for statements unique to Wabs; the rules for statements common to Wabs and W are the same.

4.2.3.2 Concurrent semantics

A state of an abstract concurrent program is either terminated, failed, or is given by ⟨Vo,ctid,(ℓ1,…,ℓn)⟩ where Vo is a valuation of all lock, condition and guard variables, ctid is the current thread identifier and ℓ1,…,ℓn are the locations of the statements to be executed next in threads T1 to Tn, respectively. The non-preemptive and preemptive semantics of a concurrent program written in Wabs are defined in the same way as those of a concurrent program written in W.

Program correctness and problem statement

Let Progs(W) and Progs(Wabs) denote the sets of all concurrent programs in W and Wabs, respectively.

Executions

A non-preemptive/preemptive execution of a concurrent program C in W is an alternating sequence of program states and (possibly empty) observable symbols, S0 α1 S1 … αk Sk, such that (a) S0 is the initial state of C, (b) for all j ∈ [0,k−1], according to the non-preemptive/preemptive semantics of W, we have Sj →αj+1 Sj+1, and (c) Sk is the state terminated. A non-preemptive/preemptive execution of a concurrent program Cabs in Wabs is defined in the same way, replacing the corresponding semantics of W with that of Wabs.

Observable behaviors

Let π be an execution of a program C in W; we denote by ω = obs(π) the sequence of non-empty observable symbols in π. We use [[C]]NP, resp. [[C]]P, to denote the non-preemptive, resp. preemptive, observable behavior of C, that is, the set of all sequences obs(π) over all executions π under non-preemptive, resp. preemptive, scheduling. The non-preemptive/preemptive observable behavior of a program Cabs in Wabs, denoted [[Cabs]]NP/[[Cabs]]P, is defined similarly.

We specify correctness of concurrent programs in W using two implicit criteria, presented below.

Preemption-safety

Observable behaviors ω1 and ω2 of a program C in W are equivalent if: (a) the subsequences of ω1 and ω2 containing only symbols of the form (tid,in,k,ch) and (tid,out,k,ch) are equal and (b) for each thread identifier tid, the subsequences of ω1 and ω2 containing only symbols of the form (tid,havoc,k,x) are equal. Intuitively, observable behaviors are equivalent if they have the same interaction with the interface, and the same non-deterministic choices in each thread. For sets O1 and O2 of observable behaviors, we write O1 ⊑ O2 to denote that each sequence in O1 has an equivalent sequence in O2.

Given concurrent programs C and C′ in W such that C′ is obtained by adding locks to C, C′ is preemption-safe w.r.t. C if [[C′]]P ⊑ [[C]]NP.
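This equivalence is a pair of projection checks, which the following sketch transcribes directly (our encoding, not the tool's: symbols as tuples (tid, kind, k, target) with kind in {'in', 'out', 'havoc'}):

# Direct transcription of equivalence of observable behaviors (ours, illustrative).
def equivalent(w1, w2):
    # (a) the channel interactions, across all threads, must match exactly
    io1 = [s for s in w1 if s[1] in ("in", "out")]
    io2 = [s for s in w2 if s[1] in ("in", "out")]
    if io1 != io2:
        return False
    # (b) per thread, the sequences of non-deterministic (havoc) choices must match
    tids = {s[0] for s in w1} | {s[0] for s in w2}
    for tid in tids:
        hv1 = [s for s in w1 if s[0] == tid and s[1] == "havoc"]
        hv2 = [s for s in w2 if s[0] == tid and s[1] == "havoc"]
        if hv1 != hv2:
            return False
    return True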

Deadlock-freedom

A state S of concurrent program C in W is a deadlock state under non-preemptive/preemptive semantics if

  1. The repeated application of the rules of the non-preemptive/preemptive semantics from the initial state S0 of C can lead to S,

  2. S ≠ terminated,

  3. S ≠ failed, and

  4. there is no state S′ and symbol α such that S →α S′ according to the non-preemptive/preemptive semantics of W.

Program C in W is deadlock-free under non-preemptive/preemptive semantics if no non-preemptive/preemptive execution of C hits a deadlock state. In other words, every non-preemptive/preemptive execution of C ends in state terminated or failed. The failed state indicates an assumption did not hold, which we do not consider a deadlock. We say C is deadlock-free if it is deadlock-free under both non-preemptive and preemptive semantics.

Problem statement

We are now ready to state our main problem, the optimal synchronization synthesis problem. We assume we are given a cost function f mapping a program to the cost of its lock placement, formally f : Progs(W) → ℝ. Then, given a concurrent program C in W, the goal is to synthesize a new concurrent program C′ in W such that:

  1. C′ is obtained by adding locks to C,

  2. C′ is preemption-safe w.r.t. C,

  3. C′ has no deadlocks not present in C, and

  4. C′ = argmin f(C″) over all C″ ∈ Progs(W) satisfying (1)–(3) above.

Solution overview

Our solution framework (Fig. 12) consists of the following main components. We briefly describe each component below and then present them in more detail in subsequent sections.

Fig. 12 Solution overview

Reduction of preemption-safety to language inclusion

To ensure tractability of checking preemption-safety, we build the abstract program Cabs from C using the abstraction function described in Sect. 4.2. Under abstraction, we model each thread as a nondeterministic finite automaton (NFA) over a finite alphabet consisting of abstract observable symbols. This enables us to construct NFAs NPabs and Pabs accepting the languages [[Cabs]]NP and [[Cabs]]P, respectively. We proceed to check if all words of Pabs are included in NPabs modulo an independence relation I that respects the equivalence of observables. We describe the reduction of preemption-safety to language inclusion and our language inclusion check procedure in Sect. 6.

Inference of mutex constraints from generalized counterexamples

If Pabs and NPabs do not satisfy language inclusion modulo I, then we obtain a counterexample cex: an observation sequence that is in [[Cabs]]P, but not in [[Cabs]]NP. We analyze cex to infer constraints on L(Pabs) for eliminating cex. We use nhood(cex) to denote the set of all permutations of the symbols in cex that are accepted by Pabs. Our counterexample analysis examines the set nhood(cex) to obtain an hb-formula ϕ—a Boolean combination of happens-before ordering constraints between events—representing all counterexamples in nhood(cex). Thus cex is generalized into a larger set of counterexamples represented as ϕ. From ϕ, we infer possible mutual exclusion (mutex) constraints on L(Pabs) that can eliminate all counterexamples satisfying ϕ. We describe the procedure for finding constraints from cex in Sect. 7.1.
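The definition of nhood(cex) reads off directly as code; the following brute-force sketch (ours, exponential and purely illustrative; the tool computes this information symbolically) makes it precise:

from itertools import permutations

def nhood(cex, accepted_by_Pabs):
    """All permutations of the symbols of cex that are accepted by Pabs.
    accepted_by_Pabs is assumed to be a membership test for L(Pabs)."""
    return {w for w in set(permutations(cex)) if accepted_by_Pabs(w)}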

Automaton modification for enforcing mutex constraints

Once we have the mutex constraints inferred from a generalized counterexample, we enforce them in Pabs, effectively removing from the automaton the transitions that violate the mutex constraints. This completes our loop, and we repeat the language inclusion check of Pabs and NPabs. If another counterexample is found the loop continues; if the language inclusion check succeeds we proceed to the lock placement. This differs from the greedy approach employed in our previous work [4], which modifies Cabs and then constructs a new automaton Pabs from Cabs before restarting the language inclusion check. The greedy approach inserts locks into Cabs that are never removed in a later iteration. This can lead to inefficient lock placement: for example, a larger lock may be placed that completely surrounds an earlier one.

Computation of an f-optimal lock placement

Once Pabs and NPabs satisfy language inclusion modulo I, we formulate global constraints over lock placements for ensuring correctness. These global constraints include all mutex constraints inferred over all iterations and constraints for enforcing deadlock-freedom. Any model of the global constraints corresponds to a lock placement that ensures program correctness. We describe the formulation of these global constraints in Sect. 8.

Given a cost function f, we compute a lock placement that satisfies the global constraints and is optimal w.r.t. f. We then synthesize the final output C′ by inserting the computed lock placement into C. We present various objective functions and describe the computation of their respective optimal solutions in Sect. 9.
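As an illustration of how such an objective can be handed to an off-the-shelf solver, the following toy sketch (ours; it uses Z3's Optimize interface with invented lock variables and hypothetical constraints, and is not the paper's actual encoding from Sects. 8 and 9) minimizes the number of placed locks subject to covering two mutex constraints:

# Toy weighted-MaxSAT-style encoding (ours, illustrative), using Z3's Optimize.
# Boolean li means "a lock/unlock pair is placed around candidate region i".
from z3 import Bools, Not, Optimize, Or, sat

l1, l2, l3 = Bools("l1 l2 l3")
opt = Optimize()
# Hard constraints: each inferred mutex constraint must be enforced by some lock.
opt.add(Or(l1, l2))          # hypothetical mutex constraint covered by l1 or l2
opt.add(Or(l2, l3))          # hypothetical mutex constraint covered by l2 or l3
# Soft constraints: prefer placing few locks (coarse-grained objective).
for l in (l1, l2, l3):
    opt.add_soft(Not(l), weight=1)
if opt.check() == sat:
    print(opt.model())       # an optimal model: only l2 is placed

A fine-grained objective would instead assign weights that penalize the amount of code each lock protects, favoring many small critical sections over one large one.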

Checking preemption-safety

Reduction of preemption-safety to language inclusion

Soundness of the abstraction

Formally, two observable behaviors ω1 = α0 … αk and ω2 = β0 … βk of an abstract program Cabs in Wabs are equivalent if:

  1. (A1) For each thread tid, the subsequences of α0 … αk and β0 … βk containing only symbols of the form (tid,a,ℓ), for all a and ℓ, are equal,

  2. (A2) For each variable var, the subsequences of α0 … αk and β0 … βk containing only write symbols (of the form (tid,(write,var),ℓ)) are equal, and

  3. (A3) For each variable var, the multisets of symbols of the form (tid,(read,var),ℓ) between any two write symbols, as well as before the first write symbol and after the last write symbol, are identical.

Using this notion of equivalence, the notion of preemption-safety is extended to abstract programs: Given abstract concurrent programs Cabs and C′abs in Wabs such that C′abs is obtained by adding locks to Cabs, C′abs is preemption-safe w.r.t. Cabs if [[C′abs]]P ⊑abs [[Cabs]]NP.

For the abstraction to be sound we require only that, whenever preemption-safety does not hold for a program C, there is a trace in its abstraction Cabs that is feasible under preemptive, but not under non-preemptive, semantics.

To illustrate this we use the program in Fig. 13, which is not preemption-safe. To see this, consider the observation (T1,out,10,ch), which cannot occur under the non-preemptive semantics because x is always 0 at location 4. Note that location 3 is unreachable because the variable y is initialized to 0 and never assigned. Under the preemptive semantics the output can be observed if thread T2 interrupts thread T1 between lines 1 and 4. An example trace is 1;6;2;4;5.

Fig. 13 Example showing how the abstraction works

If we consider the abstract semantics, we notice that under the abstract non-preemptive semantics location 3 is reachable, because the abstraction makes the branching condition at location 2 non-deterministic. However, since our abstraction is sound, there must still be an observation sequence that is observable under the abstract preemptive semantics, but not under the abstract non-preemptive semantics. This observation sequence is (T1,(write,x),1),(T2,(write,x),6),(T1,(read,y),2),(T1,else,2),(T1,(read,x),4),(T1,then,4),(T1,(write,dev),5). The branch tagging records that the else branch is taken at location 2. The non-preemptive semantics cannot produce this observation sequence because it must also take the else branch at location 2 and can therefore not reach the yield statement and context-switch. As a side note, it is also not possible to transform this observation sequence into an equivalent one under the non-preemptive semantics, because of the write to x at 6 and the accesses to x at 1 and 4.

This example illustrates why branch tagging is crucial to the soundness of the abstraction. If we assume a hypothetical abstract semantics without branch tagging, we get the following preemptive observation sequence: (T1,(write,x),1),(T2,(write,x),6),(T1,(read,y),2),(T1,(read,x),4),(T1,(write,dev),5). This sequence would also be a valid observation sequence under the non-preemptive semantics, because it could take the then branch at location 2 and reach the yield statement and context-switch.

Theorem 1

(Soundness) Given a concurrent program C and a synthesized program C′ obtained by adding locks to C: [[C′abs]]P ⊑abs [[Cabs]]NP implies [[C′]]P ⊑ [[C]]NP.

Proof

It is easier to prove the contrapositive: [[C′]]P ⋢ [[C]]NP implies [[C′abs]]P ⋢abs [[Cabs]]NP.

[[C′]]P ⋢ [[C]]NP means that there is an observation sequence ω of [[C′]]P with no equivalent observation sequence in [[C]]NP. We now show that the abstract sequence ωabs in [[C′abs]]P corresponding to the sequence ω has no equivalent sequence in [[Cabs]]NP.

Towards a contradiction, we assume there is such an equivalent sequence ω′abs in [[Cabs]]NP. We show that if ω′abs indeed existed it would correspond to a concrete sequence ω′ in [[C]]NP that is equivalent to ω, thereby contradicting our assumption.

By (A1), ω′abs would have the same control flow as ωabs because of the branch tagging. By (A2) and (A3), ω′abs would have the same data flow, meaning all reads from global variables read the values written by the same writes as in ωabs. Since all interactions with the environment are abstracted to write(dev), the order of interactions must be the same in ωabs and ω′abs. This means that, assuming all inputs and havocs return the same values, in the execution ω′ corresponding to ω′abs all variable valuations are identical to those in ω. Therefore, ω′ is feasible, and its interaction with the environment is identical to that of ω, as all variable valuations are identical. Identical interaction with the environment is how equivalence between ω and ω′ is defined. This concludes our proof.

Language inclusion modulo an independence relation

We define the problem of language inclusion modulo an independence relation. Let I be an irreflexive, symmetric binary relation over an alphabet Σ. We refer to I as the independence relation and to elements of I as independent symbol pairs. We define a symmetric binary relation ∼I over words in Σ∗: for all words σ,σ′ ∈ Σ∗ and (α,β) ∈ I, (σ·αβ·σ′, σ·βα·σ′) ∈ ∼I. Let ∼It denote the reflexive transitive closure of ∼I.2 Given a language L over Σ, the closure of L w.r.t. I, denoted CloI(L), is the set {σ ∈ Σ∗ : ∃σ′ ∈ L with (σ,σ′) ∈ ∼It}. Thus, CloI(L) consists of all words that can be obtained from some word in L by repeatedly commuting adjacent independent symbol pairs from I.

Definition 1

(Language inclusion modulo an independence relation) Given NFAs A, B over a common alphabet Σ and an independence relation I over Σ, the language inclusion problem modulo I is: L(A) ⊆ CloI(L(B))?
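For intuition, membership of a single word in CloI(L) for a finite L can be decided by brute-force search over adjacent swaps; the following sketch (ours, exponential in general, which is one reason the paper develops an antichain-based algorithm instead) implements the definition directly:

from collections import deque

def in_closure(sigma, language, I):
    """Is sigma in CloI(language)? language: finite set of words (tuples);
    I: set of independent symbol pairs. Brute-force BFS (ours, illustrative)."""
    seen, queue = {sigma}, deque([sigma])
    while queue:
        w = queue.popleft()
        if w in language:
            return True
        for i in range(len(w) - 1):
            a, b = w[i], w[i + 1]
            if (a, b) in I or (b, a) in I:  # adjacent independent pair: commute
                w2 = w[:i] + (b, a) + w[i + 2:]
                if w2 not in seen:
                    seen.add(w2)
                    queue.append(w2)
    return False

# With I = {('a','b')}: ('b','a') is in the closure of {('a','b')}.
print(in_closure(("b", "a"), {("a", "b")}, {("a", "b")}))  # True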

Data independence relation

We define the data independence relation ID over our observable symbols (a direct transcription into code follows the list below). Two symbols α=(tidα,aα,ℓα) and β=(tidβ,aβ,ℓβ) are independent, i.e., (α,β) ∈ ID, iff (I0) tidα ≠ tidβ and one of the following holds:

  1. (I1) aα or aβ is in {then,else,loop,exitloop},

  2. (I2) aα and aβ are both of the form (read,var) for the same variable var, or

  3. (I3) aα is in {(write,varα),(read,varα)} and aβ is in {(write,varβ),(read,varβ)} and varα ≠ varβ.
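A minimal sketch transcribing ID (ours; symbols are encoded as tuples (tid, a, loc), with a either a branch tag or a pair ('read'/'write', var)):

# Direct transcription of the data independence relation ID (ours, illustrative).
BRANCH_TAGS = {"then", "else", "loop", "exitloop"}

def independent(alpha, beta):
    (tid1, a1, _), (tid2, a2, _) = alpha, beta
    if tid1 == tid2:                         # (I0): different threads required
        return False
    if a1 in BRANCH_TAGS or a2 in BRANCH_TAGS:
        return True                          # (I1): branch tags commute freely
    (kind1, var1), (kind2, var2) = a1, a2
    if var1 != var2:
        return True                          # (I3): accesses to different variables
    return kind1 == "read" and kind2 == "read"  # (I2): two reads of the same variable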

Checking preemption-safety

Under abstraction, we model each thread as a nondeterministic finite automaton (NFA) over a finite alphabet consisting of abstract observable symbols. This enables us to construct NFAs NPabs and Pabs accepting the languages [[Cabs]]NP and [[C′abs]]P, respectively. Cabs is the abstract program corresponding to the input program C and C′abs is the abstract program corresponding to the result C′ of the synthesis. It turns out that preemption-safety of C′ w.r.t. C is implied by preemption-safety of C′abs w.r.t. Cabs, which, in turn, is implied by language inclusion modulo ID of the NFAs Pabs and NPabs. NFAs Pabs and NPabs satisfy language inclusion modulo ID if any word accepted by Pabs is equivalent to some word obtainable by repeatedly commuting adjacent independent symbol pairs in a word accepted by NPabs.

Proposition 1

Given concurrent programs C and C′, [[C′abs]]P ⊑abs [[Cabs]]NP iff L(Pabs) ⊆ CloID(L(NPabs)).

Proof

By construction, Pabs, resp. NPabs, accepts exactly the observation sequences that C′abs, resp. Cabs, may produce under the preemptive, resp. non-preemptive, semantics (denoted by [[C′abs]]P, resp. [[Cabs]]NP). It remains to show that two observation sequences ω1 = α0 … αk and ω2 = β0 … βk are equivalent iff ω1 ∈ CloID({ω2}).

We first show that ω1 ∈ CloID({ω2}) implies that ω1 is equivalent to ω2. The proof proceeds by induction. The base case is that no symbols are swapped, and is trivially true. The inductive case assumes that ω is equivalent to ω2, and we need to show that after a single swap operation in ω, resulting in ω′, ω′ is equivalent to ω and therefore, by transitivity, also equivalent to ω2. Rule (A1) holds because ID does not allow symbols of the same thread to be swapped (I0). To prove (A2) we use the fact that writes to the same variable cannot be swapped (I2), (I3). To prove (A3) we use the fact that reads and writes to the same variable are not independent (I2), (I3).

It remains to show that ω1 being equivalent to ω2 implies ω1 ∈ CloID({ω2}). Clearly ω1 and ω2 consist of the same multiset of symbols (A1). Therefore it is possible to transform ω2 into ω1 by swapping adjacent symbols. It remains to show that all swaps involve independent symbols. By (A1) the order of events in each thread does not change, therefore condition (I0) is always fulfilled. Branch tags can swap with every other symbol (I1) and accesses to different variables can swap with each other (I3). For each variable ShVar, (A2) ensures that writes are in the same order and (A3) allows the reads in between to be reordered. These swaps are allowed by (I2). No other swaps can occur.

Checking language inclusion

We first focus on the problem of language inclusion modulo an independence relation (Definition 1). This question corresponds to preemption-safety (Theorem 1, Proposition 1) and its solution drives our synchronization synthesis.

Theorem 2

For NFAs A, B over alphabet Σ and a symmetric, irreflexive independence relation I ⊆ Σ×Σ, the problem L(A) ⊆ CloI(L(B)) is undecidable [2].

We now show that this general undecidability result extends to our specific NFAs and independence relation ID.

Theorem 3

For NFAs Pabs and NPabs constructed from Cabs, the problem L(Pabs) ⊆ CloID(L(NPabs)) is undecidable.

Proof

Our proof is by reduction from the language inclusion modulo an independence relation problem (Definition 1). Theorem 3 follows from the undecidability of this problem (Theorem 2).

Assume we are given NFAs A = (QA,Σ,ΔA,Qι,A,FA) and B = (QB,Σ,ΔB,Qι,B,FB) and an independence relation I ⊆ Σ×Σ. Without loss of generality we assume A and B to be deterministic, complete, and free of ϵ-transitions, meaning that from every state there is exactly one transition for each symbol. We show that we can construct a program Cabs that is preemption-safe iff L(A) ⊆ CloI(L(B)).

For our reduction we construct a program Cabs that simulates A or B if run with a preemptive scheduler and simulates only B if run with a non-preemptive scheduler. Note that L(A) ∪ L(B) ⊆ CloI(L(B)) iff L(A) ⊆ CloI(L(B)). For every symbol α ∈ Σ our simulator produces a sequence ωα of abstract observable symbols. We say two such sequences ωα and ωβ commute if ωα·ωβ ∼IDt ωβ·ωα, i.e., if ωβ·ωα can be obtained from ωα·ωβ by repeatedly swapping adjacent symbol pairs in ID.

We will show that (a) Cabs simulates A or B if run with a preemptive scheduler and simulates only B if run with a non-preemptive scheduler, and (b) sequences ωα and ωβ commute iff (α,β) ∈ I.

The simulator is shown in Fig. 14. States and symbols of A and B are mapped to natural numbers and represented as bitvectors to enable simulation using the language Wabs. In particular we use the Boolean guard variables of Wabs to represent the bitvectors. We use true to represent 1 and false to represent 0. As the state space and the alphabet are finite, we know the number of bits needed a priori. We use n, m, and p for the numbers of bits needed to represent QA, QB, and Σ, respectively. The transition functions ΔA and ΔB likewise work on the individual bits. We represent a bitvector x of length n as x1 … xn.

Fig. 14 Simulator algorithm

Thread T1 simulates both automata A and B simultaneously. We assume the initial states of A and B are mapped to the number 0. In each iteration of the loop in thread T1 a symbol α ∈ Σ is chosen non-deterministically and applied to both automata (we discuss this step in the next paragraph). Whether thread T1 simulates A or B is decided only at the end: depending on the value of simA we assert that a final state of A or of B was reached. The value of simA is assigned in thread T2 and can only be true if T2 is preempted between locations 12 and 13. With the non-preemptive scheduler the variable simA will always be false because thread T2 cannot be preempted. The simulator can only reach the terminated state if all assumptions hold, as otherwise it would end in the failed state. The guard final will only be assigned true at location 10 if either simA is false and a final state of B has been reached, or simA is true and a final state of A has been reached. Therefore the valid non-preemptive executions can only simulate B. In the preemptive setting the simulator can simulate either A or B because simA can be either true or false. Note that the statement at location 10 executes atomically and the value of simA cannot change during its evaluation. This means that Pabs simulates L(A) ∪ L(B) and NPabs simulates L(B).

We use τ to store the symbol used by the transition function. The choice of the next symbol needs to be non-deterministic to enable the simulation of A and B, and there is no havoc statement in Wabs. We therefore use the fact that the next thread to execute is chosen non-deterministically at a preemption point. We define a thread Tα for every α ∈ Σ that assigns to τ the number α maps to. Threads Tα can only run once the condition variable ch-sym is set to 1 by the notify statement at location 2. The statement at location 3 is a preemption point for the non-preemptive semantics. Then, exactly one thread Tα can proceed, because the statement at location 15 atomically resets ch-sym to 0. After setting τ and outputting the representation of α, thread Tα notifies thread T1 using the condition variable ch-sym-compl. Another symbol can only be produced in the next loop iteration of T1.

To produce an observable sequence faithful to I, we define a homomorphism h that maps each symbol from Σ to a sequence of observables. Assuming the symbol α ∈ Σ is chosen, we produce the following observables:

  • Loop tag To output α the thread Tα has to perform one loop iteration. This implicitly produces a loop tag (Tα,loop,14).

  • Conflict variables For each pair (α,αi) ∉ I, we define a conflict variable v{α,αi}. Note that v{α,αi} = v{αi,α} and that two writes to v{α,αi} do not commute under ID. For each such αi, we produce a tag (Tα,(write,v{α,αi}),oi). Therefore, if two symbols α1 and α2 are dependent, the observation sequences produced for each of them will contain a write to v{α1,α2}.

Formally, the homomorphism h is given by h(α) = (Tα,loop,14); (Tα,(write,v{α,α1}),o1); …; (Tα,(write,v{α,αk}),ok). For a sequence σ = α1 … αn we define h(σ) = h(α1) … h(αn).

We show that (α1,α2) ∈ I iff h(α1) and h(α2) commute. The loop tags are independent iff α1 ≠ α2. If α1 = α2 then (α1,α2) ∉ I and h(α1) and h(α2) do not commute, due to the loop tags. Assuming (α1,α2) ∈ I, h(α1) and h(α2) commute because there is no common conflict variable that they both write to. On the other hand, if (α1,α2) ∉ I, then both h(α1) and h(α2) contain a write to v{α1,α2} and therefore cannot commute. We extend this result to sequences and have that h(σ) ∼IDt h(σ′) iff σ ∼It σ′.

This concludes our reduction. It remains to show that Cabs is preemption-safe iff L(A) ⊆ CloI(L(B)). By Proposition 1 it suffices to show that L(A) ⊆ CloI(L(B)) iff L(Pabs) ⊆ CloID(L(NPabs)).

  1. We assume that L(A)CloI(L(B)). Then, for every word σL(A) we have that σCloI(L(B)). By construction h(σ)L(Pabs). It remains to show that h(σ)CloID(L(NPabs)). By σCloI(L(B)) we know there exists a word σL(B), such that σItσ. Therefore also h(σ)IDth(σ) and by construction h(σ)L(NPabs).

  2. We assume that L(A) ⊈ CloI(L(B)). Then, there exists a word σ ∈ L(A) such that σ ∉ CloI(L(B)). By construction h(σ) ∈ L(Pabs). Let us assume towards contradiction that h(σ) ∈ CloID(L(NPabs)). Then there exists a word ω in L(NPabs) such that ω ≡ID h(σ). By construction, this implies there exists some σ′ ∈ L(B) such that ω = h(σ′) and h(σ′) ≡ID h(σ). Thus, there exists σ′ ∈ L(B) such that σ ≡I σ′. This implies σ ∈ CloI(L(B)), which is a contradiction.

Fortunately, a bounded version of the language inclusion modulo I problem is decidable. Recall the independence relation I ⊆ Σ × Σ from Sect. 6.1. We define a symmetric binary relation Ii over Σ*: (σ, σ′) ∈ Ii iff there exists (α, β) ∈ I such that σ[i] = σ′[i+1] = α, σ[i+1] = σ′[i] = β, and σ[j] = σ′[j] for all j ∉ {i, i+1}. Thus Ii consists of all pairs of words that can be obtained from each other by commuting the symbols at positions i and i+1. We next define a symmetric binary relation ⪯ over Σ*: (σ, σ′) ∈ ⪯ iff there exist σ1, …, σt and indices i1 < ⋯ < it+1 such that (σ, σ1) ∈ Ii1, …, (σt, σ′) ∈ Iit+1. The relation ⪯ intuitively consists of pairs of words obtained from each other by making a single forward pass commuting multiple pairs of adjacent symbols. We recursively define ⪯k as follows: ⪯0 is the identity relation id, and for k > 0 we define ⪯k = ⪯ ∘ ⪯k−1, the composition of ⪯ with ⪯k−1. Given a language L over Σ, we use Clok,I(L) to denote the set {σ′ ∈ Σ* : ∃σ ∈ L with (σ, σ′) ∈ ⪯k}. In other words, Clok,I(L) consists of all words which can be generated from L using a finite-state transducer that remembers at most k symbols of its input words in its states. By definition we have Clo0,I(L) = L.
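To make the definition concrete, here is a minimal Python sketch that computes the k-bounded closure of a finite language by applying the forward-pass relation ⪯ k times; the function names and the encoding of words as tuples are our own illustration, not part of the Liss implementation.

def forward_pass_successors(word, indep):
    """All words reachable from `word` by one forward pass: adjacent
    independent symbols are swapped at strictly increasing positions."""
    results = set()
    def go(w, i):
        results.add(w)
        for j in range(i, len(w) - 1):
            if (w[j], w[j + 1]) in indep:
                go(w[:j] + (w[j + 1], w[j]) + w[j + 2:], j + 1)
    go(tuple(word), 0)
    return results

def bounded_closure(language, indep, k):
    """Clo_{k,I}(L) for a finite language L: k forward passes."""
    closure = {tuple(w) for w in language}
    for _ in range(k):
        closure = {v for w in closure for v in forward_pass_successors(w, indep)}
    return closure

# Example 1 from the text, with (a, b) independent:
I = {("a", "b"), ("b", "a")}
assert tuple("aaba") in bounded_closure(["aaab"], I, 1)
assert tuple("abaa") not in bounded_closure(["aaab"], I, 1)
assert tuple("abaa") in bounded_closure(["aaab"], I, 2)
assert tuple("aaba") in bounded_closure(["baaa"], I, 1)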

Example 1

We consider words over the alphabet Σ = {a, b}, where (a, b) ∈ I.

  • (aaab, aaba) ∈ ⪯1 because one can swap the letters at positions 3 and 4.

  • (aaab, abaa) ∉ ⪯1 because in one pass one can only swap the letters at positions 3 and 4, but not additionally swap positions 2 and 3 after that swap.

  • However, (aaab, abaa) ∈ ⪯2, as two passes suffice to do the two swaps.

  • (baaa, aaba) ∈ ⪯1 because in a single pass one can swap positions 1 and 2 and then positions 2 and 3.

Definition 2

(Bounded language inclusion modulo an independence relation) Given NFAs A, B over Σ, an independence relation I ⊆ Σ × Σ, and a constant k ≥ 0, the k-bounded language inclusion problem modulo I is: L(A) ⊆ Clok,I(L(B))?

Theorem 4

For NFAs A, B over Σ, an independence relation I ⊆ Σ × Σ, and a constant k ≥ 0, L(A) ⊆ Clok,I(L(B)) is decidable.

We present an algorithm to check k-bounded language inclusion modulo I, based on the antichain algorithm for standard language inclusion [11].

Antichain algorithm for language inclusion

Given a partial order (X, ⊑), an antichain over X is a set of elements of X that are incomparable w.r.t. ⊑. In order to check L(A) ⊆ L(B) for NFAs A = (QA, Σ, ΔA, Qι,A, FA) and B = (QB, Σ, ΔB, Qι,B, FB), the antichain algorithm proceeds by exploring A and B in lockstep. Without loss of generality we assume that A and B do not have ϵ-transitions. While A is explored nondeterministically, B is determinized on the fly for exploration. The algorithm maintains an antichain, consisting of tuples of the form (sA, SB), where sA ∈ QA and SB ⊆ QB. The ordering relation is given by (sA, SB) ⊑ (s′A, S′B) iff sA = s′A and SB ⊆ S′B. The algorithm also maintains a frontier set of tuples yet to be explored.

Given a state sA ∈ QA and a symbol α ∈ Σ, let succα(sA) denote {s′A ∈ QA : (sA, α, s′A) ∈ ΔA}. Given a set of states SB ⊆ QB, let succα(SB) denote {s′B ∈ QB : ∃sB ∈ SB : (sB, α, s′B) ∈ ΔB}. Given a tuple (sA, SB) in the frontier set, let succα(sA, SB) denote {(s′A, S′B) : s′A ∈ succα(sA), S′B = succα(SB)}.

In each step, the antichain algorithm explores A and B by computing α-successors of all tuples in its current frontier set for all possible symbols α ∈ Σ. Whenever a tuple (sA, SB) is found with sA ∈ FA and SB ∩ FB = ∅, the algorithm reports a counterexample to language inclusion. Otherwise, the algorithm updates its frontier set and antichain to include the newly computed successors using the two rules enumerated below. Given a newly computed successor tuple p, if there does not exist a tuple p′ in the antichain with p′ ⊑ p, then p is added to the frontier set and the antichain (Rule R1). If p is added and there exist tuples p1, …, pn in the antichain with p ⊑ p1, …, p ⊑ pn, then p1, …, pn are removed from the antichain (Rule R2). The algorithm terminates by either reporting a counterexample, or by declaring success when the frontier becomes empty.
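For concreteness, here is a compact Python sketch of this antichain procedure for plain NFA language inclusion; the tuple-based NFA encoding is an assumption of the sketch.

def nfa_inclusion(A, B):
    """Antichain check of L(A) ⊆ L(B). A and B are NFAs encoded as
    (initial, final, delta) with delta: state -> {symbol: set of states}.
    Returns None if inclusion holds, otherwise a counterexample word."""
    initA, finA, dA = A
    initB, finB, dB = B
    antichain = []                                   # minimal tuples (sA, SB)
    frontier = [(sA, frozenset(initB), ()) for sA in initA]
    def subsumed(p):                                 # Rule R1: some q ⊑ p exists
        return any(q[0] == p[0] and q[1] <= p[1] for q in antichain)
    while frontier:
        sA, SB, cex = frontier.pop()
        if sA in finA and not (SB & finB):
            return cex                               # counterexample to inclusion
        for alpha, succsA in dA.get(sA, {}).items():
            SB2 = frozenset(t for s in SB for t in dB.get(s, {}).get(alpha, ()))
            for sA2 in succsA:
                p = (sA2, SB2)
                if subsumed(p):
                    continue
                # Rule R2: drop antichain elements subsumed by the new tuple p.
                antichain[:] = [q for q in antichain
                                if not (q[0] == p[0] and p[1] <= q[1])]
                antichain.append(p)
                frontier.append((sA2, SB2, cex + (alpha,)))
    return None                                      # inclusion holds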

Antichain algorithm for k-bounded language inclusion modulo I

This algorithm is essentially the same as the standard antichain algorithm, with the automaton B above replaced by an automaton Bk,I accepting Clok,I(L(B)). The set QBk,I of states of Bk,I consists of triples (sB, η1, η2), where sB ∈ QB and η1, η2 are words over Σ of length at most k. Intuitively, the words η1 and η2 store symbols that are expected to be matched later along a run. The word η1 contains the symbols of transitions taken in B but not yet matched by the input of Bk,I, whereas η2 contains the input symbols consumed by Bk,I but not yet matched by transitions of B. We use ε to denote the empty word. Since for every transition of Bk,I the automaton B also performs one transition, we have |η1| = |η2|. The set of initial states of Bk,I is {(sB, ε, ε) : sB ∈ Qι,B}. The set of final states of Bk,I is {(sB, ε, ε) : sB ∈ FB}. The transition relation ΔBk,I is constructed by repeatedly performing the following steps, in order, for each state (sB, η1, η2) and each symbol α. In what follows, η[\i] denotes the word obtained from η by removing its ith symbol; the words η1′ and η2′ computed by steps S2 and S3 are initialized to η1 and η2. A code sketch of this successor computation follows the step list.

Given (sB, η1, η2) and α ∈ Σ:

  • Step S1 Pick a state s′B and a symbol β ∈ Σ such that (sB, β, s′B) ∈ ΔB.

  • Step S2 (match the input symbol α)
    (a) If ∀i: η1[i] ≠ α and α is independent of all symbols in η1, then η2′ := η2 · α,
    (b) else, if ∃i: η1[i] = α and α is independent of all symbols in η1 prior to position i, then η1′ := η1[\i],
    (c) else, go to S1.
  • Step S3 (match the transition symbol β)
    (a) If ∀i: η2′[i] ≠ β and β is independent of all symbols in η2′, then η1′ := η1′ · β,
    (b) else, if ∃i: η2′[i] = β and β is independent of all symbols in η2′ prior to position i, then η2′ := η2′[\i],
    (c) else, go to S1.
  • Step S4 Add ((sB, η1, η2), α, (s′B, η1′, η2′)) to ΔBk,I and go to S1.
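The following Python sketch enumerates the α-successors of a Bk,I state following steps S1–S4. The data representation (η words as tuples, ΔB as a map from state to (symbol, successor) pairs) is our own, and we assume I is irreflexive, so that in steps S2(b) and S3(b) the first occurrence is the only candidate.

def succ_bki(state, alpha, deltaB, indep, k):
    """α-successors of a B_{k,I} state (s_B, eta1, eta2) per steps S1-S4.
    eta1: symbols of transitions B has taken, not yet read in the input;
    eta2: input symbols read, not yet matched by a transition of B."""
    sB, eta1, eta2 = state
    succs = set()
    for beta, sB2 in deltaB.get(sB, ()):                       # S1
        # S2: match the input symbol alpha.
        if alpha not in eta1:
            if not all((alpha, x) in indep for x in eta1):
                continue                                       # S2(c): back to S1
            e1, e2 = eta1, eta2 + (alpha,)                     # S2(a)
        else:
            i = eta1.index(alpha)                              # I irreflexive, so
            if not all((alpha, x) in indep for x in eta1[:i]): # the first hit is
                continue                                       # the only candidate
            e1, e2 = eta1[:i] + eta1[i + 1:], eta2             # S2(b)
        # S3: match the transition symbol beta.
        if beta not in e2:
            if not all((beta, x) in indep for x in e2):
                continue                                       # S3(c)
            e1 = e1 + (beta,)                                  # S3(a)
        else:
            j = e2.index(beta)
            if not all((beta, x) in indep for x in e2[:j]):
                continue
            e2 = e2[:j] + e2[j + 1:]                           # S3(b)
        if len(e1) <= k and len(e2) <= k:                      # states keep |η| ≤ k
            succs.add((sB2, e1, e2))                           # S4
    return succs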

Example 2

In Fig. 15, we have an NFA B with L(B) = {αβ, β}, I = {(α, β)} and k = 1. The states of Bk,I are triples (q, η1, η2), where q ∈ QB and η1, η2 ∈ {ε, α, β}. We explain the derivation of a couple of transitions of Bk,I. The transition shown in bold from (q0, ε, ε) on symbol β is obtained by applying the steps once: S1. Pick q1 following the transition (q0, α, q1) ∈ ΔB. S2(a). η2′ := β. S3(a). η1′ := α. S4. Add ((q0, ε, ε), β, (q1, α, β)) to ΔBk,I. The transition shown in bold from (q1, α, β) on symbol α is obtained as follows: S1. Pick q2 following the transition (q1, β, q2) ∈ ΔB. S2(b). η1′ := ε. S3(b). η2′ := ε. S4. Add ((q1, α, β), α, (q2, ε, ε)) to ΔBk,I. It can be seen that Bk,I accepts the language {αβ, βα, β} = Clok,I(L(B)).

Fig. 15.

Fig. 15

Example for illustrating construction of Bk,I for k=1 and I={(α,β)}

Proposition 2

Given k ≥ 0, the automaton Bk,I accepts at least Clok,I(L(B)).

Proof

The proof is by induction on k. The base case is trivially true, as L(B0,I) = L(B) = Clo0,I(L(B)). The induction step assumes that Bk,I accepts at least Clok,I(L(B)), and we want to show that Bk+1,I accepts at least Clok+1,I(L(B)). We take a word ω ∈ Clok+1,I(L(B)). It must be derived from a word ω′ ∈ Clok,I(L(B)) by one additional forward pass of swapping. Bk+1,I accepts ω: in step S1 we pick the same transitions in ΔB as are used to accept ω′. Steps S2 and S3 are identical to those for ω′, with the exception of those adjacent symbol pairs that are newly swapped in ω. For those pairs, the symbols are first added to η2 and η1 by S2 and S3 and are removed in the next step, because only adjacent symbols may be swapped. This also shows that the bound k + 1 suffices to accept ω.

In general the NFA Bk,I can accept words not in Clok,I(L(B)). Intuitively, this is because Bk,I has two stacks and can also accept words where the swapping is done in a backward pass (instead of the forward pass required by our definition). For our purposes it is sound to accept more words, as long as they are obtained only by swapping independent symbols.

Proposition 3

Given k ≥ 0, the automaton Bk,I accepts at most CloI(L(B)).

Proof

We need to show that ω ∈ L(Bk,I) implies ω ∈ CloI(L(B)). For this we need to show that ω is a permutation of a word ω′ ∈ L(B) obtained by repeatedly swapping independent, adjacent symbols. The word ω must be a permutation of ω′ because Bk,I only accepts if η1 and η2 are empty, and the stacks represent exactly the symbols not yet matched. Further, we need to show that only independent symbols may be swapped. The stack η2 contains the input symbols not yet matched by B, and η1 the symbols that were instead accepted by B but not yet read in the input of Bk,I. Before adding a new symbol to a stack we ensure it is independent of all symbols on the other stack, because once it is matched later it will have to come after all of these. When a symbol is removed, it is ensured that it is independent of all symbols before it on its own stack, because it is effectively moved ahead of those symbols.

Language inclusion check algorithm

We develop a procedure to check language inclusion modulo I (Sect. 6.4) by iteratively increasing the bound k. The procedure is incremental: the check for (k+1)-bounded language inclusion modulo I only explores paths along which the bound k was exceeded in the previous iteration.

The algorithm for k-bounded language inclusion modulo I is presented as function Inclusion in Algorithm 1 (ignore Lines 22–25 for now). The antichain set consists of tuples of the form (sA, SBk,I), where sA ∈ QA and SBk,I ⊆ QB × Σ≤k × Σ≤k. The frontier consists of tuples of the form (sA, SBk,I, cex), where cex ∈ Σ*. The word cex is the sequence of symbols of transitions explored in A to get to state sA. If the language inclusion check fails, cex is returned as a counterexample to language inclusion modulo I. Each tuple in the frontier set is first checked for a violation of acceptance (Line 18). If this check fails, the function reports language inclusion failure and returns the counterexample cex (Line 18). If this check succeeds, the successors are computed (Line 20). If a successor satisfies rule R1, it is ignored (Line 21); otherwise it is added to the frontier (Line 26) and the antichain (Line 27). When adding a successor to the frontier, the symbol α is appended to the counterexample, denoted as cex · α. During the update of the antichain, the algorithm ensures that its invariant is preserved according to rule R2.

Algorithm 1 (figure): the function Inclusion for k-bounded language inclusion modulo I.

We need to ensure that our language inclusion check honors the bound k by ignoring states that exceed the bound. These states are stored to allow a later restart of the language inclusion algorithm with a higher bound. Given a newly computed successor (sA, SBk,I) for an iteration with bound k, if there exists some (sB, η1, η2) in SBk,I such that the length of η1 or η2 exceeds k (Line 22), we remember the tuple (sA, SBk,I) in the set overflow (Line 23). We then prune SBk,I by removing all states (sB, η1, η2) with |η1| > k ∨ |η2| > k (Line 24) and mark SBk,I as dirty (Line 24). If we find a counterexample to language inclusion, we return it and test if it is spurious (Line 8). In case it is spurious, we increase the bound to k + 1, remove all dirty items from the antichain and frontier (Lines 10–11), and add the items from the overflow set (Line 12) to the antichain set and frontier. Intuitively, this undoes all exploration from the point(s) where the bound was exceeded and restarts from that/those point(s).

We call a counterexample cex from our language inclusion procedure spurious if it is not a counterexample to the unbounded language inclusion, formally cex ∈ CloI(L(B)). This test is decidable because there is only a finite number of permutations of cex. Spuriousness arises from the fact that the bounded language-inclusion algorithm is incomplete; every spurious counterexample can be eliminated by sufficiently increasing the bound k. Note, however, that there exist automata and independence relations for which there is a (different) spurious counterexample for every k. In practice we test whether cex is spurious by building an automaton A′ that accepts exactly cex and running the language inclusion algorithm with k set to the length of cex. This is very fast because there is exactly one path through A′.
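The overall procedure can thus be viewed as iterative deepening over k. A minimal sketch, assuming hypothetical helpers bounded_inclusion (Algorithm 1's Inclusion) and is_spurious (the permutation test just described); as noted above, the loop need not terminate for every pair of automata and independence relation.

def inclusion_modulo_I(A, B, indep, k=0):
    """Check L(A) ⊆ Clo_I(L(B)) by increasing the bound k on demand.
    Returns None if a bounded check succeeds (which soundly implies the
    unbounded inclusion), else a genuine counterexample. May diverge if
    every bound k yields a spurious counterexample."""
    while True:
        cex = bounded_inclusion(A, B, indep, k)   # assumed helper
        if cex is None:
            return None                           # inclusion holds
        if not is_spurious(cex, B, indep):        # assumed helper
            return cex                            # real counterexample
        k += 1                                    # spurious: deepen and restart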

Theorem 5

(Bounded language inclusion check) The procedure Inclusion of Algorithm 1 decides L(A) ⊆ L(Bk,I) for NFAs A, B, bound k, and independence relation I.

Proof

Our algorithm takes as arguments the automata A and B. Conceptually, the algorithm constructs Bk,I and uses the antichain algorithm [11] to decide the language inclusion. For efficiency, we modify the original antichain language inclusion algorithm to construct the automaton Bk,I on the fly in the successor relation succ (Line 20). The bound k is enforced separately in Line 22.

Theorem 6

(Preemption-safety problem) If program C is not preemption-safe, then Algorithm 1 will return false.

Proof

By Theorem 1 we know that the abstract program Cabs is not preemption-safe either. From Proposition 1 we get L(Pabs) ⊈ CloID(L(NPabs)). From Proposition 3 this implies L(Pabs) ⊈ L(Bk,I) for any k, where B = NPabs. Theorem 5 shows that Algorithm 1 decides this inclusion, so it returns false for any bound k.

Finding and enforcing mutex constraints in Pabs

If the language inclusion check fails, it returns a counterexample trace. Using this counterexample, we derive a set of mutual exclusion (mutex) constraints that we enforce in Pabs to eliminate the counterexample, and we then rerun the language inclusion check with the new Pabs.

Finding mutex constraints

The counterexample cex returned by the language inclusion check is a sequence of observables. Since our observables record every branching decision, it is easy to reconstruct from cex a sequence of event identifiers: tid0.ℓ0; …; tidn.ℓn, where each ℓi is a location identifier from Cabs. In this section we use cex to refer to such sequences of event identifiers. We define the neighborhood of cex, denoted nhood(cex), as the set of all traces that are permutations of the events in cex and preserve the order of events from the same thread. We separate the traces in nhood(cex) into good and bad traces. Good traces are all traces that are infeasible under the non-preemptive semantics or that produce an observation sequence equivalent to that of a trace feasible under the non-preemptive semantics. All remaining traces in nhood(cex) are bad. The goal of our counterexample analysis is to characterize all bad traces in nhood(cex) in order to enable inference of mutex constraints.

In order to succinctly represent subsets of nhood(cex), we use ordering constraints between events expressed as happens-before formulas (HB-formulas) [15]. Intuitively, ordering constraints are of the following forms: (a) atomic ordering constraints φ = A < B, where A and B are events from cex. The constraint A < B represents the set of traces in nhood(cex) where event A is scheduled before event B; (b) Boolean combinations of atomic constraints φ1 ∧ φ2, φ1 ∨ φ2 and ¬φ1. We have that φ1 ∧ φ2 and φ1 ∨ φ2 respectively represent the intersection and union of the sets of traces represented by φ1 and φ2, and that ¬φ1 represents the complement (with respect to nhood(cex)) of the traces represented by φ1.

Non-preemptive neighborhood

First, we define a function Φ to extract a conjunction of atomic ordering constraints from a trace π, such that all traces π′ satisfying Φ(π) produce an observation sequence equivalent to π. Then, we obtain a correctness constraint φG that represents all good traces in nhood(cex). Remember that the good traces are those that are observationally equivalent to a non-preemptive trace. The correctness constraint φG is a disjunction of the ordering constraints of all traces in nhood(cex) that are feasible under the non-preemptive semantics: φG = ⋁π non-preemptive Φ(π).

Φ(π) enforces the order between conflicting accesses in the abstract trace π:

Φ(π) = ⋀ {Ti.ℓj < Tk.ℓl : i ≠ k, Ti.ℓj precedes Tk.ℓl in π, Ti.ℓj and Tk.ℓl access the same variable, and Ti.ℓj or Tk.ℓl is a write}
Example

Recall the counterexample trace from the running example in Sect. 3: cex=T1.1a;T1.1b;T2.1a;T2.1b;T1.2;T2.2;T2.3a;T2.3b;T2.4;T1.3a;T1.3b;T1.4. There are two traces in nhood(cex) that are feasible under non-preemptive semantics:

  • π1=T1.1a;T1.1b;T1.2;T1.3a;T1.3b;T1.4;T2.1a;T2.1b;T2.2;T2.3a;T2.3b;T2.4 and

  • π2=T2.1a;T2.1b;T2.2;T2.3a;T2.3b;T2.4;T1.1a;T1.1b;T1.2;T1.3a;T1.3b;T1.4.

We represent

  • π1 as Φ(π1) = ({T1.1a, T1.3a, T1.3b} < T2.3b) ∧ (T1.3b < {T2.1a, T2.3a, T2.3b}) ∧ (T1.2 < T2.2) and

  • π2 as Φ(π2) = (T2.3b < {T1.1a, T1.3a, T1.3b}) ∧ ({T2.1a, T2.3a, T2.3b} < T1.3b) ∧ (T2.2 < T1.2).

The correctness specification is φG = Φ(π1) ∨ Φ(π2).

Counterexample enumeration and generalization

We next build a quantifier-free first-order formula ΨB over the event identifiers in cex such that any model of ΨB corresponds to a bad, feasible trace in nhood(cex). A trace is feasible if it respects the preexisting synchronization, which is not abstracted away. Bad traces are those that are feasible under the preemptive semantics and not in φG. Further, we define a generalization function G that works on conjunctions of atomic ordering constraints φ by iteratively removing a constraint as long as the intersection of the traces represented by G(φ) and φG is empty. This results in a local minimum of atomic ordering constraints in G(φ), so that removing any remaining constraint would include a good trace in G(φ). We iteratively enumerate models ψ of ΨB, building a constraint φB = Φ(ψ) for each model ψ and generalizing φB to represent a larger set of bad traces using G. This results in an ordering constraint in disjunctive normal form φB = ⋁ψ⊨ΨB G(Φ(ψ)), such that the intersection of φB and φG is empty and the union equals nhood(cex).

Algorithm 2 (figure): enumeration and generalization of bad traces.

Algorithm 2 shows how this works. For each model ψ of ΨB, a trace σ is extracted in Line 6. From the trace, the formula φB is extracted using the function Φ described above (Line 8). Line 10 describes the generalization function G, which is implemented using an unsat core computation. We construct a formula φB ∧ Ψ ∧ φG, where Ψ ∧ φG is a hard constraint (Ψ encodes feasibility) and the conjuncts of φB are soft constraints. A satisfying assignment to this formula would model a feasible trace that is observationally equivalent to a non-preemptive trace. Since σ is a bad trace, the formula φB ∧ Ψ ∧ φG must be unsatisfiable. The result of the unsat core computation is a formula φ′B that is a conjunction of a minimal set of happens-before constraints required to ensure that all traces represented by φ′B are bad.
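A hedged sketch of this unsat-core step in Z3's Python API (the z3-solver package): the hard part Ψ ∧ φG is already asserted in the solver, and the conjuncts of φB are tracked so that a core can be extracted. Note that Z3's cores are not guaranteed to be minimal, whereas G as defined above computes a local minimum by iterative removal; the names and encoding below are assumptions of the sketch.

from z3 import Bool, unsat  # assumes the z3-solver package

def generalize(solver, atoms):
    """`solver` already contains the hard constraints Ψ ∧ φG; `atoms` maps a
    name to the z3 formula of one happens-before conjunct of φB. Returns the
    conjuncts in an unsat core, i.e., a subset still excluding all good traces."""
    solver.push()
    for name, formula in atoms.items():
        solver.assert_and_track(formula, Bool(name))
    assert solver.check() == unsat    # a bad trace is never a good trace
    core = {str(m) for m in solver.unsat_core()}
    solver.pop()
    return {name: f for name, f in atoms.items() if name in core}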

Example

Our trace cex from Sect. 3 is generalized to G(Φ(cex)) = T2.1a < T1.3b ∧ T1.3b < T2.3b. This constraint captures the interleavings where T1 interrupts T2 between locations 1a and 3b. Any trace that fulfills this constraint is bad. All bad traces in nhood(cex) are represented as φB = (T2.1a < T1.3b ∧ T1.3b < T2.3b) ∨ (T1.1a < T2.3b ∧ T2.3b < T1.3b).

Inferring mutex constraints

From each clause φ in φB described above, we infer mutex constraints to eliminate all bad traces satisfying φ. The key observation we exploit is that atomicity violations show up in our formulas as two simple patterns of ordering constraints between events.

  1. The first pattern, tid1.ℓ1 < tid2.ℓ2 ∧ tid2.ℓ′2 < tid1.ℓ′1 (visualized in Fig. 16a), indicates an atomicity violation (thread tid2 interrupts tid1 at a critical moment).

  2. The second pattern is tid1.ℓ1 < tid2.ℓ2 ∨ tid2.ℓ′2 < tid1.ℓ′1 (visualized in Fig. 16b). This pattern is a generalization of the first pattern in that either tid1 interrupts tid2 or the other way round.

For both patterns the corresponding mutex constraint is mtx(tid1.[ℓ1:ℓ′1], tid2.[ℓ2:ℓ′2]).

Fig. 16.

Fig. 16

Atomicity violation patterns

Example

The generalized counterexample constraint T2.1a < T1.3b ∧ T1.3b < T2.3b yields the mutex constraint mtx(T2.[1a:3b], T1.[3b:3b]). In the next section we show how this mutex constraint is enforced in Pabs.

Enforcing mutex constraints

To enforce mutex constraints in Pabs, we prune paths in Pabs that violate the mutex constraints.

Conflicts

Given a mutex constraint mtx(tidi.[ℓ1:ℓ′1], tidj.[ℓ2:ℓ′2]), a conflict is a tuple (ℓipre, ℓimid, ℓipost, ℓjcpre, ℓjcpost) of location identifiers satisfying the following:

  1. ℓipre, ℓimid, ℓipost are adjacent locations in thread tidi,

  2. ℓjcpre, ℓjcpost are adjacent locations in the other thread tidj,

  3. ℓ1 ≤ ℓipre, ℓimid, ℓipost ≤ ℓ′1 and

  4. ℓ2 ≤ ℓjcpre, ℓjcpost ≤ ℓ′2.

Intuitively, a conflict represents a minimal violation of a mutex constraint due to the execution of the statement at location ℓjcpre in thread tidj between the two statements at locations ℓipre and ℓimid in thread tidi. Note that a statement at location ℓ in thread tid is executed when the current location of tid changes from ℓ to succ(ℓ).

Given a conflict c = (ℓipre, ℓimid, ℓipost, ℓjcpre, ℓjcpost), let pre(c) = ℓipre, mid(c) = ℓimid, post(c) = ℓipost, cpre(c) = ℓjcpre and cpost(c) = ℓjcpost. Further, let tid1(c) = i and tid2(c) = j. To prune all interleavings prohibited by the mutex constraints from Pabs, we need to consider all conflicts derived from all mutex constraints. We denote this set as C and let K = |C|.

Example

We have an example program and its flow-graph in Fig. 17 (we skip the statement labels in the nodes here). Suppose in some iteration we obtain mtx(T1.[1:2], T2.[3:6]). This yields two conflicts: c1 given by (3, 4, 5, 1, 2) and c2 given by (4, 5, 6, 1, 2). As an aside, this example also illustrates the difficulty of lock placement in the actual code. The mutex constraint would naïvely be translated to the lock lock(T1.[1:2], T2.[3:6]). This is not a valid lock placement: in executions taking the else branch, the lock is never released.

Fig. 17.

Fig. 17

Example: mutex constraints and conflicts

Constructing new Pabs

Initially, let the NFA Pabs be given by the tuple (Qold, Σ ∪ {ϵ}, Δold, Qι,old, Fold), where

  1. Qold is the set of states ⟨V, ctid, (ℓ1, …, ℓn)⟩ of the abstract program Cabs corresponding to C, as well as terminated and failed,

  2. Σ is the set of abstract observable symbols,

  3. Qι,old is the initial state of Cabs,

  4. Fold={terminated} and

  5. Δold ⊆ Qold × (Σ ∪ {ϵ}) × Qold is the transition relation with (q, α, q′) ∈ Δold iff q →α q′ according to the abstract preemptive semantics.

To enable pruning of paths that violate mutex constraints, we augment the state space of Pabs to track the status of the conflicts c1, …, cK using four-valued propositions p1, …, pK, respectively. Initially all propositions are 0. Proposition pk is incremented from 0 to 1 when conflict ck is activated, i.e., when control moves from ℓipre to ℓimid along a path. Proposition pk is incremented from 1 to 2 when conflict ck progresses, i.e., when thread tidi is at ℓimid and control moves from ℓjcpre to ℓjcpost. Proposition pk is incremented from 2 to 3 when conflict ck completes, i.e., when control moves from ℓimid to ℓipost. In practice the value 3 is never reached, because the state is pruned when the conflict completes. Proposition pk is reset to 0 when conflict ck is aborted, i.e., when thread tidi is at ℓimid and either moves to a location different from ℓipost, or moves to ℓipost before thread tidj moves from ℓjcpre to ℓjcpost.

Example

In Fig. 17, c1 is activated when T2 moves from b1 to b2; c1 progresses if now T1 moves from a1 to a2 and is aborted if instead T2 moves from b2 to b3; c2 completes after progressing if T2 moves from b2 to b3 and is aborted if instead T2 moves from b2 to b5.

Formally, the new Pabs is given by the tuple (Qnew, Σ ∪ {ϵ}, Δnew, Qι,new, Fnew), where:

  1. Qnew = Qold × {0, 1, 2}K,

  2. Σ is the set of abstract observable symbols as before,

  3. Qι,new = (Qι,old, (0, …, 0)),

  4. Fnew = {(Q, (p1, …, pK)) : Q ∈ Fold, p1, …, pK ∈ {0, 1, 2}} and

  5. Δnew is constructed as follows: add ((Q, (p1, …, pK)), α, (Q′, (p′1, …, p′K))) to Δnew iff (Q, α, Q′) ∈ Δold and for each k ∈ [1, K] the following hold, where Q = ⟨V, ctid, (ℓ1, …, ℓn)⟩ and Q′ = ⟨V′, ctid′, (ℓ′1, …, ℓ′n)⟩:

  1. Conflict activation: (the statement at location pre(ck) in thread tid1(ck) is executed) if pk = 0, ctid = ctid′ = tid1(ck), ℓctid = pre(ck) and ℓ′ctid = mid(ck), then p′k = 1, else p′k = 0,

  2. Conflict progress: (thread tid1(ck) is interrupted by tid2(ck) and the conflicting statement at location cpre(ck) is executed) else if pk = 1, ctid = ctid′ = tid2(ck), ℓtid1(ck) = ℓ′tid1(ck) = mid(ck), ℓctid = cpre(ck) and ℓ′ctid = cpost(ck), then p′k = 2,

  3. Conflict completion and state pruning: (the statement at location mid(ck) in thread tid1(ck) is executed and that completes the conflict) else if pk = 2, ctid = ctid′ = tid1(ck), ℓctid = mid(ck) and ℓ′ctid = post(ck), then the transition is not added and the state (Q′, (p′1, …, p′K)) is deleted,

  4. Conflict abortion 1: (tid1(ck) executes an alternative statement) else if pk = 1 or pk = 2, ctid = ctid′ = tid1(ck), ℓctid = mid(ck) and ℓ′ctid ≠ post(ck), then p′k = 0,

  5. Conflict abortion 2: (tid1(ck) executes the statement at location mid(ck) without interruption by tid2(ck)) else if pk = 1, ctid = ctid′ = tid1(ck), ℓctid = mid(ck), ℓ′ctid = post(ck) and ℓtid2(ck) = ℓ′tid2(ck) = cpre(ck), then p′k = 0. In all remaining cases, p′k = pk.

In our implementation, the new Pabs is constructed on-the-fly. Moreover, we do not maintain the entire set of propositions p1,,pK in each state of Pabs. A proposition pi is added to the list of tracked propositions only after conflict ci is activated. Once conflict ci is aborted, pi is dropped from the list of tracked propositions.
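As an illustration of the rules above, here is a Python sketch of the per-transition proposition update; the conflict record fields (pre, mid, post, cpre, cpost, tid1, tid2) mirror the definitions above, while the state encoding is our own assumption. Returning None models pruning of the target state.

def update_propositions(conflicts, props, step):
    """Apply rules 1-5 to each proposition p_k for one transition.
    `step` = (ctid, locs, ctid2, locs2): the scheduled thread and the
    per-thread locations before and after the transition. Returns the new
    proposition vector, or None if the target state is pruned."""
    ctid, locs, ctid2, locs2 = step
    new = list(props)
    for k, c in enumerate(conflicts):
        t1, t2 = c["tid1"], c["tid2"]
        def moved(tid, src, dst):
            return ctid == ctid2 == tid and locs[tid] == src and locs2[tid] == dst
        if new[k] == 0 and moved(t1, c["pre"], c["mid"]):
            new[k] = 1                                   # activation
        elif (new[k] == 1 and locs[t1] == locs2[t1] == c["mid"]
              and moved(t2, c["cpre"], c["cpost"])):
            new[k] = 2                                   # progress
        elif new[k] == 2 and moved(t1, c["mid"], c["post"]):
            return None                                  # completion: prune state
        elif (new[k] in (1, 2) and ctid == ctid2 == t1
              and locs[t1] == c["mid"] and locs2[t1] != c["post"]):
            new[k] = 0                                   # abortion 1
        elif (new[k] == 1 and moved(t1, c["mid"], c["post"])
              and locs[t2] == c["cpre"]):
            new[k] = 0                                   # abortion 2
    return tuple(new)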

Theorem 7

We are given a program Cabs and a sequence of observable symbols ω that is a counterexample to preemption-safety, formally ω ∈ L(Pabs) and ω ∉ CloI(L(NPabs)). If a pattern P eliminating ω is found, then, after enforcing all resulting mutex constraints in Pabs, the counterexample ω is no longer accepted by Pabs, formally ω ∉ L(Pabs).

Proof

The pattern P eliminating ω represents a mutex constraint mtx(tidi.[ℓ1:ℓ′1], tidj.[ℓ2:ℓ′2]) such that the trace ω is no longer possible. Mutex constraints are represented by conflicts of the form (ℓipre, ℓimid, ℓipost, ℓjcpre, ℓjcpost). Each such conflict represents a context switch that is not allowed: ℓipre → ℓimid, followed by ℓjcpre → ℓjcpost, followed by ℓimid → ℓipost. Because P eliminates ω, we know that ω has a context switch from tidi.ℓ″1 to tidj.ℓ″2, where ℓ1 ≤ ℓ″1 ≤ ℓ′1 and ℓ2 ≤ ℓ″2 ≤ ℓ′2. One of the conflicts representing the mutex constraint is (ℓipre, ℓimid, ℓipost, ℓjcpre, ℓjcpost), where ℓimid = ℓ″1 and ℓipre and ℓipost are the locations immediately before and after ℓ″1. Further, ℓjcpre = ℓ″2 and ℓjcpost is the location immediately following ℓ″2. If now a context switch happens at location ℓ″1 switching to location ℓ″2, this triggers the conflict and the trace is discarded in Pabs.

Global lock placement constraints

Our synthesis loop keeps collecting and enforcing conflicts in Pabs until the language inclusion check holds. At that point we have collected a set of conflicts Call that need to be enforced in the original program source code. To avoid deadlocks, the lock placement has to conform to a number of constraints.

We encode the global lock placement constraints for ensuring correctness as an SMT3 formula LkCons. Let L denote the set of all locations and Lk denote the set of all locks available for synthesis. We use scalars ℓ, ℓ′, ℓ1, … of type L to denote locations and scalars LkVar, LkVar′, LkVar1, … of type Lk to denote locks. The number of locks is finite and there is a fixed locking order. Let Pre(ℓ) denote the set of all immediate predecessors of node ℓ:stmt(ℓ) in the flow-graph of the original concrete concurrent program C. We use the following Boolean variables in the encoding.

  • LockBefore(ℓ, LkVar): lock(LkVar) is placed just before the statement represented by ℓ
  • LockAfter(ℓ, LkVar): lock(LkVar) is placed just after the statement represented by ℓ
  • UnlockBefore(ℓ, LkVar): unlock(LkVar) is placed just before the statement represented by ℓ
  • UnlockAfter(ℓ, LkVar): unlock(LkVar) is placed just after the statement represented by ℓ

For every location in the source code, we allow a lock to be placed either immediately before or immediately after it. If a lock LkVar is placed before ℓ, then ℓ is protected by LkVar. If LkVar is placed after ℓ, then ℓ is not protected by LkVar, but the successor instructions are. Both options are needed, e.g., to lock before the first statement of a thread and to unlock after the last statement of a thread. We define three additional Boolean variables (D1)–(D3):

  1. InLock(ℓ, LkVar): If location ℓ has no predecessor, then it is protected by LkVar iff there is a lock statement before ℓ:
    InLock(ℓ, LkVar) = LockBefore(ℓ, LkVar)
    If there exists a predecessor ℓ′ of ℓ, then ℓ is protected by LkVar iff either there is a lock statement before ℓ, or ℓ′ is protected by LkVar at its end and there is no unlock in between:
    InLock(ℓ, LkVar) = LockBefore(ℓ, LkVar) ∨ (¬UnlockBefore(ℓ, LkVar) ∧ InLockEnd(ℓ′, LkVar))
    Note that either all predecessors are protected by a lock or none; we enforce this in constraint (C7) below.
  2. InLockEnd(ℓ, LkVar): The successors of ℓ are protected by LkVar iff either location ℓ is protected by LkVar and there is no unlock after it, or lock(LkVar) is placed after ℓ:
    InLockEnd(ℓ, LkVar) = (InLock(ℓ, LkVar) ∧ ¬UnlockAfter(ℓ, LkVar)) ∨ LockAfter(ℓ, LkVar)
  3. Order(LkVar, LkVar′): We fix a lock order that is transitive, asymmetric, and irreflexive. Order(LkVar, LkVar′) = true iff LkVar needs to be acquired before LkVar′. This means that an instruction lock(LkVar) cannot be placed inside the scope of LkVar′.

We describe the constraints and their SMT formulation constituting LkCons below. All constraints are quantified over all ℓ, ℓ′, ℓ1, … ∈ L and all LkVar, LkVar′, LkVar1, … ∈ Lk.

  1. All locations in the same conflict in Call are protected by the same lock.
    ∀C ∈ Call: ∀ℓ, ℓ′ ∈ C: ∃LkVar. InLock(ℓ, LkVar) ∧ InLock(ℓ′, LkVar)
  2. Placing lock(LkVar) immediately before/after unlock(LkVar) is disallowed. Doing so would make (C1) unsound, as two adjacent locations could be protected by the same lock and there could still be a context switch in between because of the immediate unlocking and locking again. If ℓ has a predecessor ℓ′ then
    (UnlockBefore(ℓ, LkVar) → ¬LockAfter(ℓ′, LkVar)) ∧ (LockBefore(ℓ, LkVar) → ¬UnlockAfter(ℓ′, LkVar))
  3. We enforce the lock order according to Order defined in (D3).
    (LockAfter(ℓ, LkVar) ∧ InLock(ℓ, LkVar′)) → Order(LkVar′, LkVar)
    (LockBefore(ℓ, LkVar) ∧ InLockEnd(ℓ′, LkVar′)) → Order(LkVar′, LkVar) for every ℓ′ ∈ Pre(ℓ)
  4. Existing locks may not be nested inside synthesized locks. They are implicitly ordered before synthesized locks in our lock order.
    (stmt(ℓ) = lock(·)) → ¬InLock(ℓ, LkVar)
  5. No wait statements may be in the scope of synthesized locks, to prevent deadlocks.
    (stmt(ℓ) ∈ {wait(·), wait_not(·), wait_reset(·)}) → ¬InLock(ℓ, LkVar)
  6. Placing both lock(LkVar) and unlock(LkVar) before/after ℓ is disallowed.
    (¬LockBefore(ℓ, LkVar) ∨ ¬UnlockBefore(ℓ, LkVar)) ∧ (¬LockAfter(ℓ, LkVar) ∨ ¬UnlockAfter(ℓ, LkVar))
  7. All predecessors of ℓ must agree on their InLockEnd status. This ensures that joining branches hold the same set of locks. If ℓ has at least one predecessor then
    (⋀ℓ′∈Pre(ℓ) InLockEnd(ℓ′, LkVar)) ∨ (⋀ℓ′∈Pre(ℓ) ¬InLockEnd(ℓ′, LkVar))
  8. unlock(LkVar) can only be placed after a lock(LkVar).
    UnlockAfter(ℓ, LkVar) → InLock(ℓ, LkVar)
    If ℓ has a predecessor ℓ′ then also
    UnlockBefore(ℓ, LkVar) → InLockEnd(ℓ′, LkVar)
    else if ℓ has no predecessor then
    UnlockBefore(ℓ, LkVar) = false
  9. We forbid double locking: a lock may not be acquired if that location is already protected by the lock.
    LockAfter(ℓ, LkVar) → ¬InLock(ℓ, LkVar)
    If ℓ has a predecessor ℓ′ then also
    LockBefore(ℓ, LkVar) → ¬InLockEnd(ℓ′, LkVar)
  10. The end state lasti of each thread i is unlocked. This prevents locks from leaking.
    ∀i: ¬InLock(lasti, LkVar)

According to constraints (C4) and (C5) no locks may be placed around existing wait or lock statements. Since both statements are implicit preemption points, where the non-preemptive semantics may context-switch, it is never necessary to synthesize a lock across an existing lock or wait instruction to ensure preemption-safety.
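To illustrate the shape of LkCons, here is a small Z3 (z3-solver) sketch encoding definitions (D1)–(D2) and constraints (C1), (C6) and (C9) for a hypothetical three-location straight-line flow graph with a single candidate lock; all identifiers are illustrative and this is not the encoding used by Liss verbatim.

from z3 import And, Bool, Implies, Not, Or, Solver  # assumes z3-solver

locs = ["l1", "l2", "l3"]                  # hypothetical flow graph
pre = {"l1": [], "l2": ["l1"], "l3": ["l2"]}
lks = ["lk1"]

def b(name, l, v):
    return Bool(f"{name}_{l}_{v}")

def in_lock(l, v):
    """(D1): protected iff locked before, or inherited from the predecessor
    without an intervening unlock."""
    if not pre[l]:
        return b("LockBefore", l, v)
    p = pre[l][0]
    return Or(b("LockBefore", l, v),
              And(Not(b("UnlockBefore", l, v)), b("InLockEnd", p, v)))

s = Solver()
for l in locs:
    for v in lks:
        # (D2): successors are protected iff still locked or locked after l.
        s.add(b("InLockEnd", l, v) ==
              Or(And(in_lock(l, v), Not(b("UnlockAfter", l, v))),
                 b("LockAfter", l, v)))
        # (C6): never both lock and unlock at the same position.
        s.add(Or(Not(b("LockBefore", l, v)), Not(b("UnlockBefore", l, v))))
        s.add(Or(Not(b("LockAfter", l, v)), Not(b("UnlockAfter", l, v))))
        # (C9): no double locking.
        s.add(Implies(b("LockAfter", l, v), Not(in_lock(l, v))))
        for p in pre[l]:
            s.add(Implies(b("LockBefore", l, v), Not(b("InLockEnd", p, v))))
# (C1) for a hypothetical conflict containing l1 and l2: one common lock.
s.add(Or([And(in_lock("l1", v), in_lock("l2", v)) for v in lks]))
print(s.check())  # sat: a lock placement satisfying these constraints exists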

We have the following result.

Theorem 8

Let the concurrent program C′ be obtained by inserting any lock placement satisfying LkCons into the concurrent program C. Then C′ is guaranteed to be preemption-safe w.r.t. C and not to introduce new deadlocks (that were not already present in C).

Proof

To show preemption-safety we need to show that language inclusion holds (Proposition 1). Language inclusion follows directly from constraint (C1), which ensures that all mutex constraints are enforced as locks. Further, constraints (C2) and (C6) ensure that a lock is never released and immediately reacquired between statements. This is crucial because otherwise a context switch between two instructions protected by a lock would be possible.

Let us assume towards contradiction that a new deadlocked state s = ⟨V, ctid, (ℓ1, …, ℓn)⟩ is reachable in C′. By definition this means that none of the rules of the preemptive semantics of W (Figs. 7, 8) are applicable in s. Remember that an infinite loop is considered a livelock. We proceed to enumerate all rules of the preemptive semantics that may block:

  • If all threads have reached their last location, then the Terminate rule is the only one that could be applicable. If it is not applicable, then a lock is still locked. This deadlock is prevented by constraint (C10).

  • The rule Nswitch is not applicable because the other thread is blocked and Seq is not applicable because none of the rules of the single-thread semantics (Fig. 6) apply. The following sequential rules have preconditions that may prevent them from being applicable.

  • Rule Lock may not proceed if the lock LkVar is already taken. If LkVar is held by tidctid itself, we have a case of double-locking, which is prevented by constraint (C9). Otherwise LkVar is held by another thread tidj. In this case tidctid is waiting for tidj. This may be because of
    1. a circular dependency of locks. This cannot be a new deadlock because constraints (C4) and (C3) enforce a strict lock order, even w.r.t. existing locks.
    2. another deadlock in tidj. This deadlock cannot be new because we can make a recursive argument about the deadlock in tidj.
  • Rule Unlock may not proceed if the lock is not owned by the executing thread. In this case we either have a case of double-unlock (prevented by constraint (C8)) or a lock is unlocked that is not held by tidctid at that point. The latter may happen if the lock was not taken on all control-flow paths leading to ℓctid. This is prevented by constraints (C7) and (C8).

  • Rules Wait/Wait_not/Wait_reset may not proceed if the condition variable is not in the right state. According to constraint (C5), ℓctid cannot be protected by a synthesized lock. This means the deadlock is either not new, or it is caused by a deadlock in a different thread making it impossible to reach signal(CondVar)/reset(CondVar). In that case a recursive argument applies.

  • The Thread_end rule is not applicable only if all other threads are blocked. This is impossible by the same reasoning as above.

Optimizing lock placement

The global lock placement constraint LkCons constructed in Sect. 8 often has multiple models corresponding to very different lock placements. The desirability of these lock placements varies considerably due to performance considerations. For example a coarse-grained lock placement may be useful when the cost of locking operations is relatively high compared to the cost of executing the critical sections, while a fine-grained lock placement should be used when locking operations are cheap compared to the cost of executing the critical sections. Neither of these lock placement strategies is guaranteed to find the optimally performing program in all scenarios. It is necessary for the programmer to judge when each criterion is to be used.

Here, we present objective functions f to distinguish between different lock placements. Our synthesis procedure combines the function f with the global lock placement constraints LkCons into a single maximum satisfiability modulo theories (MaxSMT) problem and the optimal model corresponds to the f-optimal lock placement. We present objective functions for coarse- and fine-grained locking.

Objective functions

We say that a statement :stmt in a concurrent program C is protected by a lock LkVar if InLock(,LkVar) is true. We define the two objective functions as follows:

  1. Coarsest-grained locking This objective function prefers a program C1 over C2 if the number of lock statements in C1 is smaller than in C2. Among the programs having the same number of lock statements, the ones with the fewest statements protected by any lock are preferred. Formally, we define Coarse(Ci) to be λ + ϵ · StmtInLock(Ci), where λ is the count of lock statements in Ci, StmtInLock(Ci) is the count of statements in Ci that are protected by any lock, and ϵ is given by 1/(2k), where k is the total number of statements in Ci. The reasoning behind this formula is that the total cost is always dominated by the number of lock statements: even if all statements are protected by a lock, this fact contributes only 1/2 to the total cost.

  2. Finest-grained locking This objective function prefers a program C1 over C2 if C1 allows more concurrency than C2. Concurrency of a program is measured by the number of pairs of statements from different threads that cannot be executed together. Formally, we define Fine(Ci) to be the total number of pairs of statements 1:stmt1 and 2:stmt2 from different threads that cannot be executed at the same time, i.e., are protected by the same lock.

Optimization procedure

The main idea behind the optimization procedure for the above objective functions is to build an instance of the MaxSMT problem using the global lock placement constraint LkCons such that (a) every model of LkCons is a model for the MaxSMT problem and the other way round; and (b) the cost of each model for the MaxSMT problem is the cost of the corresponding locking scheme according to the chosen objective function. The optimal lock placement is then computed by solving the MaxSMT problem.

A MaxSMT problem instance is given by ⟨Φ, (Ψ1, w1), …, (Ψn, wn)⟩, where Φ and each Ψi are SMT formulas and each wi is a real number. The formula Φ is called the hard constraint, and each Ψi is called a soft constraint with associated weight wi. Given an assignment V of the variables occurring in the constraints, its cost c is defined as the sum of the weights of the soft constraints that are falsified by V: c = Σ{i : V ⊭ Ψi} wi. The objective of the MaxSMT problem is to find a model that satisfies Φ with minimal cost. Intuitively, by minimizing the cost we maximize the sum of the weights of the satisfied soft constraints.

In the following, we write InLock(ℓ) as a short-hand for ⋁LkVar InLock(ℓ, LkVar), and similarly LockBefore(ℓ) and LockAfter(ℓ). For each of our two objective functions, the hard constraint for the MaxSMT problem is LkCons and the soft constraints and associated weights are as specified below (a code sketch of such an encoding follows the list):

  • For the coarsest-grained locking objective function, the soft constraints are of three types: (a) ¬LockBefore(ℓ) with weight 1, (b) ¬LockAfter(ℓ) with weight 1, and (c) ¬InLock(ℓ) with weight ϵ, where ϵ is as defined above.

  • For the finest-grained locking objective function, the soft constraints are given by ⋀lk∈Lk (¬InLock(ℓ, lk) ∨ ¬InLock(ℓ′, lk)), for each pair of statements ℓ and ℓ′ from different threads. The weight of each soft constraint is 1.
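A minimal sketch of the coarsest-grained objective as a Z3 MaxSMT (Optimize) instance; the weights are scaled by 2k to stay integral (so ϵ becomes 1 and weight 1 becomes 2k), and the hard constraint here is a stand-in for LkCons. All identifiers are illustrative.

from z3 import Bool, Not, Optimize, Or  # assumes z3-solver

locs, k = ["l1", "l2"], 2
lock_before = {l: Bool(f"LockBefore_{l}") for l in locs}
in_lock = {l: Bool(f"InLock_{l}") for l in locs}

opt = Optimize()
# Stand-in hard constraint for LkCons: l1 must be protected, and a
# protected location needs a lock placed before it.
opt.add(in_lock["l1"])
for l in locs:
    opt.add(Or(Not(in_lock[l]), lock_before[l]))

# Coarsest-grained soft constraints, weights scaled by 2k:
for l in locs:
    opt.add_soft(Not(lock_before[l]), weight=2 * k)  # (a)/(b): few lock stmts
    opt.add_soft(Not(in_lock[l]), weight=1)          # (c): few protected stmts
print(opt.check(), opt.model())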

Theorem 9

For the coarsest-grained and finest-grained objective functions, the cost of the optimal program is equal to the cost of the model for the corresponding MaxSMT problem obtained as described above.

Implementation and evaluation

In order to evaluate our synthesis procedure, we implemented it in a tool called Liss, comprised of 5400 lines of C++ code. Liss uses Clang/LLVM 3.6 to parse C code and insert locks into the code. By using Clang’s rewriter, Liss is able to maintain the original formatting of the source code. As a MaxSMT solver, we use Z3 version 4.4.1 (unstable branch). Liss is available as open-source software along with benchmarks.4 The language inclusion algorithm is available separately as a library called Limi.5 Liss implements the synthesis method presented in this paper with several optimizations. For example, we take advantage of the fact that language inclusion violations can often be detected by exploring only a small fraction of NPabs and Pabs, which we construct on the fly.

Our prototype implementation has some limitations. First, Liss uses function inlining during the analysis phase and therefore cannot handle recursive programs. During lock placement, however, functions are taken into consideration and it is ensured that a function does not “leak” locks. Second, we do not implement any form of alias analysis, which can lead to unsound abstractions. For example, we abstract statements of the form “*x = 0” as writes to variable x, while in reality other variables can be affected due to pointer aliasing. We sidestep this issue by manually massaging input programs to eliminate aliasing. This is not a limitation of our technique, which could be combined with known aliasing analysis techniques.

We evaluate our synthesis method w.r.t. the following criteria: (1) Effectiveness of synthesis from implicit specifications; (2) Efficiency of the proposed synthesis procedure; (3) Effectiveness of the proposed coarse abstraction scheme; (4) Quality of the locks placed.

Benchmarks

We ran Liss on a number of benchmarks, summarized in Table 1. For each benchmark we report the complexity [lines of code (LOC), number of threads (Th)], the number of iterations (It) of the language inclusion check (Fig. 12) and the maximum bound k (MB) that was used in any iteration of the language inclusion check. Further we report the total time (TT) taken by the language inclusion check loop and the time for solving the MaxSMT problem for the two objective functions (Coarse, Fine). Finally, we report the maximum resident set size (Memory). All measurements were done on an Intel core i5-3320M laptop with 8 GB of RAM under Linux.

Table 1.

Experiments

Name LOC Th It MB TT (s) Coarse (s) Fine (s) Memory (MB) CR (s)
ConRepair benchmarks
   ex1.c 18 2 1 1 <1 <1 <1 29 <1
   ex2.c 23 2 1 1 <1 <1 <1 29 <1
   ex3.c 37 2 1 1 <1 <1 <1 29 <1
   ex5.c 42 2 4 1 <1 <1 <1 32 <1
   lc-rc.cc 35 4 0 1 <1 N/A N/A 15 9
   dv1394.c 37 2 2 1 <1 <1 <1 32 17
   em28xx.c 20 2 1 1 <1 <1 <1 29 <1
   f_acm.c 54 3 6 1 <1 <1 <1 35 1872
   i915_irq.c 17 2 1 1 <1 <1 <1 29 2.6
   ipath.c 23 2 1 3 <1 <1 <1 29 12
   iwl3945.c 26 3 0 1 <1 <1 <1 15 5
   md.c 35 2 1 1 <1 <1 <1 30 1.5
   myri10ge.cc 60 4 0 3 <1 N/A N/A 16 1.5
   usb-serial.bug1.c 357 7 2 1 6.1 <1 <1 267 b
   usb-serial.bug2.c 355 7 2 1 4.5 <1 <1 175 3563
   usb-serial.bug3.c 352 7 2 1 2.8 <1 <1 105 b
   usb-serial.bug4.c 351 7 2 1 3.8 <1 <1 130 b
   usb-serial.ca 357 7 0 3 31.9 N/A N/A 792 1200
CPMAC driver benchmark
   cpmac.bug1.c 1275 5 1 2 6 1.6 1.1 156
   cpmac.bug2.c 1275 5 4 10 152.9 63 41.4 1210
   cpmac.bug3.c 1270 5 9 4 11.1 16.2 9.6 521
   cpmac.bug4.c 1276 5 4 7 107.3 10.5 6.5 5392
   cpmac.bug5.c 1275 5 4 4 136.5 11 7.7 3549
   cpmac.ca 1276 5 0 1 2.1 N/A N/A 114
memcached benchmark
   memcached.c 294 2 104 2 22.8 6.2 2.1 114

Th threads, It iterations, MB max bound, TT time for language incl. loop, CR ConRepair time

a Bug-free example

b Timeout after 3 h

c Race not detected, as it was present under non-preemptive scheduling

Implicit versus explicit specification

In order to evaluate the effectiveness of synthesis from implicit specifications, we apply Liss to the set of benchmarks used in our previous ConRepair tool for assertion-based synthesis [6]. In addition, we evaluate Liss and ConRepair on several new assertion-based benchmarks (Table 1). We report the time ConRepair took in the CR column of Table 1. We added yield statements to the source code of the benchmarks to indicate where a context switch in the driver would be expected by the developer. This is a very lightweight annotation burden compared to the assertions required by ConRepair.

Table 2.

Lock placement statistics: the number of synthesized lock variables, lock and unlock statements, and the number of abstract statements protected by locks for different objective functions

Name No objective Coarse Fine
Locks locks/unlocks Protected instr Locks locks/unlocks Protected instr Locks locks/unlocks Protected instr
cpmac.bug1 2 6/6 11 1 3/3 11 1 3/3 9
cpmac.bug2 2 22/23 119 1 4/4 98 1 6/7 95
cpmac.bug3 1 4/4 29 1 2/3 29 1 5/6 28
cpmac.bug4 4 16/16 53 1 4/4 53 1 6/6 26
cpmac.bug5 3 15/15 30 1 4/4 30 1 5/5 30
memcached 2 5/5 26 1 1/1 28 1 2/2 24

The set includes synthetic microbenchmarks modeling typical concurrency bug patterns in Linux drivers and the usb-serial macrobenchmark, which models a complete synchronization skeleton of the USB-to-serial adapter driver. For Liss we preprocess these benchmarks by eliminating assertions used as explicit specifications for synthesis. In addition, we replace statements of the form assume(v) with await(v), redeclaring all variables v used in such statements as condition variables. This is necessary as our program syntax does not include assume statements.

We use Liss to synthesize a preemption-safe, deadlock-free version of each benchmark. This method is based on the assumption that the benchmark is correct under non-preemptive scheduling and bugs can only arise due to preemptive scheduling. We discovered two benchmarks (lc-rc.c and myri10ge.c) that violated this assumption, i.e., they contained bugs that manifested under non-preemptive scheduling; Liss did not detect these bugs. Liss was able to detect and fix all other known races without relying on assertions. Furthermore, Liss detected a new race in the usb-serial family of benchmarks, which was not detected by ConRepair due to a missing assertion.

10.1.1.1 Performance and precision

ConRepair uses CBMC for verification and counterexample generation. Due to the coarse abstraction we use, both are much cheaper with Liss. For example, verification of usb-serial.c, which was the most complex in our set of benchmarks, took Liss 103 s, whereas it took ConRepair 20 min [6].

The MaxSMT lock placement problem is solved in less than 65 s for our choice of objective functions. Without an objective function the lock placement problem is in SAT, and Z3 solves it in less than 1 s in each case. The coarse- and fine-grained lock placements are natural choices; we did not attempt other, more involved objective functions.

The loss of precision due to abstraction may cause the inclusion check to return a counterexample that is spurious in the concrete program, leading to unnecessary synchronization being synthesized. On our existing benchmarks, this only occurred once in the usb-serial driver, where abstracting away the return value of a function led to an infeasible trace. We refined the abstraction manually by introducing a guard variable to model the return value.

Simplified real-world benchmarks

In this section we present two additional benchmarks derived from real-world concurrent programs. Both benchmarks were manually preprocessed to eliminate pointer aliasing.

10.1.2.1 CPMAC benchmark

This benchmark is based on a complete Linux driver for the TI AR7 CPMAC Ethernet controller. The benchmark was constructed as follows. We combined the driver with a model of the OS API and the software interface of the device written in C. We modeled most OS API functions as writes to a special memory location. Groups of unrelated functions were modeled using separate locations. Slightly more complex models were required for API functions that affect thread synchronization. For example, the free_irq function, which disables the driver’s interrupt handler, blocks, waiting for any outstanding interrupts to finish. Drivers can rely on this behavior to avoid races. We introduced a condition variable to model this synchronization. Similarly, most device accesses were modeled as writes to a special ioval variable. Thus, the only part of the device that required a more accurate model was its interrupt enabling logic, which affects the behavior of the driver’s interrupt handler thread.

Our original model consisted of eight threads. Liss ran out of memory on this model, so we simplified it to five threads by eliminating parts of driver functionality. Nevertheless, we believe that the resulting model represents the most complex synchronization synthesis case study, based on real-world code, reported in the literature.

The CPMAC driver used in this case study did not contain any known concurrency bugs, so we artificially simulated five typical concurrency bugs that commonly occur in drivers of this type [5]: a data race where two threads could be concurrently modifying the hardware packet queue, leaving it in invalid state; an IRQ race where driver resources were deallocated while its interrupt handler could still be executing, leading to a use-after-free error; an initialization race where the driver’s request queue was enabled before the device was fully initialized, and two races between interrupt enable and disable operations, causing the driver to freeze. Liss was able to detect and automatically fix each of these defects (bottom part of Table 1). We only encountered two program locations where manual abstraction refinement was necessary. These results support our choice of data-oblivious abstraction and confirm the conjecture that synchronization patterns in OS code rarely depend on data values. At the same time, the need for manual refinements indicates that achieving fully automatic synthesis requires enhancing our method with automatic abstraction refinement.

10.1.2.2 Memcached benchmark

Finally, we evaluate Liss on memcached, an in-memory key-value store version 1.4.5 [19]. The core of memcached is a non-reentrant library of store manipulation primitives. This library is wrapped into the thread.c module that implements a thread-safe API used by server threads. Each API function performs a sequence of library calls protected with locks. In this case study, we synthesize lock placement for a fragment of the thread.c module. In contrast to our other case studies, here we would like to synthesize locking from scratch rather than fix defects in existing lock placement. Furthermore, optimal locking strategy in this benchmark depends on the specific load. We envision that the programmer may synthesize both a coarse-grained and a fine-grained version and at deployment the appropriate version is selected.

Quality of synthesis

Next, we focus on the quality of the synthesized solutions for the two real-world benchmarks from our benchmark set. Table 2 compares the implementations synthesized for these benchmarks using each objective function in terms of (1) the number of locks used in synthesized code, (2) the number of lock and unlock statements generated, and (3) the total number of program statements protected by synthesized locks.

We observe that different objective functions produce significantly different results in terms of the size of synthesized critical sections and the number of lock and unlock operations guarding them: the fine-grained objective synthesizes smaller critical sections at the cost of introducing a larger number of lock and unlock operations. Implementations synthesized without an objective function are clearly of lower quality than the optimized versions: they contain large critical sections, protected by unnecessarily many locks. These observations hold for the CPMAC benchmarks, where we start with a program that has most synchronization already in place, as well as for the memcached benchmark, where we synthesize synchronization from scratch.

To summarize our experiments, we found that (1) while our coarse abstraction is highly precise in practice, automatic abstraction refinement is required to further reduce manual effort involved in synchronization synthesis; we leave such extension to future work; (2) additional work is required to improve the performance of our method to be able to handle real-world systems without simplification; (3) the objective functions allow specializing synthesis to a particular locking scheme; (4) the time required to solve the MaxSMT problem is small compared to the analysis time.

Conclusion

We introduced a technique to synthesize locks using an implicit specification. The implicit specification relieves the programmer of the burden of providing sufficient assertions to specify correctness of the program. Our synthesis is guaranteed not to introduce deadlocks and the lock placement can be optimized using a static optimization function.

In ongoing work [7] we aim to optimize lock placement not merely using syntactic criteria, but by optimizing the actual performance of the program running on a specific system. In this approach we start with a synthesized program that uses coarse locking and then profile the performance on a real system. Using those measurements we adjust the locking to be more fine-grained in those areas where a high contention was measured.

Acknowledgments

Open access funding provided by Institute of Science and Technology (IST Austria). This work was published, in part, in Computer Aided Verification (CAV) 2015 [4]. This research was supported in part by the European Research Council (ERC) under Grant 267989 (QUAREM), by the Austrian Science Fund (FWF) under Grants S11402-N23 (RiSE) and Z211-N23 (Wittgenstein Award), by NSF under award CCF 1421752 and the Expeditions award CCF 1138996, by the Simons Foundation, and by a gift from the Intel Corporation.

Footnotes

1

An expression/assignment statement that involves reading from/writing to multiple shared variables can always be rewritten into a sequence of atomic read/atomic write statements using local variables. For example, the statement x := x + 1, where x is a global variable, can be translated to l := x; x := l + 1, where l is a fresh local variable.

2

The equivalence classes of ≡I are Mazurkiewicz traces.

3

The encoding of the global lock placement constraints is essentially a SAT formula. We present and use this as an SMT formula to enable combining the encoding with objective functions for optimization (see Sect. 9).


Contributor Information

Pavol Černý, Email: pavol.cerny@colorado.edu.

Edmund M. Clarke, Email: emc@cs.cmu.edu

Thomas A. Henzinger, Email: tah@ist.ac.at

Arjun Radhakrishna, Email: arjunrad@cis.upenn.edu.

Leonid Ryzhyk, Email: l.ryzhyk@samsung.com.

Roopsha Samanta, Email: roopsha@cs.purdue.edu.

Thorsten Tarrach, Email: ttarrach@ist.ac.at.

References

  • 1.Alglave J, Kroening D, Nimal V, Poetzl D (2014) Don’t sit on the fence—a static analysis approach to automatic fence insertion. In: CAV, pp 508–524
  • 2.Bertoni A, Mauri G, Sabadini N (1982) Equivalence and membership problems for regular trace languages. In: Automata, languages and programming. Springer, Heidelberg, pp 61–71
  • 3.Bloem R, Hofferek G, Könighofer B, Könighofer R, Außerlechner S, Spörk R (2014) Synthesis of synchronization using uninterpreted functions. In: FMCAD, pp 35–42
  • 4.Černý P, Clarke EM, Henzinger TA, Radhakrishna A, Ryzhyk L, Samanta R, Tarrach T (2015) From non-preemptive to preemptive scheduling using synchronization synthesis. In: CAV, pp 180–197. https://github.com/thorstent/Liss
  • 5.Černý P, Henzinger T, Radhakrishna A, Ryzhyk L, Tarrach T (2013) Efficient synthesis for concurrency by semantics-preserving transformations. In: CAV, pp 951–967
  • 6.Černý P, Henzinger T, Radhakrishna A, Ryzhyk L, Tarrach T (2014) Regression-free synthesis for concurrency. In: CAV, pp 568–584. https://github.com/thorstent/ConRepair
  • 7.Černý P, Clarke EM, Henzinger TA, Radhakrishna A, Ryzhyk L, Samanta R, Tarrach T (2015) Optimizing solution quality in synchronization synthesis. ArXiv e-prints. ArXiv:1511.07163
  • 8.Cherem S, Chilimbi T, Gulwani S (2008) Inferring locks for atomic sections. In: PLDI, pp 304–315
  • 9.Clarke EM, Emerson EA (1982) Design and synthesis of synchronization skeletons using branching time temporal logic. Springer, Berlin
  • 10.Clarke E, Kroening D, Lerda F (2004) A tool for checking ANSI-C programs. In: TACAS, pp 168–176. http://www.cprover.org/cbmc/
  • 11.De Wulf M, Doyen L, Henzinger TA, Raskin JF (2006) Antichains: a new algorithm for checking universality of finite automata. In: CAV. Springer, Heidelberg, pp 17–30
  • 12.Deshmukh J, Ramalingam G, Ranganath V, Vaswani K (2010) Logical concurrency control from sequential proofs. In: Programming languages and systems. Springer, Heidelberg, pp 226–245
  • 13.Eswaran KP, Gray JN, Lorie RA, Traiger IL (1976) The notions of consistency and predicate locks in a database system. Commun ACM 19(11):624–633. doi: 10.1145/360363.360369
  • 14.Flanagan C, Qadeer S (2003) Types for atomicity. In: ACM SIGPLAN notices, vol 38. ACM, New York, pp 1–12
  • 15.Gupta A, Henzinger T, Radhakrishna A, Samanta R, Tarrach T (2015) Succinct representation of concurrent trace sets. In: POPL15, pp 433–444
  • 16.Herlihy MP, Wing JM (1990) Linearizability: a correctness condition for concurrent objects. ACM Trans Progr Lang Syst (TOPLAS) 12(3):463–492. doi: 10.1145/78969.78972
  • 17.Jin G, Zhang W, Deng D, Liblit B, Lu S (2012) Automated concurrency-bug fixing. In: OSDI, pp 221–236
  • 18.Khoshnood S, Kusano M, Wang C (2015) ConcBugAssist: constraint solving for diagnosis and repair of concurrency bugs. In: International symposium on software testing and analysis
  • 19.Memcached distributed memory object caching system. http://memcached.org. Accessed 01 Jul 2015
  • 20.Papadimitriou C (1986) The theory of database concurrency control. Computer Science Press, Rockville
  • 21.Ryzhyk L, Chubb P, Kuz I, Heiser G (2009) Dingo: Taming device drivers. In: Eurosys
  • 22.Sadowski C, Yi J (2010) User evaluation of correctness conditions: a case study of cooperability. In: PLATEAU, pp 2:1–2:6
  • 23.Solar-Lezama A, Jones C, Bodík R (2008) Sketching concurrent data structures. In: PLDI, pp 136–148
  • 24.Vechev M, Yahav E, Yorsh G (2010) Abstraction-guided synthesis of synchronization. In: POPL, pp 327–338
  • 25.Vechev MT, Yahav E, Raman R, Sarkar V (2010) Automatic verification of determinism for structured parallel programs. In: SAS, pp 455–471
  • 26.Yi J, Flanagan C (2010) Effects for cooperable and serializable threads. In: Proceedings of the 5th ACM SIGPLAN workshop on types in language design and implementation. ACM, New York, pp 3–14
