Abstract
In one sense, formal specification and verification have been highly successful: techniques have been developed in pioneering academic research, transferred to software companies through training and partnerships, and successfully deployed in systems with national significance. Altran UK has been in the vanguard of this movement. This paper summarizes some of our key deployments of formal techniques over the past 20 years, including both security- and safety-critical systems. The impact of formal techniques, however, remains within an industrial niche, and while government and suppliers across industry search for solutions to the problems of poor-quality software, the wider software industry remains resistant to adoption of this proven solution. We conclude by reflecting on some of the challenges we face as a community in ensuring that formal techniques achieve their true potential impact on society.
This article is part of the themed issue ‘Verified trustworthy software systems’.
Keywords: formal methods, software verification, proof, SPARK
1. Introduction and overview
Altran UK's Intelligent Systems Expertise Centre (formerly Altran Praxis) has a 30-year track record of developing and deploying high-integrity and safety-critical systems using formal methods. In some cases, we have gone as far as delivering a warranty with our software with respect to a formal specification. Our approach is underpinned by principles and experiences with particular notations and tools that have been successfully deployed on a number of industrial projects. A key component of this approach has been the development of long-term relationships with leading academic institutions, both within the UK and worldwide, in order to bring new research and tools into industrial practice. Despite these success stories, formal methods have yet to achieve wider adoption.
The remainder of the paper is structured as follows. In §2, we consider the principles that guide our approach and give examples of the tools that we have found most useful. Section 3 illustrates these ideas with examples from five projects. Section 4 goes on to discuss the questions and barriers that seem to be limiting adoption. Section 5 concludes with thoughts for the future and how we might advocate for broader adoption of formal approaches to software engineering.
2. Altran UK's principles and tools
Over the years, we have adopted and used several formal notations and tools, as needed and appropriate for each project. This section opens with the principles that we have distilled from this experience. We then go on to consider the ‘toolbox’ that we currently use and take a deeper dive into one notable example: the SPARK¹ programming language and verification tools.
(a). Principles
It has long been known that the cost of correcting defects increases as the gap widens between their introduction and detection. A defect introduced during the requirements phase is comparatively cheap to fix if also found during the requirements phase. It becomes more expensive to fix during development, even more expensive during verification, and so on; the worst scenario being the need to fix the defect after the system is delivered to the client and in operation. The same is true for defects introduced during the architecture or development phases. Table 1, adapted from [1], illustrates the relative costs to fix defects introduced in the requirements phase. Similar tables can be produced for defects introduced in later phases.
Table 1.
Relative cost to fix a defect introduced in the requirements phase.
| stage at which defect is found | relative cost to fix |
|---|---|
| requirements | 1 (by definition) |
| architecture | 3 |
| construction | 5–10 |
| system test | 10 |
| post-release | 10–100 |
Every time a defect is found in industrial software, the decision to fix is based on the business case of impact versus cost-to-fix. The later the defect is found, and consequently the more it costs to fix, the less likely the business case will be justified, and the more likely the defect is allowed to remain—thereby undermining quality. For truly critical software, however, the defect impact can be so large as to make a fix effectively mandatory, regardless of cost.
These facts lead us to the guiding principles of our approach:
— do everything practical to prevent the introduction of defects; but
— accept defects will be introduced, so do everything practical to identify and remove defects as close as possible to their point of introduction. This leads to a natural preference for static analysis of design artefacts over dynamic testing, because static techniques can be introduced far earlier in the life cycle.
These guiding principles underpin our strategy:
Take small steps, not large leaps. We try to limit the ‘semantic gap’ between notations. We do not try to jump from natural-language requirements straight to ‘code’. Rather, we might use several notations, increasing in formality before code is finally reached. Doing so eases verification and reduces defect density earlier in the life cycle.
Use precise, or even better, formal notations for each step. We prefer to use automatic tools to perform verification steps. These work much better if the notations being processed are truly unambiguous and amenable to formal verification, refinement or synthesis.
Verify every artefact back to its predecessor before embarking on the next step forward. If ‘small steps’ verification is possible, then do the verification now, before embarking on the next step. The verification activity itself might be a peer review, automated analysis, a proof or a formal refinement depending on the formality of the notations being compared, the degree of rigour required and the availability of suitable tools.
Use tool-supported methods for the verification. Tools are very good at particular classes of problem, do not get tired or have a bad day, and are highly scalable. With formal notations, tools can be sound for verification of key properties. Moreover, tools are often strong exactly where humans struggle intellectually, and vice versa. We therefore use tools to complement, but not replace, the role of people in verification.
Design the software to simplify verification. Over the years, we have developed verifiable patterns for architecture, designs and code. This includes simplification of the programming language as a whole to remove features that defy verification. We generalize the notion of ‘Test-Driven Design’ to ‘Verification-Driven Design’, which respects all the verification activities that we need to perform to meet our objectives.
Say things only once. We avoid repetition of information in design artefacts. Each pertinent piece of information should be recorded and configured exactly once. Other copies can be generated as part of an automated build process. Repetition creates extra work, and is highly prone to introducing inconsistency.
Do the hard or risky things first. Risks are identified early in a project (e.g. ‘we've never done X before, and we don't know how to do it’) and should be attacked, not swept under the carpet. For example, one project had a particularly challenging need for concurrency, so we modelled the proposed software design in CSP [2] and used the FDR [3] model-checker to verify key properties of the design. Many mistakes were discovered and corrected. This was all done long before coding was attempted.
The elements of this strategy are interlinked. By taking small steps, each step-by-step verification activity is correspondingly small and manageable. By using precise notations, each verification step can be trustworthy and amenable to automation, and there are no gaps for defects to hide in. By doing tool-supported verification, we reduce the element of human error. By addressing risks first, we eliminate the most error-prone areas first. Testing is still important, but it becomes a demonstration of correctness, not a mechanism for finding defects. In fact, defects found in testing are indicative of a failure of an earlier verification step, and hence imply a need to fix the process as well as the system.
Our confidence in this strategy, underpinned by years of deploying it and monitoring the resulting data, allows us to offer a warranty on our software. Small print applies, but basically we fix defects for free. We call our strategy ‘Correctness-by-Construction’ [4].
Deployment of Correctness-by-Construction on specific projects requires the support of a range of tools and techniques, which we describe below.
(b). Toolbox
Every engineer has a toolbox. The role of a toolbox is to hold a set of tools, in good condition, ready for use. Not every tool is needed or useful for every job. Additionally, a job is easiest if the right tool is to hand, and the operator is trained to use it correctly.
The job of software development is no exception. Software engineers need a toolbox with tools, and also methods of operation. Each project requires the right set of tools to be selected. But we may not use every tool on every project. Training is needed—the incorrect use of a tool or method is just as dangerous in software as in any other industry.
In industry, where we want repeatable delivery of best practice, use of the toolbox needs to be habitualized. Individual engineers deliver a competency in one or more tools. Collectively, the engineers deliver the corporate capability.
Our selection criteria for tools include the following:
— Formality. We prefer tools that are underpinned by a formally defined, unambiguous language. We also prefer such languages to be an international, open standard rather than the IP of a single organization.
— Soundness. For verification tools, we prefer tools that can offer and support a rational case for the soundness of their verification. Such tools support cost saving by reducing defect density to zero for some classes of defect, and by optimization of later process steps (for example, entirely removing some test activities).
— Longevity. We require languages and tools that will be supported for the full lifetime of the project, including operation and decommissioning. An extreme example is the nuclear industry, where a 50-year lifespan is commonplace.
— Automation. We prefer tools that allow computer-based automation of repetitive, expensive tasks, or tasks where humans are intellectually weak.
In addition to the usual software engineering tools such as configuration management, IDEs, debuggers, compilers and so on, our corporate toolbox includes:
— Requirements engineering methods such as REVEAL, which is based on Jackson's Problem Frames approach [5].
— Specification languages and their associated tools. Different languages excel at different problems, such as state-rich functional specification (Z [6]), concurrency (CSP [2]), closed-loop control systems modelling (SCADE Suite [7] and Matlab/Simulink [8]) and general system modelling (the UML [9]).
— INFORMED—our verification-driven software design approach [10].
— Programming languages, subsets, and their verification tools such as SPARK [11,12] and MISRA C [13].
— Refinement and code-synthesis tools such as SCADE and AdaCore's QGen [14].
— Code-level verification tools such as SPARK and CodePeer [15].
— ConTestor—a dynamic testing approach developed at Altran based on constrained random test data generation.
— Delivery methods such as High-Integrity Agile [16].
Many of these tools originated as academic research projects which were later transferred into industry. A good example is SPARK, which we examine further in the following section.
(c). SPARK
SPARK [11,12] is a programming language and static verification technology designed specifically for the development of high-integrity software. Based on Ada, but independent of any specific implementation or compiler, SPARK is formally defined and aims to remove all language ambiguities and insecurities. This aim is achieved by the dual-track approach of:
— eliminating language constructs that are not amenable to sound analysis (e.g. uncontrolled concurrency or direct memory manipulation) and
— introducing contracts to capture the program requirements and to enable efficient, modular and compositional program verification.
The contracts can be proved before the program is ever run, but also checked dynamically by testing, and even left enabled in the deployed system for a ‘belt-and-braces’ approach.
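As a minimal illustration of this contract style, the SPARK 2014 sketch below attaches a precondition and postcondition to a bounded counter. This is an example written for this paper with invented names, not code from any of the projects described later; GNATprove attempts to prove the contracts statically, while compiling with assertions enabled turns the same contracts into run-time checks.

```ada
--  counters.ads  (specification)
package Counters with SPARK_Mode is

   Max : constant := 1000;

   subtype Count is Natural range 0 .. Max;

   --  The contract records the requirement: the counter must not be
   --  full on entry, and on return it has grown by exactly one.
   procedure Increment (C : in out Count)
     with Pre  => C < Max,
          Post => C = C'Old + 1;

end Counters;

--  counters.adb  (body)
package body Counters with SPARK_Mode is

   procedure Increment (C : in out Count) is
   begin
      C := C + 1;   --  In range because the precondition bounds C.
   end Increment;

end Counters;
```

Because the contracts are ordinary Boolean-valued Ada expressions, the same text can be reviewed by people, proved by the tools or executed as checks during testing.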
First designed nearly 30 years ago, SPARK has built on academic input from a range of institutions, shown in figure 1. Successful examples of academic research influencing SPARK include:
— The work of Bergeretti and Carré on information-flow analysis of imperative programs [17].
— The work of the University of York Real-Time Systems Group [18] in the development of language subsets and analyses for real-time, concurrent programs and the so-called ‘Ravenscar Profile’ that is now part of the Ada standard, and is embodied in SPARK.
— Work at the University of Bath (and a subsequent Knowledge Transfer Partnership) [19] on answer-set programming, which led to our first commercially available counterexample finding tool.
— Paul Jackson at Edinburgh developed a tool that translated from SPARK's original VC language to the standard SMTLib format [20]. This enabled us to ship SMT-based theorem-proving tools for the first time.
— The work of the French government's research establishments and universities in the development of the Why3 infrastructure and the Alt-Ergo prover [21], both of which underpin the latest SPARK 2014 verification tools.
— New York University's development of the CVC4 prover [22], which also ships with SPARK 2014.
— Recent work in Daniel Kroening's group at Oxford [23] on decision procedures for floating-point arithmetic.
Figure 1.
SPARK technology transfer timeline. (Online version in colour.)
The current SPARK 2014 tools deploy a number of technologies. The main verification tool is called GNATprove. Its front end is based on the GNAT Pro Ada compiler infrastructure, which is, in turn, part of the GCC family. The middle end enforces the language subset and performs information-flow analysis, based on the construction and analysis of program dependence graphs [24]. The back end generates a representation of each function or procedure in the Why3ML language, which is then submitted to the Why3 Verification Condition (VC) generator, producing VCs in the SMTLib language. The toolset currently ships with the CVC4, Alt-Ergo and Z3 provers. CVC4 is also used to generate counterexamples for unproven VCs.
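The flow-analysis side of the toolset can be sketched in a similarly small example (again our own, with invented names, rather than code from the toolset's documentation): a Global aspect declares the package-level state a subprogram may read or write, and a Depends aspect states the permitted information flows, which the flow analysis verifies against the body.

```ada
--  monitor.ads  (specification)
package Monitor with SPARK_Mode is

   Speed : Integer := 0;      --  most recent sensor reading
   Alarm : Boolean := False;  --  over-speed indication

   Limit : constant Integer := 120;

   --  Global names the state read and written; Depends records that
   --  the value of Alarm may be derived only from Speed.
   procedure Check_Speed
     with Global  => (Input => Speed, Output => Alarm),
          Depends => (Alarm => Speed);

end Monitor;

--  monitor.adb  (body)
package body Monitor with SPARK_Mode is

   procedure Check_Speed is
   begin
      Alarm := Speed > Limit;
   end Check_Speed;

end Monitor;
```

If the body failed to respect these contracts, for example by ignoring Speed altogether, the flow analysis would report the discrepancy before any proof or testing took place.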
SPARK has established a track record of use in embedded and critical systems, across a diverse range of industrial domains, where safety and security are paramount. Later sections of this paper describe case studies of our approach in general, many of which employ SPARK.
(d). Future research
Altran continues to develop and improve tools and methods using a mix of in-house R&D and academic partnerships. Current areas of interest include:
— Making it easier to use formal methods at the requirements and/or specification stage, where natural languages still dominate, but remain ambiguous.
— Identifying state-of-the-art approaches to facilitating the production and maintenance of requirements with a formal underpinning. Maintenance of formal artefacts remains a critical, but largely unaddressed issue. In a large project, change management is a critical activity, where we have to estimate the size (and therefore price) of a proposed change. Impact analysis is also crucial if we are to determine when a proposed change undermines non-functional properties such as safety or security. We sorely need tools to help with this problem.
— Improving the workflow and efficiency of automated source code verification (proof). The user feedback given by proof tools is an area of continuous improvement; the recent development of counterexample finding in proof tools, for example, is a major breakthrough. Improving the completeness of automated proof tools (that is, reducing their ‘false alarm’ rate) is a never-ending battle.
— The sound combination of different analysis methods. If we have a functional specification in Z, and a concurrency design expressed in CSP, what can we conclude about their composition?
— Improved floating-point reasoning. The original SPARK tools could only reason about rational numbers, and so could produce unsound results for floating-point algorithms. The new SPARK 2014 tools support the recent formal semantics for floating-point that has been embodied in the SMTLib format [25]. The latest proof tools, such as CVC4, implement some bit-precise decision procedures [23] for floating-point verification conditions, but much more work is needed (a small illustrative sketch follows this list).
— Identifying opportunities to reduce manual effort during code development and verification.
— Effective testing of model-based code. The key here is to ‘join up’ the model-based notation with the code, so that both are formal and consistent.
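Returning to the floating-point item above, the small sketch below (our own, with invented names; whether the resulting checks are discharged automatically depends on the provers and settings used) shows the kind of verification conditions involved: overflow and range checks whose sound treatment requires reasoning over IEEE-754 arithmetic rather than over the reals or rationals.

```ada
--  averaging.ads  (specification)
package Averaging with SPARK_Mode is

   subtype Sensor_Value is Float range 0.0 .. 1000.0;

   --  The implicit range check on the result is a typical
   --  floating-point verification condition.
   function Midpoint (X, Y : Sensor_Value) return Sensor_Value;

end Averaging;

--  averaging.adb  (body)
package body Averaging with SPARK_Mode is

   function Midpoint (X, Y : Sensor_Value) return Sensor_Value is
   begin
      --  Proving that the rounded result of the addition and the
      --  division stays within Sensor_Value requires reasoning about
      --  floating-point, not real, arithmetic.
      return (X + Y) / 2.0;
   end Midpoint;

end Averaging;
```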
3. Industrial projects
In this section, we will look at some of the key deployments of formal techniques over the past 20 years in both security- and safety-critical systems. For each case study, we summarize the context in which the work was undertaken and its objectives, the technical approach and solution (including the tools that were used), and the results, focusing on the scale of the verification effort and the properties that were checked.
(a). SHOLIS
SHOLIS is a system which monitors ship and helicopter landing parameters, such as wind vector plus ship's roll and pitch, and advises users if it is safe to proceed with a landing. Figure 2 shows SHOLIS' typical users—the Royal Navy's Type 23 frigate and Merlin helicopter.
Figure 2.
SHOLIS users: Type 23 Frigate HMS Sutherland and Merlin HM2. Crown Copyright © 2016. Reused under the Open Government Licence. (Online version in colour.)
(i). Context and objectives
The UK Ministry of Defence required the system to be certified to Def-Stan 00-55 SIL4, which in turn required full functional proof of the software against the formal specification. SHOLIS was the first software ever developed against this standard.
(ii). Approach and solution
We produced a formal specification in Z, which was fully type-checked and for which the existence of preconditions and of an initial state was proved. The code was written in SPARK, with contracts produced directly from the Z. Safety-critical modules were proved to be compliant with their contracts. The entire code base was proved to be free from all ‘run-time errors’, such as buffer overflow, division by zero and so on, so that a defect in non-critical code could not prevent the safety-critical functions from running. Similarly, the termination of every loop was carefully considered, to ensure that a malfunctioning (i.e. infinite) loop could not monopolize the processor and prevent the SIL4 software from running. Finally, we performed static analysis of worst-case execution time and stack usage to verify that real-time deadlines would be met and that there was no chance of stack overflow.
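To give a flavour of the loop-termination and run-time-error arguments involved, the sketch below is written in current SPARK 2014 notation with invented names; it is illustrative only and is not SHOLIS code, which used the earlier SPARK toolset. The exit test discharges the range check on the increment, and the loop variant supplies a termination argument: a bounded quantity that strictly increases on every iteration.

```ada
--  search.ads  (specification)
package Search with SPARK_Mode is

   type Index_Type is range 1 .. 64;
   type Buffer is array (Index_Type) of Integer;

   function Contains (A : Buffer; Value : Integer) return Boolean;

end Search;

--  search.adb  (body)
package body Search with SPARK_Mode is

   function Contains (A : Buffer; Value : Integer) return Boolean is
      I : Index_Type := Index_Type'First;
   begin
      loop
         if A (I) = Value then
            return True;
         end if;
         --  Exiting here ensures the increment below stays within
         --  Index_Type, which is the run-time check to be proved.
         exit when I = Index_Type'Last;
         --  The variant justifies termination: I strictly increases
         --  on each iteration and is bounded above by Index_Type'Last.
         pragma Loop_Variant (Increases => I);
         I := I + 1;
      end loop;
      return False;
   end Contains;

end Search;
```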
(iii). Results
The system was certified to Def-Stan 00-55. After 10 years of operational service, the 42 ksloc² system has a defect density of 0.22 defects per ksloc. Unit test was performed in parallel with the formal proof, and was shown to have comparatively low value—the quality of the code emerging from the SPARK development approach was such that traditional unit testing found very few defects. The full story and results were reported in 2000 [26].
(b). Global key centre
This system provides the root certificate authority for the MULTOS smartcard operating system. Essentially, it serves as the definitive, and unique, source of all cryptographic keys and certificates in the MULTOS public-key infrastructure.
(i). Context and objectives
This system was business-critical for the customer as flaws in the software could lead to very high financial impact and reduced end-customer confidence in the product. The ITSEC standard at assurance level E6 was to be applied as far as was practical.
(ii). Approach and solution
Requirements were captured using REVEAL and the functional specification was written in Z. The code for the security-critical functions, written in SPARK, carried contracts for the security properties, and those contracts were proved. This meant that a residual defect could cause a functional failure, but not a security breach. For this customer, this level of assurance was the ideal balance for their business case.
(iii). Results
This system [27] comprised 100 ksloc—mostly SPARK and Ada, but also some C, C++ and SQL. Four trivial defects (one introduced in the specification, three in the source code) were fixed under warranty in the first year of operation. The long-term defect rate is currently 0.04 defects per ksloc. The entire project consumed just over 3500 person-days of effort, for a productivity of 28 sloc per day.
The use of formal methods also produced a commercial benefit for the project. The customer needs a rational basis to balance the cost, risk and business value of a change request or a defect report. The Z specification was agreed to be the single definitive statement of functionality for the system, and some of the customer's team were trained to read Z. This meant that the analysis of proposed changes and defects became centred on the Z specification. In such analysis, three questions dominate the discussion:
— Is it a defect (we pay for it), or a change (they pay for it)?
— How big, and therefore how expensive is it?
— What is the impact on safety and/or security?
The ability to answer such questions precisely, confidently and quickly became a critical factor in building trust with the customer, and in the eventual financial performance of this and subsequent projects.
(c). Engine monitoring unit
This system provides in-flight monitoring of jet engines to inform just-in-time maintenance activities.
(i). Context and objectives
The system can be installed in a range of engines and airframes, leading to a requirement to be able to work with a variety of different electrical and physical interfaces. The business driver was to enable cost-efficient engine maintenance and to this end the system had to provide information that supported accurate diagnosis. The software was developed within the regulatory framework of ED-12B/DO-178B [28].
(ii). Approach and solution
We deployed a Correctness-by-Construction approach that covered the systems engineering, the requirements for the engine unit and the software development. An INFORMED design was documented in UML, with SPARK code automatically generated from elements of the model. The SPARK code was proved to be free of all run-time exceptions.
(iii). Results
The system was successfully certified to DO-178B Level C. A family of engines is supported using common source code that is verified once but used often. Following on from this, a joint research project to develop the next generation of Engine Health Monitoring is now underway.
(d). Tokeneer biometric access control
The Tokeneer system implements a biometric-based access control system for a secure room—allowing entry only to people whose biometric signature (e.g. a fingerprint) matches a pre-authorized reference. The system then generates and issues a digital certificate that allows access to further computer systems, based on the user's role and access rights.
(i). Context and objectives
The US National Security Agency (NSA) leads the US government in cryptology. We were engaged to build a demonstrator using our Correctness-by-Construction approach to allow the NSA to understand how to build systems that are cost-effective, ultra-secure and certifiable to Common Criteria EAL5.
(ii). Approach and solution
Similar to the Global Key Centre, this system had a specification written in Z, and then security properties captured in SPARK contracts. The code was written in SPARK and security properties were proved. Testing was then done by an independent (NSA-chosen) third party.
(iii). Results
The 10 ksloc of SPARK code produced 2623 VCs, of which 2513 (95.8%) discharge automatically using the standard version of the SPARK Toolset [29]. The complete proof takes a matter of minutes on a standard desktop PC, so the proof is maintained by developers as a matter of course, and certainly before code is checked in to CM, compiled or tested. The independent third-party system tester found zero functional defects, despite having full access to the design and code, although they did report two trivial mistakes in the user manual.
The entire project was delivered in 260 person-days, for a productivity of 38 sloc per day for the security-critical components. This performance is considerably better than industry norms at the time (2004), although these figures could be skewed by the small size (just three people) and expertise of the team.
This system (requirements, design, code, tests, build scripts and more) has been released under an open source licence for use by academics and researchers [30]. It has spawned a number of spin-off research projects. To date, only five defects have been located by teams across the globe deploying a number of academic and industrial tools [31].
(e). iFACTS
iFACTS is a set of tools to help UK air traffic controllers to handle increasing volumes of traffic safely. Provision includes trajectory prediction, medium-term conflict detection and flight monitoring. Figure 3 shows an iFACTS workstation at NATS' facility in Swanwick, UK.
Figure 3.
An iFACTS controller workstation. Copyright NATS 2016. Reused under the Creative Commons Licence from the NATS Media Toolkit. (Online version in colour.)
(i). Context and objectives
NATS had produced some prototype tools to enable controllers to safely handle the ever-increasing volume of air traffic over the UK. They needed these to be developed into an industrial system and certified to CAA SW01 standards. In practice, iFACTS was the first ever system to be developed under these new regulations.
(ii). Approach and solution
We produced a functional specification in Z and HMI behaviour was specified using finite state machines. The SPARK implementation was proved to be free of run-time exceptions. We also produced an HMI layer in MISRA C to match the look and feel of the existing tools. Incremental builds were delivered into test and operation using the High-Integrity Agile [16] model. As with the Global Key Centre, the use of Z became a critical asset in the management of changes, defects and costs.
(iii). Results
iFACTS has been in full operation (all controllers, all sectors, all day) since December 2011, recently passing one million hours of use. The system comprises 250 ksloc of SPARK. The SPARK tools generate 152 927 VCs from the source code, of which 98.76% discharge automatically in a matter of minutes, owing to the tools' ability to parallelize and exploit caching of proof results [32]. Re-proof of small changes can be completed by developers in seconds, achieving a ‘proof-first’ development style.
(f). Non-Altran SPARK projects
We have also helped other organizations adopt formal methods and, in particular, SPARK. Some of these remain unpublished owing to commercial concerns, but a brief summary appears in [32]. Broken down by industry sector, notable examples include:
— Commercial aerospace. Large gas turbine engine control systems and flight control systems.
— Military aerospace. Eurofighter Typhoon aircraft (all critical systems, including flight control, stores and fuel management), Lockheed Martin C-130J Hercules (main mission computers).
— Rail. SIL4 signalling and interlocking systems.
— Security. Several formal reference implementations of cryptographic algorithms. Rockwell Collins SecureOne Cross Domain Guard product line.
— Operating systems. The Muen hypervisor/separation kernel.
4. Adoption barriers and opportunities
Given the successful case studies presented above, it is reasonable to ask why formal methods have not seen wider acceptance and take-up in industrial software, particularly since we have substantial data showing that formal methods lead to both cheaper and higher-quality software products.
Some of the more common objections to adoption of formal methods, with our corresponding responses, are considered below.
Objection: ‘I don't want to be locked into a tool from a single vendor.’
Response: This concern represents a real issue. Tool support for long-running development programmes needs to be reliable and available. But this is not just an issue with formal methods tools—it applies to all tools on a project including compilers, editors and so on. In mitigation, on large projects tools can be frozen relatively early, as tool upgrades are costly in re-validation effort. Ideally, we favour formal notations that have an unambiguous semantics where several tool vendors can compete for our business either across a range of lifecycle activities (e.g. analysis tools, compilers, test tools) or throughout the lifetime of a project (e.g. a hardware upgrade forcing the need for a new compiler). SPARK has this property—one of our customers upgraded their target hardware from Transputer to PowerPC, and hence needed a completely new compiler. The SPARK code worked effortlessly with the new compiler, owing to its total lack of undefined and unspecified behaviour.
Objection: ‘I've bought <other tool> and it was very expensive, so I have to use it.’
Response: This is what happens when financial concerns on projects overrule engineering decisions. This also inhibits innovation, research, improvement and onward development. If this mindset exists for tools, it probably also exists elsewhere in the business. We believe that projects should be run against the right criteria, balancing costs and technology, to ensure the right delivery to the client.
Objection: ‘My team don't know <formal methods tool> so we can't use it.’
Response: If your team have a good grounding in basic computer science principles, then, given the right training, they can pick up any tool quickly enough. The iFACTS programme recruited and trained over 60 engineers to read Z and produce SPARK code; willingness to take part in the training was a mandatory criterion when interviewing candidates. We also taught several of the customer's domain experts (professional air traffic controllers) to read Z, so that they could review and validate the specification. Finally, we noted that the skills needed to write a formal specification in a notation like Z are very different from the skills needed to read Z and produce code: the Z-writing team was much smaller than the development and test team, peaking at only 10 engineers.
Objection: ‘We want to use industry standards’ or ‘We want to use industry practice.’
Response: You should use best practice, but this can be difficult in software engineering, where industry trends (e.g. for languages, tools and development methods) all seem to change annually. We try to focus on what we know works from a mathematical and technical standpoint, not on ‘what's hot’ in the wider world of software development. Most of our products have unusually long lifespans, so we try to pick languages and technologies that will stand the test of time. Formal languages are good in this regard, because they have a consistent semantics across the decades.
Objection: ‘We don't like to spend more upfront.’
Response: Generally, the cost profile of a formal methods project has more spend before code starts to be written; however, all the data show that the overall spend is lower. We do encounter many projects that suffer from a short-term view of planning and decision-making. If a project manager has a deadline or a milestone in 6 months, they are hardly likely to choose a new technology that will require a non-trivial investment of capital and time up front. In these projects, ‘do the same as last time, but promise to be a bit more careful’ becomes the easy choice. Some companies also implement per-project accounting, where each project must pay for its own investment in tools and training. Again, this is detrimental, since no project wants to risk its own ‘pot of money’ on something new. Improvement needs to be funded and supported at a capital level.
Objection: ‘I want a drag-and-drop graphical interface.’
Response: Of course, we need better tools and a better and more productive experience for users, but a pretty graphical user interface (GUI) should not be mistaken for the underlying capability of the tools. An unsound verification tool with a good-looking GUI is still unsound.
We can conclude that there are logical flaws in industry's rejection of formal methods. While there is strong technical evidence pointing to the efficacy of formal approaches, it seems that non-technical economic and social incentives still dominate decision-making in large organizations.
There are a number of potential elements to forming a strategic response. As formalists, we need to broaden our case beyond the mere technical arguments to address the concerns of company directors. We need to communicate using terms that cause such ‘C-level executives’ to lose sleep—for example risk, cost, personal liability and corporate governance. Additionally, we need to appeal to the legal profession, insurers, regulators, standards-setting bodies and the general public to raise their expectations of quality and fitness-for-purpose in critical software systems. For example, we could fight for tougher standards that raise the bar in terms of requiring or rewarding the use of formal methods.
In reality, a strategy will have to be composed of all of these campaigning elements and more, including funded research producing high-impact results. However, at Altran we believe that there is a further, more subtle element that may be needed for successful take-up of formal methods. The essence of this approach is to hide the formality from the users—to use mathematics to provide the underpinning for sound tools and techniques while at the same time lowering the adoption hurdle.
One example of this approach is a new test solution, called ConTestor, which we are deploying as part of our verification toolbox. Many teams automate the running of tests, but ConTestor also automates the initial production of the test cases. Users do not have to know how the underlying technology works—it is a hidden formal method.
5. Conclusion
Our experience and evidence show that university research in formal methods can be deployed with great success in industrial projects. However, gaining broad industry acceptance for the use of formal methods is a much harder challenge given the complex social and economic incentives that pervade the industry.
As formal methods advocates, we need to:
— Stay logical: we need independent up-to-date papers comparing formal and non-formal approaches so that we can make logical decisions, based on data, not opinion.
— Fight the illogical: we need to bring formal methods to the attention of industry and the public in new ways.
And above all, we need to introduce formal methods into the software life cycle in more creative and inventive ways that continue to deliver the clear benefits that they provide, while being easier to adopt and reducing the burden for users.
Acknowledgements
The authors thank all members of Altran UK staff, past and present, who have contributed to our projects over the years, and thus created this important body of knowledge. Thanks also to our clients who have permitted publication of data relating to their projects. Finally, we would like to thank the journal's referees and editors for their valuable comments on the first draft of this paper.
Footnotes
1. The SPARK programming language is in no way connected with the Apache Spark cluster computing system.
2. Counting lines of code is notoriously inconsistent. Throughout this paper we use ksloc to mean ‘thousand lines of logical source code’, as reported by a single tool with consistent options. This means that code counts within this paper are directly comparable, but of course care must be taken in comparisons with any data (e.g. in other papers) that use a different counting method. The largest variation is between logical and physical lines of code.
Data accessibility
The complete Tokeneer project archive is publicly available to support teaching and research [30]. Data from the other projects mentioned in §3 are not available at this time, owing to our clients' policies regarding confidentiality and intellectual property.
Authors' contributions
All the authors have a long-term (multi-year) relationship with the case studies, methods and tools described. N.W. initially drafted the paper, which was reviewed and revised by the other authors.
Competing interests
We declare we have no competing interests.
Funding
All the authors were employed by Altran UK at the time of production.
References
1. McConnell S. 2004. Code complete, 2nd edn. Redmond, WA: Microsoft Press.
2. Hoare CAR. 1985. Communicating sequential processes. Englewood Cliffs, NJ: Prentice Hall.
3. FDR. 2016. FDR4—the CSP refinement checker. See http://www.cs.ox.ac.uk/projects/fdr/.
4. Croxford M, Chapman R. 2005. Correctness by construction: a manifesto for high-integrity software. Crosstalk J. Def. Softw. Eng. 18, 5–8.
5. Jackson M. 2001. Problem frames: analyzing and structuring software development problems. Reading, MA: Addison Wesley.
6. Woodcock J, Davies J. 1996. Using Z: specification, refinement and proof. Englewood Cliffs, NJ: Prentice Hall.
7. Ansys. 2017. SCADE Suite home page. See http://www.ansys.com/products/embedded-software/ansys-scade-suite.
8. MathWorks. 2017. Simulink home page. See https://uk.mathworks.com/products/simulink.html.
9. OMG. 2017. UML home page. See http://www.omg.org/.
10. Altran UK. 2011. The INFORMED design method for SPARK. See http://docs.adacore.com/sparkdocs-docs/Informed.htm.
11. McCormick JW, Chapin PC. 2015. Building high integrity applications with SPARK. Cambridge, UK: Cambridge University Press.
12. SPARK. 2014. Community site. See http://www.spark-2014.org/ (accessed 13 January 2017).
13. MISRA. 2013. Guidelines for the use of the C language in critical systems. See https://www.misra.org.uk/.
14. AdaCore Inc. 2017. QGen home page. See http://www.adacore.com/qgen.
15. AdaCore Inc. 2017. CodePeer home page. See http://www.adacore.com/codepeer.
16. Chapman R, White N. 2016. Industrial experience with agile in high-integrity software development. In Developing safe systems: Proc. 24th Safety-Critical Systems Symp., Brighton, UK, 2–4 February 2016 (eds Parsons M, Anderson T), pp. 143–154. Safety-Critical Systems Club.
17. Bergeretti J-F, Carré BA. 1985. Information-flow and data-flow analysis of while programs. ACM Trans. Program. Lang. Syst. 7, 37–61. (doi:10.1145/2363.2366)
18. Burns A, Dobbing B, Vardanega T. 2004. Guide for the use of the Ada Ravenscar profile in high integrity systems. ACM SIGAda Ada Lett. 24, 1–74. (doi:10.1145/997119.997120)
19. Schanda F, Brain M. 2012. Using answer set programming in the development of verified software. In LIPIcs–Leibniz Int. Proc. in Informatics, vol. 17. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
20. Jackson PB, Ellis BJ, Sharp K. 2007. Using SMT solvers to verify high-integrity programs. In Proc. 2nd Workshop on Automated Formal Methods, pp. 60–68. New York, NY: ACM Press. (doi:10.1145/1345169.1345177)
21. Filliâtre JC, Paskevich A. 2013. Why3—where programs meet provers. In Programming languages and systems (eds M Felleisen, P Gardner). Lecture Notes in Computer Science, vol. 7792, pp. 125–128. Berlin, Germany: Springer. (doi:10.1007/978-3-642-37036-6_8)
22. Deters M, Reynolds A, King T, Barrett C, Tinelli C. 2014. A tour of CVC4: how it works, and how to use it. In Formal methods in computer-aided design (FMCAD) (eds K Claessen, V Kuncak). Piscataway, NJ: IEEE. (doi:10.1109/FMCAD.2014.6987586)
23. Brain M, D'Silva V, Griggio A, Haller L, Kroening D. 2013. Deciding floating-point logic with abstract conflict driven clause learning. Form. Methods Syst. Des. 45, 213–245. (doi:10.1007/s10703-013-0203-7)
24. Ferrante J, Ottenstein KJ, Warren JD. 1987. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst. 9, 319–349. (doi:10.1145/24039.24041)
25. Brain M, Tinelli C, Rümmer P, Wahl T. 2015. An automatable formal semantics for IEEE-754 floating-point arithmetic. In IEEE 22nd Symp. on Computer Arithmetic, pp. 160–167. Piscataway, NJ: IEEE. (doi:10.1109/ARITH.2015.26)
26. King S, Hammond J, Chapman R, Pryor A. 2000. Is proof more cost-effective than testing? IEEE Trans. Softw. Eng. 26, 675–686. (doi:10.1109/32.879807)
27. Hall A, Chapman R. 2002. Correctness by construction: developing a commercial secure system. IEEE Softw. 19, 18–25. (doi:10.1109/52.976937)
28. RTCA-EUROCAE. 1992. Software considerations in airborne systems and equipment certification, DO-178B/ED-12B.
29. Barnes J, Chapman R, Johnson R, Widmaier J, Cooper D, Everett B. 2006. Engineering the Tokeneer enclave protection software. In Proc. 1st IEEE Int. Symp. on Secure Software Engineering. New York, NY: IEEE Press.
30. AdaCore. 2008. Tokeneer project public release archive. See http://www.adacore.com/tokeneer.
31. Woodcock J, Aydal E, Chapman R. 2010. The Tokeneer experiments. In Reflections on the work of C.A.R. Hoare (eds Jones C, Roscoe AW, Wood K), pp. 405–430. London, UK: Springer. (doi:10.1007/978-1-84882-912-1_17)
32. Chapman R, Schanda F. 2014. Are we there yet? Twenty years of industrial theorem proving with SPARK. In Interactive theorem proving (eds G Klein, R Gamboa). Lecture Notes in Computer Science, vol. 8558, pp. 17–26. Cham, Switzerland: Springer. (doi:10.1007/978-3-319-08970-6_2)