ABSTRACT
Each year significant tax dollars are spent on the development of new technologies to increase efficiency and/or reduce the costs of military training. However, there are currently no validated methods or measures to quantify the return on investment of adopting these new technologies for military training. Estimating the return on investment (ROI) for training technology adoption involves 1) developing a methodology or framework, 2) validating measures and methods, and 3) assessing predictive validity. The current paper describes a projective methodology using the Kirkpatrick framework to compare projected tangible and intangible benefits against tangible and intangible costs to estimate future ROI. The use-case involved an advanced technology demonstration in which sixty aircrew participated in a series of live, virtual, and constructive (LVC) exercises over a five-week period. Participants evaluated the technology’s potential costs and benefits according to the Kirkpatrick framework of training program evaluation, and analyses resulted in a nominal projection of $488 million saved, significant enhancements in large-force proficiency, and 1.4 lives saved over ten years at an implementation rate of 0.5% of budgeted flight hours. A discussion of theoretical implications, data-based limitations, and recommendations for future research is provided.
KEYWORDS: Military training, ROI, training technology, training evaluation
What is the public significance of this article?—The intangible and tangible return on investments in military training technology is difficult to quantify, making projective return on investment (ROI) evaluations difficult. The current study proposes a projective framework for ROI evaluations and applies it to an advanced technology demonstration, resulting in projections for cost savings, increased proficiencies, and improved safety.
The importance of maintaining a technological edge in matters of national security cannot be overstated. Over the past five years, the US has allocated between $52 billion (2020) and $72 billion (2017) toward research and development in the Department of Defense (Sargent, 2020; Sargent et al., 2017). Within the United States Air Force (USAF), the use of advanced technology to support the warfighter and enhance operational readiness is a focused effort of the USAF Science and Technology Strategy (United States Air Force, 2019). An integral part of this effort is an informed decision-making process regarding the allocation of resources and project continuation, dissolution, and integration. Ultimately, while return on investment (ROI) is the most important variable in this decision process, calculating the ROI of a technology can only be completed once that technology is in place and the associated increases in capabilities and savings in time, material, dollars, or lives can be tallied. Unfortunately, when the totaled costs far outweigh the immediate and projected benefits, the money has already been spent. What is needed is a projective technique that provides an estimation of ROI early in the development process to ensure that, in the final calculation, the benefits far outweigh the costs.
The purpose of the current paper is to propose a framework for making projective estimations of ROI before investing in new training technology, by (1) reviewing existing relevant models of ROI, (2) proposing a projective ROI model, and (3) applying this model toward the Secure Live, Virtual, and Constructive Advanced Training Environment (SLATE) using data from a recent advanced technology demonstration (ATD) as a test case to develop concepts and highlight issues.
Literature review
Measuring the ROI of military training
The prime difficulty in measuring the ROI of military training is that the most important benefits are difficult to estimate (Moss et al., 2016). While many costs associated with training are tangible and easily calculated, other costs are intangible and difficult to estimate, such as lives that are lost, wear and tear on equipment, and training opportunities and efficiencies that are missed. Similarly, the benefits of new training technology are easy to quantify when focusing on dollars saved, become more difficult to estimate when focusing on moderately tangible elements like skill proficiency, and become virtually impossible to quantify when focusing on the intangible elements that matter most, such as wars that do not occur, lives not lost, buildings and economies not destroyed, and the general psychological experience of safety and security that is maintained. While the quantification of intangible costs and benefits has become a more salient issue in some ROI business models (e.g., intellectual property, branding, copyrights, and company culture; Andriessen & Tissen, 2000; Barsky & Marchant, 2000; Ratnatunga, 2002), current metrics rely solely on other financial indicators and do not readily translate to the world of military training, which focuses on readiness rather than profits. Deitchman (1988) made an early attempt at modeling the ROI of military training using a multi-step approach of large-scale computer simulations that first estimated the impact of training on proficiency and subsequent combat outcomes, and then validated the outcomes for realism and likelihood. Once these outcomes were set, the final steps (which were never completed) were to identify required training and costs, validate the model, estimate the costs of required training changes, and model the impact on future costs and outcomes to determine ROI.
While Deitchman's work was limited in scope and incomplete, the method represents a starting point for considering changes in tangible and intangible benefits.
Projective ROI assessments of training technology adoption in the military should consider both tangible and intangible costs and benefits in the final equation (see Guyatt et al., 1986, for a robust review of such models in medicine), and should provide a framework for evaluation. To date, standardized models and frameworks for the ROI of military training have not been developed. Perhaps the closest work was completed by Ratnatunga et al. (2004), who employed the Capability Economic Value of Intangible and Tangible Assets method to estimate capability value for internal audits and expense justification. Capabilities were identified as consisting of preparedness (readiness and sustainability) and force structure (equipment, doctrine, facilities, etc.). This work provides an excellent starting point for identifying costs and benefits, including a variety of essential aspects of a strategic military unit (e.g., contractor personnel, maintenance, aircraft, facilities). While their model did address both tangible and intangible elements, the intangibles focused on “non-tradeable” assets such as external relationships and infrastructure, not on the more impactful value of lives, safety, security, and the economic impact of a global peacetime economy.
Other relevant ROI models
The well-developed models of ROI for software process improvement (SPI) and training program evaluation frameworks provide some insight into how ROI for training technology can be projected. Because software processes go through many iterations without a time-intensive change in infrastructure, they provide an excellent starting point for identifying costs and benefits. Numerous models for capturing the ROI of software improvement have been developed and reviewed over the past 30 years (Rico, 2002; Unterkalmsteiner et al., 2011; Van Solingen, 2004), and the primary takeaways include: 1) to the greatest extent possible, development and purchase costs should be standardized, and all relevant training, validation, and maintenance personnel and hours should be included; 2) the life cycle of the technology should be considered in terms of longevity, potential replacements, and discounts from an incremental approach to technology improvement; and 3) the benefits of adoption should be detailed and outlined to determine the break-even point, ensuring that the life cycle of the technology is longer than the timeline to break even.
General training program evaluation models, which focus on the benefits side of the ROI equation, have been widely used for over fifty years and provide a framework for assessing the impact of training at multiple levels. Perhaps the first widely used model was Kirkpatrick’s (1959, 1994), which framed training program effectiveness within four levels: (K-1) trainee reactions regarding usefulness, enjoyment, and endorsement of the training; (K-2) trainee learning of the targeted knowledge, skills, and dispositions within the program; (K-3) behavioral changes on the job that result from participating in the training program; and (K-4) the real-world results in terms of impact on the organization and its interests. The primary benefit of using this type of model is that it provides diagnostic information regarding inefficiencies within a training program, so program improvement efforts can be focused and lean.
Subsequent models reflect Kirkpatrick’s framework and either add details within existing levels, such as separating level 1 (trainee reactions) into ‘inputs’ and ‘processes’ (Kaufman & Keller, 1994), or add additional levels, such as Phillips’ ROI Model, which explicitly included a fifth level that evaluated the ROI of a training program after completion (Phillips & Phillips, 2006). Other models shift the emphasis from measuring the effectiveness and impact of existing training programs to an earlier stage, when training programs are being developed or designed to align with strategic priorities (Anderson, 2007; Warr et al., 1970). Waag and Bell (1997) proposed a five-stage model of measuring training benefits in a military context that is more nuanced in its treatment of behavior and results, addressing issues with programs directed toward seldom-occurring events (in peacetime especially, much military training is never actually used) and the fact that real-world behavior and results are seldom directly observable. Of these, the Kirkpatrick model remains the most applicable and flexible in the current context and will be used going forward.
Developing a projective evaluation framework
Among the ROI models considered, five elements are useful when projecting the ROI of training technology. (1) Both tangible and intangible costs and benefits should be considered in the final analysis, but calculated independently so contextual weightings can be applied, and adjusted for the probability of success. (2) Costs should consider purchase price, material, personnel, labor, maintenance, training, and licensing fees; and these should be standardized when possible. (3) The life cycle of the technology should be considered to determine whether the ‘break-even point’ will be achieved before the life cycle ends. (4) Strategic priorities and goals should factor heavily into the final weightings and analysis and must be done on a case-by-case basis, as they can vary dramatically in different contexts (e.g., wartime or peacetime). (5) Determinations of training technology value should be performed at multiple levels including trainee reactions, trainee learning, transfer to on-the-job behavior, and real-world outcomes (which in military contexts is a mix between training and deployment, both in peacetime and wartime).
The proposed projective ROI model defines tangible costs and benefits as those that are directly measurable (e.g., past and future dollars spent or saved) and those not directly measurable but typically quantifiable (e.g., training proficiency). Intangible costs and benefits include those elements of ROI that are not easily defined or measured (e.g., opportunity cost, threat deterrence, or lives lost or saved) yet are the true ‘bottom-line’ elements with which training is concerned. Equation 1 below is the simplest generalizable form of how ROI could be calculated under the current considerations. Note that all benefits are adjusted for the probability that the new technology will be successfully developed, integrated, accepted, and endorsed in its immediate and distal training environments (as opposed to a new technology that is successfully implemented but not accepted and consequently never used). The value of Psuccess is an estimate based on the maturity of the technology and the reactions of relevant parties regarding the potential use of the technology, and can be applied holistically or selectively. Psuccess may also be applied to different costs that might arise if the technology fails, in which case it would be applied as 1 − Psuccess. Note that equation 1 is more accurately thought of as a series of equations of dissimilar units (equation 2), where TB represents tangible benefits, IB intangible benefits, TC tangible costs, IC intangible costs, d dollars, s safety, and pr proficiency.
(1) ROI = Psuccess × (TB + IB) − (TC + IC)
(2) ROId = Psuccess × (TBd + IBd) − (TCd + ICd); ROIs = Psuccess × (TBs + IBs) − (TCs + ICs); ROIpr = Psuccess × (TBpr + IBpr) − (TCpr + ICpr)
While it might seem desirable to reduce all benefits to dollars, doing so ignores different dimensions of both costs and benefits. For example, a new technology may save two lives annually, and a monetary reduction of the costs and benefits could compare the monetary costs to the savings resulting from not needing to train and support two new employees, plus the additional benefits and settlements paid to survivors of the deceased, minus the savings involved in promoting a lower-paid individual to take over the duties of the deceased, and so on. Although a common practice, this approach fails to capture the full ROI of saving two lives each year, which has great value even if additional costs are required. Ultimately, decision-makers have to determine the resources that can be justified to ensure greater safety, which requires weighting the relative values of different metrics.
Weighting each element within a single ROI equation provides an adjustment for each element’s importance or value within the context under consideration. For example, if we are considering costs in units of time, there are obvious tangible costs for the labor hours included in total development time, but each of these might carry a different weight in the context in which it is applied. Suppose a technology requires 2,000 labor hours but can only be tested within a five-hour window each week due to the availability of resources or required personnel. In this case, the resulting ‘time to completion’ is not simply a function of labor hours but of other constraints. Such constraints may result in too large a cost, due to faster competitors, current needs, or diminishing future relevance. Adding weights is even more important when adjusting across different metrics, as dollars saved through fewer material resources or labor hours are difficult to put on a universal scale with serious accidents or deaths that are prevented. It should be noted that the value of each weight is specific to the context in which the model is being applied, as represented in the general form of equation 3.
(3) ROI = Σu wu × [Psuccess × (TBu + IBu) − (TCu + ICu)], for u ∈ {d, s, pr}, where wu is the context-specific weight for unit u
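The per-unit, success-adjusted, weighted structure described above can be sketched in code. The following Python snippet is a minimal illustration only; the unit labels follow the text (d, s, pr), but every numeric value is hypothetical and chosen solely to show the mechanics.

```python
# Illustrative sketch of the projective ROI equations: per-unit ROI with
# success-adjusted benefits, then a context-weighted roll-up.
# All numeric values below are hypothetical, not SLATE evaluation data.

def roi_by_unit(tb, ib, tc, ic, p_success):
    """Per-unit ROI: ROI_u = P_success * (TB_u + IB_u) - (TC_u + IC_u)."""
    return {u: p_success * (tb[u] + ib[u]) - (tc[u] + ic[u]) for u in tb}

def weighted_roi(roi_per_unit, weights):
    """Collapse dissimilar units into one figure with context-specific weights."""
    return sum(weights[u] * r for u, r in roi_per_unit.items())

# Hypothetical values in three dissimilar units: dollars (d), safety (s),
# and proficiency (pr).
tb = {"d": 1_000_000, "s": 2.0, "pr": 0.3}  # tangible benefits
ib = {"d": 0,         "s": 0.5, "pr": 0.1}  # intangible benefits
tc = {"d": 400_000,   "s": 0.0, "pr": 0.0}  # tangible costs
ic = {"d": 50_000,    "s": 0.1, "pr": 0.0}  # intangible costs

per_unit = roi_by_unit(tb, ib, tc, ic, p_success=0.8)
print(per_unit["d"])  # 350000.0
```

Keeping the per-unit results separate until the final weighting step preserves the dissimilar-units structure of equation 2, so a decision-maker can inspect dollars, safety, and proficiency independently before applying context weights.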
Applying the Kirkpatrick framework to tangible and intangible costs and benefits
The following section describes how the Kirkpatrick framework can be integrated into estimations of tangible and intangible costs and benefits. Each component of ROI has different applicability to Kirkpatrick’s levels of evaluation and will be described in a moderate amount of detail in the following sections. In each case the examples provided are essential but not exhaustive, and serve as a primer for the use-case to which the model will be applied.
Tangible costs across Kirkpatrick levels 1-4 (K1-K4)
Tangible costs cover all materials, research, development, training, acquisition, licensing, implementation, and maintenance activities, as well as the amount of time that development and implementation will require. Tangible K-1 costs involve negative trainee reactions, which can strongly influence the probability of success; new systems, even effective ones, often fail because of user opposition, a risk that must be factored into ROI projections. Tangible K-2 costs involve negative learning, fewer training opportunities, lower standards, and decreased proficiency, and are especially salient in technologies designed to increase efficiency while reducing costs. Tangible K-3 costs are typically restricted to money, time, and productivity, and can include additional hours worked, added required travel and communication, and all other modifications that result in increased dollars, task inefficiencies, internal conflicts, decreases in safety, and increases in accidents. Finally, tangible K-4 costs can include total dollars spent, increases in required force size, debts incurred, and declines in success rates.
Intangible costs across Kirkpatrick levels 1-4
Intangible costs are more difficult to conceptualize and quantify. Intangible K-1 costs include implicit biases and unwanted associations that are not directly observable or even self-reportable. Intangible K-2 costs include misconceptions or hidden inefficiencies that a new training technology promotes. Intangible K-3 costs include increases in risky behavior regarding safety, security, and integrity. Intangible K-4 costs are of the greatest interest and involve the non-monetized negative impact of lives lost, injuries incurred, damage to public trust, and degraded relationships associated with choosing the technology under consideration.
Tangible benefits across Kirkpatrick levels 1-4
Tangible K-1 benefits include trainee buy-in, which is based on positive expectations and experiences and is easily measured through observation and self-report. Tangible K-2 benefits include dollars saved, increases in proficiency ceilings, faster skill acquisition, improved training capabilities, increased training authenticity, and longer skill retention. Tangible K-3 benefits include improvements in day-to-day protocols, behavioral efficiencies, removal of unneeded requirements or internal sources of conflict, and more efficient and productive practices in general. Tangible K-4 benefits include the bottom-line impact of the technology on personnel and material costs, force size, safety, time requirements, and operational effectiveness.
Intangible benefits across Kirkpatrick levels 1-4
Like intangible costs, intangible benefits are much harder to describe and quantify. Intangible K-1 benefits are the cultural changes that have a positive impact on the functioning of the organization and the daily lives of personnel. Intangible K-2 benefits include increases in the transferability of knowledge, incidental learning, and deeper levels of understanding and implementation that are not directly measured or measurable by existing methods. Intangible K-3 and K-4 benefits include fewer deaths resulting from training and deployment; enhanced threat deterrence that prevents conflict, with the resulting economic development and growth in primary, secondary, and tertiary arenas; and reductions in the dollars and materials spent on collateral damage to relations, property, and the environment.
Steps in applying the projective ROI framework
The application of these elements to any specific technology involves five steps: (1) articulate whether the technology is intended to decrease, maintain, or improve existing capabilities; (2) identify which tangible and intangible costs are relevant, and either document or estimate the total cost in each category, discounted for relevant existing costs that would be removed; (3) identify which tangible and intangible benefits are relevant, and estimate the ultimate realized benefits within a meaningful timeframe; (4) estimate the probability of success to adjust benefits (a mature technology with high user acceptance sets Psuccess = 1.0, adjusted downward for any identified issues); and (5) sort benefits and costs by units to make direct comparisons, applying weights accordingly to make a final determination.
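The five steps can be collected into a simple data structure. The sketch below is our own illustration of how such a case file might be organized; the class and field names are assumptions, and the example figures are hypothetical rather than SLATE program data.

```python
# Sketch of the five application steps as a data structure. Class name,
# field names, and the example figures are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ProjectiveROICase:
    intent: str                                   # step 1: decrease/maintain/improve
    costs: dict = field(default_factory=dict)     # step 2: {unit: total cost}
    benefits: dict = field(default_factory=dict)  # step 3: {unit: projected benefit}
    p_success: float = 1.0                        # step 4: 1.0 = mature and accepted
    weights: dict = field(default_factory=dict)   # step 5: context-specific weights

    def projected_roi(self):
        """Step 5: success-adjusted benefits minus costs, weighted per unit."""
        units = set(self.benefits) | set(self.costs)
        return {u: self.weights.get(u, 1.0)
                   * (self.p_success * self.benefits.get(u, 0.0)
                      - self.costs.get(u, 0.0))
                for u in units}

case = ProjectiveROICase(
    intent="improve capability at reduced cost",
    costs={"dollars": 110_000_000},
    benefits={"dollars": 265_000_000},
    p_success=0.9,
)
print(case.projected_roi()["dollars"])  # 128500000.0
```

Returning a dictionary keyed by unit, rather than a single number, keeps dissimilar units separate until a decision-maker supplies context weights, which mirrors step 5.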
Application of the projective ROI framework to a use-case
To illustrate how real-world data populate the model, the current work applies this projective ROI framework to the SLATE architecture to estimate its future ROI. SLATE is an architecture that allows live aircraft to interact with virtual (manned) and constructive (unmanned) entities to enhance training in a complex LVC environment, with the potential to augment training and increase readiness at a greatly reduced cost. The methods section that follows describes the ATD in general detail, outlining the types of data that are typically collected and the scope of the ATD. Because the main purpose of this section is to demonstrate how data from an evaluation populate the model, much of the data has been compressed and described broadly, but can be made available upon request and approval. The results section applies the different levels of the model to a real-world context in order to highlight various model elements, strengths, weaknesses, and recommendations.
Methods
Participants
A total of sixty-six operational F-15E and F-18E/F aircrew from the 422nd Test and Evaluation Squadron, 389th Fighter Squadron, 64th Aggressor Squadron, and 59th Test and Evaluation Squadron participated. Of these participants, sixty responded to survey questions across the three phases (phase 1, n = 9; phase 2, n = 12; and phase 3, n = 39), although missing data resulted in unequal sample sizes across different measures. In total, fifty-three participants completed the demographics survey.
Data collection instruments
Data collection involved the use of four different surveys that contained a mix of rating-scale items and open-ended responses. A 27-item demographics survey was administered during each phase, along with an open-ended ‘Top 3 and Bottom 3’ survey, which was coded for affective and utility reactions, learning, future behavioral changes, and real-world impacts. A reaction survey was also administered to assess reactions across levels 1–3 of Kirkpatrick’s framework, employing Likert-type items on a four-point scale from strongly disagree (1) to strongly agree (4). The reaction survey was modified and streamlined between the phase 2 and phase 3 events, but still covered the same areas. Full surveys and/or more detailed descriptions are available upon request and approval.
SLATE evaluation demonstration procedure
The SLATE ATD took place at Nellis AFB, NV, over five weeks in three phases between June and September of 2018. Phases one and three each lasted two weeks, and phase two lasted one week. Operational aircrew participated in 97 exercises consisting of 642 LVC weapons events to verify the overall architecture, test new software, and test the limits of the SLATE architecture. Phase one involved F-15Es performing simple test and intercept sorties versus live and constructive targets to validate the technical performance of the SLATE pod, ground architecture, and aircraft avionics functionality. Phase two missions increased in tactical complexity and in the number of aircraft involved in each exercise. Phase three culminated in a capstone event with 16 live aircraft, four virtual cockpits, and varying numbers of live and virtual air-to-air and surface-to-air threats in complex tactical missions or intercept sorties in which live aircraft detected, targeted, and engaged live, virtual, and/or constructive aircraft. Participants completed the aforementioned written surveys before and after each SLATE event: general demographics were collected pre-event, and training reaction and open-ended questionnaires were given after each ATD phase event. Additional information regarding costs and benefits was obtained post hoc from program managers and research personnel.
Results
This section populates the projective model with collected data to estimate ROI, and is organized around the five steps outlined in the introduction of this paper. It should be noted that the purpose of this application is to provide a robust representation of the ROI of the technology under consideration (SLATE) and not to test any specific hypotheses regarding the changes in capability or proficiency. Due to space restrictions, only data summaries are provided.
Step 1: Identification of the intent of technology
The technology was designed to increase training capabilities while reducing costs, making trade-offs between capabilities and costs a moot point. Ultimately, the training technology would provide greater training opportunities for large-force missions, decrease damage to and maintenance of actual aircraft, and increase overall proficiency and readiness across multiple airframes.
Steps 2-3: Identification and calculation of tangible and intangible costs and benefits
Kirkpatrick level 1: Trainee reactions
The primary purpose of measuring trainee reactions is to adjust the probability of the technology’s projected acceptance and to identify additional costs and/or benefits. Trainee reactions were examined through targeted survey items and open-ended response questions. Survey and commentary data consistently suggested the tangible K-1 benefits of positive affect (acceptance) and high levels of utility, but also represented potential tangible K-1 costs (rejection) if certain technical issues were not addressed, indicating that the technology had not completely matured. These reactions are foundational to setting the probability of success in the final analysis, based on how they converge with other data. Intangible K-1 costs and benefits were not evident in the analysis, as there was no evidence of implicit biases against the product, nor was there any evidence to suggest that the technology would result in a positive cultural shift.
Kirkpatrick level 2: Learning
Projective estimates of trainee learning were gathered from quantitative analysis of survey items, and qualitative data analysis was conducted for open-ended responses. Tangible K-2 costs were identified as potential negative training if technological issues were not resolved. If the issues are resolved, the data suggest a tangible K-2 benefit that learning will be significantly enhanced, especially for large-force training (e.g., “The ability to have 3 types of entities on the network gives multiple training opportunities and fighting against ‘realistic’ threats”). The ability to provide trainees with more repetitions of the skills and tactics they will need in the field, especially when it comes to large-scale force employment, is essential for the development of proficiency (Chi et al., 2014; Ericsson et al., 1993). The greater flexibility to deliver training that can modify force sizes, weapons configurations, capabilities, terrain, and other mission-essential variables gives warfighters valuable experiences that could not otherwise be obtained, in greater number and variety, at a lower cost. In short, according to learning theory, the SLATE technology would have a profound impact on warfighter skill and readiness. Intangible costs were not evident in the data or in relevant theory, as no misconceptions or hidden inefficiencies were evident. A possible intangible benefit is additional training opportunities at the individual or force-wide level resulting from the reallocation of resources, shorter feedback loops, and higher repetition counts.
Kirkpatrick level 3: Behavioral changes
Projective estimates of SLATE’s potential impact on day-to-day behavior were gathered from targeted survey items and qualitative analysis of open-ended responses. Tangible and intangible K-3 costs and benefits were not clearly evident from the data. Inconsistencies existed between the survey data and comments, with participants on average ambivalent about whether they could get the same experience SLATE provided at their home base, while simultaneously commenting on the enhanced training capabilities that SLATE provided. Participants’ training roles, technical difficulties, home-base capabilities, or the survey items themselves may have led to the conflicting survey responses, but this inconsistency cannot be resolved with the existing data. Responses do indicate that participants would train differently if the technology were realized, but the extent to which that training would differ from current training is not easily determined.
Kirkpatrick level 4: Real-world results
Projections of K-4 tangible costs were obtained through consultation with SMEs and project managers, and from public documents. Tangible K-4 costs were identified as $52,000,000 in testing and development of the SLATE technology up to the point of the demonstration. Extending SLATE through the next phase of development was estimated at $110,000,000, with unit costs declining as more units are produced; this estimate covers implementation and infrastructure development at two airbases, although some infrastructure costs could potentially be spread among other training programs and requirements to reduce the costs unique to SLATE.
Projections of tangible K-4 benefits involve several different calculations, the first of which is the hourly cost of live aircraft that are replaced by virtual and constructive entities in a given training configuration. The participants in the phase three capstone training event and the associated costs are listed in Table 1, which indicates that one hour of the large-scale SLATE phase three capstone results in savings of $175,000 in operational costs, based on the capstone configuration and 2021 Department of Defense (DOD) reimbursable rates rounded to the nearest thousand. Savings would vary according to the aircraft represented: the relatively modest savings of replacing a live F-16 ($8,000 per hour) with a virtual presence are nearly sextupled if the replaced aircraft is an F-22 ($47,000 per hour).
Table 1.
Cost savings per hour using the SLATE technology (2021 DOD reimbursable rates).
| Source | Count | Type | Hourly Rate (per aircraft) | Cost per Hour |
|---|---|---|---|---|
| Blue (friendly) | 6 | F-15C | $24,000 | $144,000 |
|  | 2 | F-18 | $11,000 | $22,000 |
|  | 2 | EA-18G | $11,000 | $22,000 |
| Red (enemy) | 4 | F-16 | $8,000 | $32,000 |
|  | 2 | F-18 | $11,000 | $22,000 |
| Total cost, live aircraft |  |  |  | $242,000 |
| Virtual | 2 | F-16 | $8,000 | $16,000 |
|  | 2 | F-18 | $11,000 | $22,000 |
| Constructive | 2 | F-35 | $18,000 | $36,000 |
|  | 12 | Su-27¹ | $8,000 | $96,000 |
|  | 5 | SAMs | $1,000 | $5,000 |
| Total savings per hour of virtual/constructive craft |  |  |  | $175,000 |

Note: ¹The hourly cost of the Su-27 was not available; the F-16 rate, which is assumed to be similar, was substituted.
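The $175,000-per-hour figure follows directly from the counts and hourly rates of the replaced entities in the table, and can be checked with a few lines of arithmetic:

```python
# Recomputing the Table 1 per-hour savings from the listed counts and
# 2021 DOD reimbursable rates (dollars per flight hour).
replaced = [
    # (entity, count, hourly rate of the live aircraft it replaces)
    ("F-16 (virtual)",        2,  8_000),
    ("F-18 (virtual)",        2, 11_000),
    ("F-35 (constructive)",   2, 18_000),
    ("Su-27 (constructive)", 12,  8_000),  # F-16 rate used as a stand-in
    ("SAM (constructive)",    5,  1_000),
]
savings_per_hour = sum(count * rate for _, count, rate in replaced)
print(savings_per_hour)  # 175000
```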
Secondary considerations of K-4 tangible benefits can be attributed to the extension of the lifespan of various aircraft in terms of replacement and maintenance. Aircraft are typically commissioned with an 8,000 flight-hour life cycle, although these have been extended for the A-10, F-15, and F-16 (Cancian, 2019; Insinna, 2017). Given the unit cost of each aircraft type, per-hour savings of roughly $9,000 per aircraft can be realized (see Table 2). While unit costs, service hours, and operational costs vary, the additional savings for the 18 virtual and constructive aircraft involved in the SLATE capstone total roughly $108,000 per hour, making the total savings $283,000 per hour. How, when, and where SLATE could be used is beyond the scope of this paper, but rough numbers can be used to project overall cost savings, either by replacing selected existing training hours or by increasing training capabilities without additional costs. For example, assuming that SLATE could reduce operational flights and associated maintenance by 0.5%, based on the $53.133 billion allocated to the USAF for operation and maintenance in 2021, the annual savings would be $265 million. Examining flight hours alone, which were set at roughly 800,000 for the USAF in 2021 (Department of the Air Force, 2020), a 0.5% reduction in flight hours would result in annual savings of $32 million. Note that these numbers are very rough estimates, as figures and values change with markets, usage, incidents, and cost formulations, and do not reflect a precise dollar amount.
Table 2.
Lifecycle savings for different aircraft per hour of substitution.
Aircraft Type | Unit Cost (in 2021 dollars)¹ | Life Cycle Hours | Savings per Hour
---|---|---|---
A-10 | $16,000,000 | 12,000 | $1,333.33
F-16 | $28,000,000 | 12,000 | $2,333.33
F-15 | $51,000,000 | 12,000 | $4,250.00
F-18 | $67,000,000² | 7,500 | $8,933.33
F-22 | $193,000,000 | 8,000 | $24,125.00
F-35 | $110,000,000³ | 8,000 | $13,750.00
Average Savings per Hour of Virtual/Constructive across Aircraft Type (unweighted) | | | $9,120.83
Note: ¹Unit costs were obtained from the United States Air Force Aircraft Fact Sheets and converted to 2021 dollars as needed, except for the F-18 and the F-35; ²America’s Navy, 2021; ³Grazier, 2020; Office of the Under Secretary of Defense, 2021; United States Air Force, 2005; United States Air Force, 2015a; United States Air Force, 2015b; United States Air Force, 2015c.
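The Table 2 figures follow from dividing each airframe's unit cost by its rated life-cycle hours; a minimal sketch reproducing the per-hour savings and the unweighted average:

```python
# Per-hour lifecycle (replacement) savings from Table 2: unit cost spread
# over the airframe's rated life-cycle hours, then an unweighted average.

aircraft = {
    # type: (unit cost in 2021 dollars, life-cycle hours)
    "A-10": (16_000_000, 12_000),
    "F-16": (28_000_000, 12_000),
    "F-15": (51_000_000, 12_000),
    "F-18": (67_000_000, 7_500),
    "F-22": (193_000_000, 8_000),
    "F-35": (110_000_000, 8_000),
}

per_hour = {name: cost / hours for name, (cost, hours) in aircraft.items()}
average = sum(per_hour.values()) / len(per_hour)

print(f"F-22: ${per_hour['F-22']:,.2f} per hour")   # F-22: $24,125.00 per hour
print(f"Unweighted average: ${average:,.2f}")       # Unweighted average: $9,120.83
```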
Another potential K-4 tangible benefit is the cost savings associated with serious accidents. United States military training accidents resulted in 5,120 deaths between 2006 and 2020, and the USAF reported 72 aviation-related accidents involving injury or death in the 2020 fiscal year alone, including seven deaths and six serious injuries (Mann & Fischer, 2020). Provided flight hours are maintained at levels that sustain pilot readiness, replacing extraneous ‘training-aid’ aircraft with virtual or constructive entities would likely reduce the number of accidents each year. Additionally, fourteen aircraft were destroyed during 2020, totaling well over $700 million in replacement costs (Mann & Fischer, 2020). While flight hours are necessary, one could expect a proportional reduction in material losses due to accidents if some training, especially for ‘supportive aircraft,’ were switched from live to virtual: a .5% reduction in live flight would equate to an additional $3.5 million saved annually in aircraft costs when considering only class A accidents (where damages exceed $2 million or death/permanent injury results), not counting pension, legal, settlement, and personnel replacement costs.
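The accident-cost figure is a simple proportional projection; a minimal sketch under the stated assumption that class A mishap losses scale with live flight hours:

```python
# Proportional accident-cost projection: a 0.5% reduction in live flight,
# applied to FY2020's ~$700M in destroyed-aircraft replacement costs.

class_a_losses_2020 = 700e6   # replacement cost of aircraft destroyed in FY2020
reduction = 0.005             # 0.5% of live flight moved to virtual/constructive

annual_savings = reduction * class_a_losses_2020
print(f"${annual_savings / 1e6:.1f}M saved annually")  # $3.5M saved annually
```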
There are two important K-4 intangible benefits to consider: 1) fewer lives lost through training and operational flight and 2) the impact of increased proficiency on capability, threat deterrence, and overall combat success. Aviation is a relatively dangerous occupational field, and accidents occur at a significant rate. Between 2013 and 2018, 168 crew were killed in aviation accidents, for an average of 28 deaths per year (Copp, 2018). Additional costs include permanent disability, time off due to injury, and long-term health impacts not classified as disabilities that significantly impact quality of life. Given that SLATE would result in fewer craft in the air, one can project fewer deaths proportional to the reduction in flight hours. This, however, is an oversimplification of what causes accidents, as fewer live-training hours can lower readiness to a dangerous degree, resulting in tangible and intangible costs that run parallel to the potential tangible and intangible benefits. Safe numbers of live-flight hours would need to be maintained, and SLATE would be used to supplement training for aircrew that had already met safety minimums, with any reduction in accidents likely being modest given live-flight requirements.
The intangible K-4 benefits of increased proficiency from SLATE training specifically involve large-force exercises and have a direct impact on force capabilities, threat deterrence, and operational success within this context. While the increase in proficiency is difficult to quantify, Kirkpatrick reaction, learning, and behavioral-change data indicate that improvements would be significant and would serve to deter conflicts and increase the chances of success when they do occur. The precise estimation of proficiency gains and their resulting impact on real-world forces is beyond the scope of this paper, which simply seeks to include this projection in the final analysis.
Step 4: Adjusting for the probability of success
A review of capabilities indicates that the training technology is close to maturation. Participant responses identify several important issues regarding spatial positioning and communication lags, with some potential for negative training if these elements are not corrected. Current progress on SLATE has been paused in the USAF but has continued on the US Navy F-18E, F-18F, and F-18G platforms, which will engage in another demonstration in the fall of 2021. While engineering success is not guaranteed, it is highly likely. Precisely how likely is a quantification that would need to be estimated by engineers and is beyond the scope of this paper, which will use a probability of .95 as a nominal value. Projections from Kirkpatrick level-1 data suggest trainee reactions would be completely positive if the technical issues were addressed, and the value of P_success need not be adjusted to account for user acceptance.
Step 5: ROI component summaries by unit of measurement
The final analysis involves three distinct units of measurement that must be considered and weighted for relative importance: dollars, proficiency, and safety. It is essential to note that an accurate and full estimate of the overall impact on strategic priorities, vulnerabilities, and operations in general would involve classified information, and is well beyond the scope of this paper. Instead, the framework is meant to be applied broadly to call attention to each element within the overall analysis as a means to determine how much each matters, and whether it is justified by the associated costs.
Regarding dollars, the Kirkpatrick level-4 analysis suggests that the dollar-ROI summary should compare the cost of $192 million against the potential savings in three areas: hourly operational savings (estimated at between $10,000 (A-10) and $47,000 (F-22) per hour, per aircraft), prevention of damage (estimated at $3.5 million for a .5% reduction in accidental class-A damages), and personnel costs comprising continued pension payments, litigation, and settlement fees, which can range widely depending on the circumstances. Ultimately, decision-makers would draw on current trends and future predictions to estimate likely values for the top of the equation. For example, a target allocation of .5% of the 800,000 live flight hours to virtual and constructive formats would save between $44 million (all A-10/F-16) and $176 million (all F-22) annually, depending on which aircraft were involved. Likewise, the same .5% reduction would be expected to have a proportional impact on accidents ($3.5 million annual estimate, as stated above), as well as on additional personnel training, benefits, legal fees, and settlements; all reduced by 5% to account for the probability of success (see Equation 4). Filling in these numbers with a nominal .5% allocation of 4,000 annual flight hours (whether they are reallocated or represent the cost savings of providing enhanced capability) would result in an annual savings of roughly $63 million based on the SLATE capstone configuration alone, with an additional $3.5 million in savings regarding damages, plus additional personnel cost savings. These figures suggest that the SLATE investment would break even in roughly three and a half years.
$ROI_{dollars} = [P_{success} \times (B_{operational} + B_{damage} + B_{personnel})] - C_{tangible}$ (4)
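The dollar-side arithmetic above can be sketched as probability-weighted annual benefits measured against the up-front investment. This is an illustrative structure, not the paper's exact formula; the nominal streams below omit the unitemized personnel savings, so the result is a conservative lower bound relative to the figures quoted above.

```python
def projected_roi(p_success, annual_benefits, upfront_cost, years):
    """Probability-adjusted dollar ROI over a planning horizon.

    annual_benefits: list of projected yearly savings streams (dollars).
    """
    adjusted_annual = p_success * sum(annual_benefits)
    return adjusted_annual * years - upfront_cost

# Nominal use-case values: $63M operational + $3.5M damage savings per year,
# $192M invested, p = .95, ten-year horizon; personnel savings omitted.
roi_10yr = projected_roi(0.95, [63e6, 3.5e6], 192e6, years=10)

# Years needed for adjusted annual savings to recoup the investment.
break_even_years = 192e6 / (0.95 * (63e6 + 3.5e6))
```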
Changes in proficiency cannot be easily equated to dollars but must be considered as well. As stated, the primary impact that SLATE would have on proficiency concerns large-scale force exercises, and data collected at Kirkpatrick levels 1–3 indicate that training and proficiency will both improve, as crews will have vastly more experience with large-scale exercises, so long as the technological issues are addressed. Because large-scale exercises are dangerous and very costly, they are infrequent, with pilots typically engaging in a large-scale exercise once or twice a year. Making this training a routine part of monthly exercises would have a profound impact on proficiency, as most pilots are only approaching the steep portion of the learning curve under current practice schedules. Because any true wartime deployment would involve significant large-scale operations, the importance of providing this type of training is even higher, as it will greatly enhance mission success and overall safety. If the technological issues are not fully addressed and the technology is unsuccessful, there is a chance that proficiency could be lessened due to negative training (see Equation 5). Even so, it is clear that the potential proficiency gains far outweigh the proficiency costs.
$ROI_{proficiency} = (P_{success} \times B_{proficiency}) - [(1 - P_{success}) \times C_{negative\ training}]$ (5)
Finally, changes in safety must be included as a non-monetized element when calculating the final ROI. The dangerous nature of military aviation and associated operations is a deep part of the organization, its personnel, and the families they go home to. Making the military as safe as possible is essential for public trust, attracting high-quality workers, and global perceptions of capability. The SLATE technology’s potential impact is magnified because it affords very complex rehearsal with multiple fighters in a 3-dimensional space at a greatly reduced risk of accidents in training, skills that then scale to deployment during wartime. The potential reduction in fatalities, physical injury, psychological injury, and instability in the workforce must be considered as an essential part of the final ROI analysis (see Equation 6). Note that the tangible and intangible costs for safety are weighted for the potential failure of the technology, which could result in negative training that leads to greater fatalities, physical injuries, and psychological injuries. Using the .95 probability of success with a .5% change in flight hours would result in one death being prevented every seven years. Additional safety savings regarding major and minor injuries would likely be realized at a greater rate given their higher frequency, although injury data disaggregated by position could not be obtained.
$ROI_{safety} = (P_{success} \times B_{safety}) - [(1 - P_{success}) \times C_{safety}]$ (6)
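The fatality projection follows the same proportional logic; a minimal sketch using the rates cited above (the .95 success probability would shave these figures only slightly, so it is left out of this simplified version):

```python
# Lives-saved projection: ~28 aviation deaths per year (2013-2018 average),
# scaled by the 0.5% substitution of live flight hours.

deaths_per_year = 28
substitution_rate = 0.005

prevented_per_year = deaths_per_year * substitution_rate  # 0.14 per year
years_per_life = 1 / prevented_per_year                   # ~7.1 years per life
lives_per_decade = prevented_per_year * 10                # 1.4 lives over 10 years
```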
A true integration across different metrics would require assigning concrete values where possible and extrapolating to other metrics as appropriate, ultimately assigning weights to each area based on the context in which the technology would be applied. From the nominal values used, the ten-year ROI of implementing SLATE would be roughly $488 million in savings, greatly increased wartime proficiency and readiness for large-force exercises, 1.4 lives saved, and a meaningful reduction in major and minor injuries.
Discussion
The difficulty of measuring the value of training and training technology is not a new problem, and contextual questions of precisely how much an incremental increase in proficiency matters keep this issue from being fully resolved. Estimating whether a new training technology will generate a positive ROI is even more challenging in a military context, where decision-makers must set the cut-points for the allowable costs of safeguards, procedures, and resources directed toward potential attacks, emergencies, accidents, and wars, making the final tabulation of benefits difficult to estimate. To meet these challenges, the current paper provides a framework for projective assessments of ROI and applies that framework to a use-case involving the SLATE technology during a recent ATD. The resulting estimation of future ROI identified significant financial, proficiency, and safety gains that would be realized with a modest technology adoption of .5% over ten years.
More important than the specific findings from the use-case data regarding SLATE are the intricacies and contextual nature of populating the projective framework with actual data. Three major implications for future work arise: (1) the importance of between and within-metric weightings, (2) metric and weight standardization, and (3) data collection tools and data quality. Regarding weights, Deitchman (1988) recognized that proficiency valuations are highly contextual and changes in proficiency need to be tied to the resulting outcomes as an anchor. We emphasize that the weighting of various costs and benefits is important not only across metrics (e.g., lives and dollars), but is essential within metrics to identify meaningful thresholds where small incremental changes have disproportionate impact on outcomes – something not explored in our use-case but essential for accurate ROI projections. Regarding standardization and the recommendations from more mature models, metrics (types and categories of tangible and intangible costs and benefits) and weights should be standardized as much as possible, although the highly contextual nature of thresholds precludes inter-domain standardization to a large degree. However, within-domain or within-program standardization is an essential element of accurate ROI projections, and should be undertaken as a matter of course, guided by the strategic priorities that were emphasized by Anderson (2007) and Warr et al. (1970); as well as adjustments for frequency or likelihood of occurrence in line with the considerations mentioned above and discussed by Bell and Waag (1998). Finally, as projective ROI probability calculus becomes more complex, it will be essential to build in high-quality data collection streams to meet computational and modeling needs. Decisions are only as good as the data upon which they are based, and the development of high-quality measures, seamless data collection systems, and modeling engines will be required. 
Accurate model specification and metric validation, along with estimates of predictive validity, headline the list of future needs.
Several additional limitations became evident in applying the framework to the SLATE technology and can serve as lessons learned for future applications. First, data collected at levels K-1 to K-3 were based on a small sample of individuals with different roles and different experiences during training, and missing data were frequent. Disaggregating the data according to specific experiences would be much more informative across all levels of the Kirkpatrick framework and serves as a reminder to emphasize appropriate data collection procedures, even though this can be challenging in some operational settings. Second, the limited data provided cost and benefit information from a relatively small number of perspectives. Additional information could have been obtained from other members along the chain of command, such as engineers and theorists. Third, data irregularities, changing survey instruments, and different phase-specific goals made aggregation across phases more difficult, and sample sizes were insufficient to complete the statistical survey validation. Fourth, some tangible costs were difficult to quantify due to joint funding and application. Similarly, costing information changes from year to year and according to the calculations that are used or approved. For example, some hourly costing estimates for the F-22 consider simple fuel costs, while others also consider maintenance personnel, facilities, time, and development, along with vehicle wear and tear across the service life of the aircraft. Because service life is subject to change (i.e., from 8,000 to 12,000 hours in the case of the F-16; Vats, 2017), so are cost estimates. A standardized approach to costing should be developed, similar to the SPI models in which common costs and rates become uniform across applications for common tangible results.
It is important to note that the framework was applied to an emerging technology in the middle of its development, after significant investments had been made but before it was fully developed. It is recommended that this framework be applied continuously throughout the development process, from initial proposal to implementation. Projective assessments of ROI can create greater efficiencies in how training technology dollars are spent. As the framework is applied in different contexts, it is likely that common intangible categories of costs and benefits will emerge, facilitating application to additional domains. In a broader context, the framework affords a better estimation of the value of the technology woven into our public and private lives so that we can make better decisions about what to invest in, actively use, and retire. In the military context, it not only allows limited funds to be used more effectively, but will also help us better understand the true value of training technology, and of training in general.
Acknowledgments
The content and perspectives expressed in this paper are the authors’ and do not necessarily represent the views of the sponsoring organization. The work was funded by the U.S. Air Force Research Laboratory under contract #47QFLA-19-0012. The authors wish to thank Alexxa Bessey and Jessica Cortez for their significant contributions.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data Availability Statement
The data on which this work is based belongs to the United States Air Force, and may be made available upon request and required approvals. https://www.afrl.af.mil/Questions/
References
- America’s Navy. (2021). F/A-18A-D Hornet and F/A-18E/F Super Hornet strike fighter. https://www.af.mil/About-Us/Fact-Sheets/Display/Article/104490/a-10c-thunderbolt-ii/
- Anderson, V. (2007). The value of learning: A new model of value and evaluation. Chartered Institute of Personnel and Development.
- Andriessen, D., & Tissen, R. (2000). Weightless weight—Find your real value in a future of intangible assets. Pearson Education.
- Barsky, N. P., & Marchant, G. (2000). The most valuable resource-measuring and managing intellectual capital. Strategic Finance, 81(8), 58. https://www.proquest.com/docview/229741115
- Bell, H. H., & Waag, W. L. (1998). Evaluating the effectiveness of flight simulators for training combat skills: A review. The International Journal of Aviation Psychology, 8(3), 223–242.
- Birkey, D. (2020). Sorry, Elon, fighter pilots will fly and fight for a long time. Defense News. https://www.defensenews.com/opinion/commentary/2020/03/02/sorry-elon-fighter-pilots-will-fly-and-fight-for-a-long-time/
- Cancian, M. (2019). US military forces in FY 2021. Rowman and Littlefield. https://csis-website-prod.s3.amazonaws.com/s3fs-public/publication/210318_Cancian_Military_Forces.pdf
- Chi, M. T., Glaser, R., & Farr, M. J. (Eds.). (2014). The nature of expertise. Psychology Press.
- Copp, T. (2018). As fatal aviation crashes reach 6-year high, Pentagon says ‘this is not a crisis.’ Military Times. https://www.militarytimes.com/news/your-military/2018/05/06/as-fatal-aviation-crashes-reach-6-year-high-pentagon-says-this-is-not-a-crisis/
- Deitchman, S. J. (1988). Preliminary exploration of the use of a warfare simulation model to examine the military value of training. Institute for Defense Analysis. https://apps.dtic.mil/sti/pdfs/ADA195751.pdf
- Department of the Air Force. (2020). Fiscal year (FY) 2021 budget estimates. https://www.saffm.hq.af.mil/Portals/84/documents/FY21/OM_/ACTIVE/FY21%20Air%20Force%20Operation%20and%20Maintenance%20Overview%20Exhibits.pdf
- Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363. https://doi.org/10.1037/0033-295X.100.3.363
- Grazier, D. (2020). Selective arithmetic to hide the F-35ʹs true costs. POGO. https://www.pogo.org/analysis/2020/10/selective-arithmetic-to-hide-the-f-35s-true-costs/
- Guyatt, G. H., Tugwell, P. X., Feeny, D. H., Haynes, R. B., & Drummond, M. (1986). A framework for clinical evaluation of diagnostic technologies. CMAJ: Canadian Medical Association Journal, 134(6), 587. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1490902/pdf/cmaj00114-0029.pdf
- Insinna, V. (2017). Boeing hits back on F-15C retirement proposal. Defense News. https://www.defensenews.com/air/2017/04/21/boeing-hits-back-on-f-15c-retirement-proposal/
- Kaufman, R., & Keller, J. M. (1994). Levels of evaluation: Beyond Kirkpatrick. Human Resource Development Quarterly, 5(4), 371–380. https://doi.org/10.1002/hrdq.3920050408
- Kirkpatrick, D. L. (1959). Techniques for evaluating training programs. Journal of ASTD, 11, 1–13.
- Kirkpatrick, D. L. (1994). Evaluating training programs: The four levels. Berrett-Koehler.
- Mann, C. T., & Fischer, H. (2020). Trends in active-military deaths since 2006. Congressional Research Service. https://fas.org/sgp/crs/natsec/IF10899.pdf
- Moss, J. D., Brimstin, J. A., Champney, R., DeCostanza, A. H., Fletcher, J. D., & Goodwin, G. (2016, September). Training effectiveness and return on investment: Perspectives from military, training, and industry communities. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 60, No. 1, pp. 2005–2008). SAGE Publications.
- Office of the Under Secretary of Defense. (2021). FY 2021 reimbursable rates fixed wing. https://comptroller.defense.gov/Portals/45/documents/rates/fy2021/2021_b_c.pdf
- Phillips, P. P., & Phillips, J. J. (2006). Return on investment (ROI) basics. American Society for Training and Development.
- Ratnatunga, J. (2002). The valuation of capabilities: A new direction for management accounting research. Journal of Applied Management Accounting Research, 1(1), 1–15. https://www.researchgate.net/profile/Janek-Ratnatunga/publication/237322123_The_Valuation_of_Capabilities_A_New_Direction_for_Management_Accounting_Research/links/5519eef90cf26cbb81a2b514/The-Valuation-of-Capabilities-A-New-Direction-for-Management-Accounting-Research.pdf
- Ratnatunga, J., Gray, N., & Balachandran, K. R. (2004). CEVITA™: The valuation and reporting of strategic capabilities. Management Accounting Research, 15(1), 77–105. https://doi.org/10.1016/j.mar.2003.12.005
- Rico, D. F. (2002). Software process improvement: Modeling return on investment (ROI). 2002 Software Engineering Institute (SEI) Software Engineering Process Group Conference (SEPG 2002), Phoenix, Arizona.
- Ristick, C. (2020). F-22 Raptor vs. F-35 Lightning. Military Machine. https://militarymachine.com/f-22-raptor-vs-f-35-lightning-ii/
- Sargent, J. (2020). Federal research and development (R&D) funding: FY2020. Congressional Research Service.
- Sargent, J., Esworthy, R., Harris, L., Johnson, J., Monke, J., Morgan, J., & Upton, H. (2017). Federal research and development (R&D) funding: FY2017. Congressional Research Service.
- United States Air Force. (2005). F-15E Strike Eagle. https://www.af.mil/About-Us/Fact-Sheets/Display/Article/104499/f-15e-strike-eagle/
- United States Air Force. (2015a). A-10C Thunderbolt II. https://www.af.mil/About-Us/Fact-Sheets/Display/Article/104490/a-10c-thunderbolt-ii/
- United States Air Force. (2015b). F-16 Fighting Falcon. https://www.af.mil/About-Us/Fact-Sheets/Display/Article/104505/f-16-fighting-falcon/
- United States Air Force. (2015c). F-22 Raptor. https://www.af.mil/About-Us/Fact-Sheets/Display/Article/104506/f-22-raptor/
- United States Air Force. (2019). Science and technology strategy: Strengthening USAF science and technology for 2030 and beyond. https://cdn.afresearchlab.com/wp-content/uploads/2019/01/13192817/Air-Force-Science-and-Technology-Strategy.pdf
- Unterkalmsteiner, M., Gorschek, T., Islam, A. M., Cheng, C. K., Permadi, R. B., & Feldt, R. (2011). Evaluation and measurement of software process improvement—a systematic literature review. IEEE Transactions on Software Engineering, 38(2), 398–424. https://doi.org/10.1109/TSE.2011.26
- Van Solingen, R. (2004). Measuring the ROI of software process improvement. IEEE Software, 21(3), 32–38. https://doi.org/10.1109/MS.2004.1293070
- Vats, R. (2017). Air Force extends F-16 fighter’s service life. Air Force Times. https://www.airforcetimes.com/news/your-air-force/2017/04/12/air-force-extends-f-16-fighter-s-service-life/
- Waag, W. L., & Bell, H. H. (1997). Estimating the training effectiveness of interactive air combat simulation. Armstrong Lab, Williams AFB, AZ, Aircrew Training Research Div. https://apps.dtic.mil/sti/pdfs/ADA459625.pdf
- Warr, P., Bird, M., & Rackham, N. (1970). Evaluation of management training: A practical framework, with cases, for evaluating training needs and results. Gower Press.