ABSTRACT
This paper covers considerations in using criterion measures based on administrative data. We begin with a conceptual framework for understanding and evaluating administrative criterion measures as “objective” rather than subjective (ratings-based) assessments of job performance. We then describe the associated advantages (e.g., availability) and disadvantages (e.g., contamination) of using administrative data for criterion-related validation purposes. We also describe best practices in the use of administrative data for validation purposes, including procedures for (a) handling missing data, (b) performing data checks, and (c) reporting detailed decision rules so that future researchers can replicate the analyses. Finally, we discuss “modern data management” approaches for improving administrative data to support organizational decision-making.
KEYWORDS: Administrative records, archival data, criterion measurement, outcome measures, modern data management
What is the public significance of this article?—When conducting studies to answer key questions—such as whether a pre-employment test predicts performance on the job—organizations often rely on administrative records to measure key outcomes. However, there are many pitfalls and considerations when deciding to use administrative records for this purpose. The current paper provides best practice guidelines for researchers using administrative records and describes how modern data management approaches can improve the quality of administrative records data.
An earlier paper in this special issue (Identification and Evaluation of Criterion Measurement Methods; Allen et al., 2022) summarized and evaluated available measurement methods typically used by U.S. military services for criterion measurement in validation studies. In their review, the authors identified four broad categories of criterion measurement methods: (a) performance rating scales, (b) performance measures, (c) self-report surveys, and (d) administrative records. The four categories build off the criteria originally developed in support of the Joint Performance Measurement (JPM) program – a DoD-wide study to examine the utility of the Armed Services Vocational Aptitude Battery (ASVAB) for making selection and classification decisions (Knapp & Campbell, 1993; Wigdor & Green, 1991). While extant research has extended the other three categories, the most substantial change between the JPM program and today is in the administrative records category. Administrative records were not a large part of the JPM research for several reasons, such as the lack of data availability in centralized systems and the fact that many administrative-based criterion measures (e.g., attrition, promotion rate) were thought to be irrelevant to the ASVAB (Knapp & Campbell, 1993).
It is increasingly common for military services (and organizations more generally) to rely on administrative records for criterion measurement, due to convenience and the increasing availability of such data in centralized, discoverable data systems. Despite their increased use, the perils and pitfalls of using administrative data for criterion-related validation (or human capital-related research more generally) are underappreciated or not well understood (Oswald et al., 2020). Additionally, best practices for evaluating and using such data as criterion measures in validation studies are not well established. Finally, in the Allen et al. (2022) review paper, criterion measures based on administrative records (specifically, attrition, re-enlistment, promotion rate, school grades, and performance scores) received comparatively lower scores on “psychometric quality” than other measurement methods. Although several factors went into those evaluations, a key reason for those low ratings was data quality issues.
To address these observations, we take a past, present, and future-oriented approach. We begin with a discussion of historical research related to administrative criterion measures (i.e., the past), with particular emphasis on discussions of “objective” versus ratings-based criterion measurement approaches. This brief literature review contributes to a set of pros and cons related to the use of administrative records for criterion measurement. We continue by describing best practices in the treatment and use of criterion measures based on administrative data (i.e., the present). This section includes actionable information for military research practitioners in the proper evaluation, use, and maintenance of administrative records for research purposes. Finally, we conclude with a brief description of modern data management (MDM) approaches to improving the quality of administrative data for use in organizational decision making (i.e., the future). The discussion of MDM approaches is distinct from the best practices section in that it focuses on handling potential data issues “upstream” before they get into a potential analyst’s hands. The purpose of this section is to provide researchers with an understanding of MDM techniques to work with stakeholders and policymakers in improving data quality, leading to more confidence in decisions made based on those data.
Background research on administrative records-based criterion measures (past)
It is not uncommon for data from administrative or archival sources to be used in applied research. Particularly for criterion-related validation efforts, administrative data are sometimes used to obtain access to variables that serve as primary outcomes or are used to complement other types of criterion data (e.g., performance rating scales or self-report survey outcomes; see Allen et al., 2022, for other types of criterion measures). When evaluating the utility of incorporating outcome data from administrative sources into a study, it is important to consider the types of constructs typically captured in these sources, statistical properties of the data, and factors pertaining to psychometric adequacy (e.g., reliability, construct validity). Research on objective criteria, which characterize many outcomes found in administrative sources, provides a useful starting point for thinking about these issues.
Properties of objective criteria
One of the primary means of categorizing criteria in applied research is the distinction between objective criteria and ratings-based criteria. Ratings-based (i.e., “subjective”) criteria, such as performance rating scales developed for criterion-related validation studies, are typically constructed to capture evaluations of employee behavior. Objective (i.e., results-oriented or “hard”) criteria, on the other hand, reflect countable results, events, or outcomes as opposed to behavior, per se (Bommer et al., 1995; Heneman, 1986; Hoffman et al., 1991). Many types of outcome measures obtained through administrative sources could be characterized as objective criteria. Examples include tenure, promotions, or salary (e.g., Trevor et al., 1997); quantity or productivity indices such as sales volume, number of units produced, or profit (e.g., Deadrick & Madigan, 1990; Ghiselli & Haire, 1960; Hofmann et al., 1993); measures of quality or accuracy (e.g., Hoffman et al., 1991; Sackett et al., 1988); training outcomes (e.g., Rush, 1953; Ryman & Biersner, 1975); accidents, injuries, and other safety-related outcomes (e.g., Rhodes, 1983; Seashore et al., 1960); withdrawal outcomes such as absences, attendance, and attrition (e.g., Bass & Turner, 1973; Ilgen & Hollenback, 1977; Seashore et al., 1960); and disciplinary incidents, reprimands, or corrective action (e.g., Hogan & Hogan, 1989; Truxillo et al., 1998). See Allen et al. (2022) for descriptions of the administrative criteria most relevant in military contexts.
Compared to ratings-based measures, objective criteria are less frequently determined by job analysis. Rather, they are generally adopted and developed by the host organization because of their relevance to broad organizational objectives and obtained for validation research through access to archival sources such as personnel records, work logs, and the like (Austin & Villanova, 1992; Hoffman et al., 1991). Although termed “objective” criteria, all outcome measures inherently contain some degree of subjectivity (Nathan & Alexander, 1988). Sources of potential subjectivity in objective criteria include (a) initial determination of the importance and use of the metric, (b) application of standards for defining and distinguishing levels of performance (e.g., adequate, poor), and (c) judgments involved in the underlying process that produced the data (e.g., determination of turnover reason, judgments associated with servicemember rewards or recognition).
Several statistical and psychometric characteristics of objective criteria have been discussed in the literature. Relative to performance ratings, objective measures tend to capture more specific, narrower aspects of job performance (Bommer et al., 1995). In terms of construct substance, objective criteria typically assess constructs more closely associated with technical performance or task performance as opposed to constructs rooted in contextual performance or citizenship (Sturman, 2003). That said, the wide array of measures included under the umbrella of objective criteria also suggests these measures are construct heterogeneous and should not be assumed to be interchangeable in terms of the inferences they support.
Questions have been raised about the validity of objective and ratings-based criteria in terms of construct contamination and deficiency. Although ratings are subject to their own validity issues (e.g., influence of systematic rater error on ratings; Heneman, 1986), objective criteria and ratings are likely influenced by different sources of contamination and deficiency. As previously mentioned, objective measures tend to be specific in nature. Consequently, objective criteria may be construct deficient if the intent is to use them as indicators of an overall domain of job performance (e.g., Bass & Turner, 1973; Bommer et al., 1995; Nathan & Alexander, 1988). Concerning contamination, objective measures are also subject to extraneous variables such as environmental factors and situational constraints that may introduce variance beyond that attributable to individual efforts (Cascio & Valenzi, 1978; Heneman, 1986; Nathan & Alexander, 1988).
Finally, events captured by various objective criteria often produce binary (e.g., attrition: 0 = stay, 1 = separate) or count data (e.g., number of disciplinary incidents), which inherently bring into question the applicability of statistics or models that assume normality. A closely related issue is that events captured by various objective criteria tend to be relatively infrequent for many jobs, occupations, and industries (e.g., in many industries, a variable like accidents tends to be very skewed, where only a minority of respondents may experience a focal event), leading to low base rates (e.g., Austin & Villanova, 1992; Harrison & Hulin, 1989; Steel & Griffeth, 1989). A consequence of extreme base rates is that outcome variance is restricted and estimates of statistics such as correlations can be attenuated (McGrath & Meyer, 2006). To address these and related methodological issues (e.g., time-to-occurrence data, scale truncation), relatively newer techniques such as survival analysis models (e.g., Cox proportional hazards regression) and models applicable to count data (e.g., Poisson or negative binomial models) have been successfully employed (see Harrison & Hulin, 1989; Hughes et al., 2020; Neal & Griffin, 2006; Newman et al., 2008, for examples).
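To make the preceding point concrete, the brief sketch below (in Python, using the open-source statsmodels library) fits a Poisson regression to a count-valued criterion and a Cox proportional hazards model to censored time-to-separation data. It is intended only as a minimal illustration of the model families named above; the variable names (aptitude_score, n_incidents, months_to_separation, separated) and the simulated data are hypothetical placeholders rather than fields from any actual administrative system.

```python
# Minimal sketch: models suited to count and time-to-event criteria.
# All data are simulated and all column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "aptitude_score": rng.normal(50, 10, n),         # hypothetical predictor
    "n_incidents": rng.poisson(0.3, n),              # skewed, low-base-rate count criterion
    "months_to_separation": rng.exponential(24, n),  # time to event
    "separated": rng.integers(0, 2, n),              # 1 = separated, 0 = censored
})

X = sm.add_constant(df[["aptitude_score"]])

# Poisson regression for a count outcome; a negative binomial family
# (sm.families.NegativeBinomial()) could be substituted if counts are overdispersed.
count_model = sm.GLM(df["n_incidents"], X, family=sm.families.Poisson()).fit()
print(count_model.summary())

# Cox proportional hazards regression for censored time-to-separation data.
cox_model = sm.PHReg(df["months_to_separation"], df[["aptitude_score"]],
                     status=df["separated"]).fit()
print(cox_model.summary())
```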
Research on objective criteria
Several streams of research have examined the efficacy of objective criteria in criterion-related validation. One relatively longstanding area of study pertains to research on convergent validity between objective measures and ratings (e.g., Hoffman et al., 1991; Rush, 1953; Seashore et al., 1960). An important question inherent in this research pertains to the extent to which objective measures and ratings can be treated as interchangeable for research or decision-making purposes. A general conclusion from meta-analytic studies (Bommer et al., 1995; Heneman, 1986) is that objective measures and ratings are positively correlated, that properties of both methods can affect the magnitude of the relationship (e.g., rating method format, type of objective measure), and that, when the constructs match across measurement methods, the relationship can be fairly strong. Specifically, Bommer et al. (1995) reported a corrected correlation of .706 between objective criteria and ratings assessing production quantity. However, correlations between objective criteria and ratings are typically not so high as to suggest that one method can be assumed substitutable for the other in terms of rank-ordering individuals.
A related avenue of study has examined the predictability of objective criteria vis-à-vis ratings in criterion-related validation contexts. This has come in the form of studies explicitly designed to compare criterion measures as well as research on the criterion-related validity of different types of predictors (e.g., personality measures). Several meta-analyses on predictors of job performance have examined predictor-outcome relationships by separate classes of criteria (e.g., Barrick & Mount, 1991; Hunter & Hunter, 1984; Schmidt & Hunter, 1998; Schmitt et al., 1984; Tett et al., 1991; Vinchur et al., 1998), as have meta-analyses examining the relationship between demographic characteristics (e.g., age, race/ethnicity) and performance (e.g., Ford et al., 1986; McEvoy & Cascio, 1989; Rhodes, 1983; Roth et al., 2003). Taken together, these results suggest that administrative records can have utility in criterion-related validation, but primarily as a complement to other criterion measures.
In addition to meta-analytic studies focusing on other predictors or demographic variables, individual studies and meta-analyses have also set out to explicitly compare relationships involving different types of criteria (some that would be considered objective and others that would be considered more subjective) and variables such as individual-difference predictors (e.g., Hoffman et al., 1991; Nathan & Alexander, 1988) or situational constraints (Steel & Mento, 1986; Steel et al., 1987; Villanova & Roman, 1993). After meta-analyzing studies examining the predictability of various criteria based on various abilities deemed relevant for clerical jobs, Nathan and Alexander (1988) concluded that four of five criteria examined (ratings, rankings, work samples, and measures of production quantity) were highly predictable across various abilities (although corrected validity estimates appeared systematically larger for rankings and work samples vs. ratings and measures of quantity in a few instances, e.g., general mental ability, verbal ability, quantitative ability). Of the criteria examined, the objective measure “quality” had the lowest validity coefficients.
A final area of research on objective criteria in criterion-related validation pertains to longitudinal studies of how performance outcomes vary over time. This work has demonstrated how correlations between repeated criterion measures generally decrease as the time interval between measurements increases, resulting in a simplex-like structure indicating the rank-order of individuals changes over time (e.g., Deadrick & Madigan, 1990; Henry & Hulin, 1987; Hofmann et al., 1993). Sturman et al. (2005) conducted a meta-analysis examining objective and subjective criteria over time that involved disentangling the effects of systematic and unsystematic error. Among other findings, results suggested that objective measures may have lower test-retest reliability than subjective measures, particularly for highly complex jobs. Finally, Hofmann et al. (1993) went beyond examining correlations and aggregated results for describing individual-level change over time and demonstrated that linear mixed models can be used to model and predict individual differences in performance trajectories. Studies examining trajectories based on objective criteria subsequently followed (e.g., Deadrick et al., 1997; Ployhart & Hakel, 1998; Zyphur et al., 2008).
Practical considerations in using administrative data
Decisions around using data from administrative data sources involve balancing the potential benefits and costs. Based on the above review and our own direct experience, criterion data obtained from administrative sources have several benefits, including:
Convenience – The availability of criterion data in archival sources can greatly limit time and financial resources required to collect data by capitalizing on what is already available;
Face validity, acceptance, and translation – Objective criteria often have relevance and buy-in among stakeholders compared to rating scales (e.g., saying that salespeople meeting a given threshold on a predictor earn $1.5 million more on average than those who do not is generally more meaningful than saying the same group of salespeople score .75 rating scale points higher on average); and
Best available measure – For many outcomes (e.g., turnover), administrative records can be the best available source of information.
On the other hand, there are also several drawbacks associated with using administrative or archival data. Beyond some of the limitations noted above, these include:
Signal/noise tradeoff – Whereas rating scales are intended to assess behavior directly, many administrative criteria (especially among the objective criteria noted above) are indicators that, as mentioned, may often be highly influenced by factors outside of the employee’s control, leading to criterion contamination;
Deficiency – Data based on administrative records tend to be specific in nature, and thus deficient when used without any additional data for criterion-related validation;
Questions about quality – In many cases, the nature of administrative data may limit the extent to which it is possible to evaluate them in terms of statistical or psychometric properties, leaving important questions about their quality partially or entirely unanswered; administrative data may also have undesirable properties, such as heavy skew or low base rates;
Documentation – Related to the prior point, documentation around administrative data is often scant and, because said data may not have originally been prepared for use in validation research, any available documentation may not be adequate for addressing questions that might arise; and
Data sensitivity – Finally, depending on the nature of the data under consideration, there may be restrictions or limitations arising from data sensitivity or legal restrictions (e.g., classification of data, Health Insurance Portability and Accountability Act [HIPAA] regulations, the European Union General Data Protection Regulation [GDPR] compliance requirements, or other concerns around litigation).
Regardless of motivation, military researchers are often faced with the prospect of using administrative records for criterion-related validation purposes. We outline best practices for doing so in the next section.
Best practices in preparing administrative data (present)
As described previously, because administrative data are typically not designed for specific research purposes, but rather for reasons such as personnel management or general record-keeping, they often require examination and manipulation to be appropriate for criterion-related validation. Additionally, they may contain errors and extraneous information. Therefore, steps must be taken to understand, clean, and organize the data for research and analysis purposes (Connelly et al., 2016). These steps include: (a) obtain data documentation, (b) evaluate data quality and clean data, (c) handle multiple records and duplicates, (d) construct analysis variables, (e) merge datasets, and (f) create data documentation. We describe each of these steps in additional detail below.
Obtain data documentation
The first step upon receipt of administrative data records is to fully understand how those data were initially collected and how the data are formatted and structured. Data documentation may be available in a variety of forms – ideally formal written documentation, but it may also include personal communications with the data originators. Such information will be necessary to determine if the data are suitable for the research question at hand and to inform subsequent steps of quality evaluation and data cleaning (Connelly et al., 2016).
Data collection forms or protocols
Documentation should be obtained regarding who or what is included in the data to ensure that the data represent the population of interest for the research question at hand. Documentation should also be obtained regarding how the data were originally collected to inform evaluations of data quality (George & Lee, 2001). For example, when reporting occupation/specialty, was the value selected from a drop-down list of all possible values, or was it entered as text? Evaluation of data quality would be different depending on the answer. For the former, the list needs to be confirmed to truly have contained all possible occupations/specialties, but for the latter, possible data entry errors will need to be checked. Additionally, if data collection protocols have changed over time, it will be necessary to understand changes between versions and how they may have affected the data structure or content.
Data dictionaries or codebooks
To understand the raw data, it will be necessary to have well-defined variables and value labels where applicable. For example, if gender is coded in the data with values of “M” and “F,” it is easy to determine that they correspond to “Male” and “Female,” respectively. However, if it is coded as, say, “1” and “2,” it will be necessary to know what each value corresponds to. Codebooks should also include information about missing data, such as whether missing values are coded as blanks or if they are assigned certain values (e.g., a negative value for missing data on time in service). In cases where data protocols have changed over time, codebooks should be obtained for each version of the data.
Past research reports
If formal data collection protocols or codebooks are inadequate or unavailable, one supplementary source of information could come from past research reports using the same administrative data records (e.g., reports of previous criterion-related validation studies), where available. In addition to providing supplementary data documentation, data management steps described in past reports can help to inform what steps may be necessary for current or future efforts with the same data.
Personal communications
Even if formal written data documentation is available, there is a high likelihood that questions regarding the data will arise. Questions should be directed to the data originator(s), but be prepared to identify and reach out to others, especially if the individual who provided the data extract is not the same as the individual who owns the data or the individual who initially collected the data. Especially when one is attempting to work with administrative data from multiple sources, a combination of institutional and personal knowledge will likely provide the greatest clarity on administrative data matters.
Evaluate data quality and clean data
The purpose of preparing and cleaning administrative data is to ensure data quality by addressing as many sources of errors as possible. The “total survey error” paradigm (Groves et al., 2009) describes several sources of error that should be evaluated, including measurement error, processing error, nonresponse error, adjustment error, coverage error, and sampling error. Coverage and sampling errors tend to be less prevalent in administrative data as administrative data are typically not sampled data (Groen, 2012), but their possible presence should still be evaluated to ensure data quality. Each of these errors will be discussed in turn, serving as a procedural framework for evaluating administrative data quality and conducting data cleaning processes.
Measurement error
It is possible that the data recorded in an administrative record may not reflect the truth. As described above in the drop-down vs. hand-entered occupation/specialty example, this could be a result of data entry errors, and the form of data entry error will differ depending on the format of the data. These errors are frequently unintentional but can also be intentional. For example, leaders can be motivated to distort information that impacts servicemember careers (e.g., performance ratings, codes used to identify reasons for separation). Another way in which measurement error can manifest in administrative data relates to longitudinal data that can change over time. If an individual’s occupation/specialty has changed, but the most recent administrative record has not yet reflected that change, then that record will be incorrect. Another common example of a longitudinal change is name changes due to marriage. In such cases, it is important to determine the date by which the data were last updated, as it will not necessarily be the same as the date on which the data were obtained or extracted.
Treatment of measurement error is generally approached in one of two ways. One is to compare the values in the data against values in the data codebook, identifying data values that are out of range or otherwise not listed in the codebook. Some errors are easily identifiable, especially minor spelling errors (e.g., a military occupation/specialty stated as “Ordinance” clearly refers to “Ordnance”). However, other errors may require investigation. For example, a test item may consist of four response options that a particular database codes as valid values from 1 to 4, but a value of 5 appears in the data. It is possible that the value of 4 was incorrectly programmed to be a 5, in which case it should be recoded to 4. However, an alternative possibility could be that the 5 represents a missing response code, but this was not documented in the codebook. Perhaps it was implemented as a new coding procedure, but the codebook was not updated, in which case other sources, including personal communications, may need to be consulted.
The second approach to treat measurement error is to compare multiple data sources for the same variable to identify discrepancies. Where discrepancies occur, it will be necessary to understand the quality of each source based on the available data documentation. This should include evaluating the recency of the data, susceptibility to data entry errors (e.g., selection from a drop-down list is less prone to typographical errors than text entry), and mode of data collection (e.g., HR records should be more reliable than survey data). Discrepancies should then be reconciled by retaining the value from the data source determined to be of the highest quality.
If neither approach can be used to determine what the correct value should be, erroneous values should be set to missing as a last resort. This can take the form of blank values or missing data codes. Missing data codes may be preferable over blanks in cases when there may be other reasons for missingness on a variable. For example, if a gender variable contains values of “M” = “Male,” “F” = “Female,” “O” = “Other,” and “-6” = “Prefer not to respond,” an additional value of “-9” could be used to signify missingness due to measurement error.
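As a concrete illustration of the codebook checks and missing-data recoding described above, the following pandas sketch flags out-of-range values and, as a last resort, recodes them to an explicit missing-data code. The codebook contents and the “-9” code are hypothetical conventions used only for illustration.

```python
# Sketch: flag values not listed in a (hypothetical) codebook and, if they
# cannot be resolved, recode them to an explicit missing-data code.
import pandas as pd

codebook = {
    "item1": [1, 2, 3, 4, -6],            # -6 = "prefer not to respond" (hypothetical)
    "gender": ["M", "F", "O", "-6"],
}
MEASUREMENT_ERROR_CODE = -9               # hypothetical code for unresolvable errors

df = pd.DataFrame({"item1": [1, 4, 5, 2], "gender": ["M", "F", "X", "O"]})

for col, valid_values in codebook.items():
    out_of_range = ~df[col].isin(valid_values)
    print(f"{col}: {out_of_range.sum()} out-of-range value(s):",
          df.loc[out_of_range, col].tolist())
    # Only after documentation and personal communications fail to resolve them:
    df.loc[out_of_range, col] = MEASUREMENT_ERROR_CODE

print(df)
```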
Processing error
After data are initially encoded, steps may have been taken to process the data into a form suitable for the database at hand. Errors may arise at this phase if the processing steps incorrectly altered the original data. A simple example relates to inconsistent data formatting, where some identification variables, such as social security number, can start with zero. If the data field is defined to be numeric, it is possible that leading zeroes may be inadvertently dropped. This can be corrected by converting the data field to a character or string field and padding it with leading zeros such that the social security number is nine digits long.
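A minimal pandas sketch of the leading-zero repair just described is shown below; the nine-digit width and the column name are illustrative.

```python
# Restore leading zeros that were dropped when an identifier was stored as a number.
import pandas as pd

df = pd.DataFrame({"ssn": [12345678, 987654321, 3456789]})   # zeros already lost
df["ssn"] = df["ssn"].astype(str).str.zfill(9)               # convert to string, pad to 9 digits
print(df["ssn"].tolist())  # ['012345678', '987654321', '003456789']
```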
A more complex example is when a predefined data code needs to be selected to represent a narrative response, such as with separation codes issued by a Defense Department component upon a service member’s separation from active duty. Here, the authority completing the form needs to select a separation code that best represents the separation narrative. If the separation narrative is not well defined, it may be difficult to determine the most relevant separation code and a less representative code may end up in administrative separation records. Alternatively, if multiple separation codes could apply to a given narrative, the ability to only select a single code would result in a deficient representation of separation in the administrative records.
Identifying and correcting processing errors depends on having the initial data available for comparison. If only the separation code is available in an administrative data file, it would only be possible to determine its relevance to the separation narrative if the original narrative data could also be obtained. Assuming the original data are available, some processing errors may be identifiable through programmatic means such as using keywords related to “physical fitness” to detect separation reasons due to failure to meet physical fitness requirements. On the other hand, some errors may require manual review and a judgment will need to be made regarding the likelihood of error versus the effort needed to identify and correct the error. In cases where the prerequisite data for identifying processing errors are not available, or where the return on investment for identifying processing errors is judged to be low, the possibility of processing errors may be considered within an acceptable margin of error.
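The sketch below illustrates the kind of keyword-based check described above, flagging records whose narrative suggests a fitness-related separation but whose assigned code does not match. The separation codes, keywords, and column names are invented for illustration and do not correspond to actual separation program designators.

```python
# Sketch: use keyword matching to flag possible mismatches between separation
# narratives and assigned codes for manual review. Codes and fields are invented.
import pandas as pd

df = pd.DataFrame({
    "separation_code": ["CODE_A", "CODE_B", "CODE_B"],
    "narrative": [
        "Failure to meet physical fitness requirements",
        "Medical condition incurred during training",
        "Did not pass the physical fitness assessment",
    ],
})

fitness_keywords = r"physical fitness|fitness test|body composition"
df["fitness_narrative"] = df["narrative"].str.contains(fitness_keywords,
                                                       case=False, na=False)
# Assume (hypothetically) that CODE_A is the fitness-related separation code.
df["review_flag"] = df["fitness_narrative"] & (df["separation_code"] != "CODE_A")
print(df[["separation_code", "review_flag"]])
```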
Nonresponse error
Missing data may occur for a variety of reasons, including respondents not entering data either intentionally (e.g., declining to report demographic information such as gender) or unintentionally (e.g., skipping a question by accident), technical errors, and censored data (i.e., data that have not yet had the opportunity to be observed; e.g., data for a second term of service would not be available for a service member currently in their first term). Missing data may be handled by imputing from other available data sources. If gender is missing from an administrative dataset, it may be obtained from another administrative (e.g., HR) database. If other data sources are not available, appropriate values for filling in missing data could be obtained via predictive methods (e.g., multiple imputation; Newman, 2014). Predictive imputation, which always includes some margin of error, trades off a reduction in nonresponse errors against prediction errors in cases where the predicted values differ from actual values (Groen, 2012).
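The short sketch below illustrates the first option, filling missing values from a second administrative extract before resorting to model-based imputation; the identifiers and column names are hypothetical.

```python
# Sketch: fill missing values from a secondary administrative source.
# Key and column names are hypothetical.
import numpy as np
import pandas as pd

primary = pd.DataFrame({"person_id": [1, 2, 3], "gender": ["M", np.nan, "F"]})
hr_extract = pd.DataFrame({"person_id": [2, 3], "gender": ["F", "F"]})

merged = primary.merge(hr_extract, on="person_id", how="left", suffixes=("", "_hr"))
merged["gender"] = merged["gender"].fillna(merged["gender_hr"])
merged = merged.drop(columns="gender_hr")
print(merged)
# If no secondary source exists, model-based approaches (e.g., multiple
# imputation via statsmodels' MICE or scikit-learn's IterativeImputer) can
# generate plausible values, at the cost of some prediction error.
```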
Coverage error
Coverage errors occur when the available data do not reflect the target population of interest for the research question. For example, if the research question pertains only to first-term enlisted personnel, the data should either only include records for first-term enlisted personnel, or should include a grouping variable that can be used to identify records of those who are not. Exclusion criteria should also be obtained, such as if records older than a certain timeframe are no longer updated or are dropped from the data altogether. Corrections for coverage errors will typically involve reducing a broader dataset to the population of interest or identifying and requesting the appropriate dataset to replace an inappropriate dataset.
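As a simple illustration, the filter below reduces a broader extract to a hypothetical target population of first-term enlisted personnel and records how many records were excluded; the grouping fields and values are placeholders.

```python
# Sketch: restrict a broader extract to the target population and log exclusions.
# The grouping fields and values are hypothetical.
import pandas as pd

df = pd.DataFrame({"component": ["enlisted", "officer", "enlisted"],
                   "term": [1, 1, 2]})

in_scope = (df["component"] == "enlisted") & (df["term"] == 1)
print(f"Excluding {(~in_scope).sum()} of {len(df)} records outside the target population")
analysis_df = df[in_scope].copy()
```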
Sampling error
Because administrative data should include all available data within specific population and timeframe parameters rather than being a sample of the available data, sampling error should be unlikely (Groen, 2012). One basis for sampling may be that the total amount of data is unwieldy. The ideal solution would be to utilize database management and computational approaches that are suited for large datasets. If sampling must be done, then it should be a random sample to ensure representativeness to the population. Bootstrapped sampling, in which analyses are aggregated across large numbers of smaller random samples, can be used to ensure the robustness of results from sampled data.
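The sketch below illustrates the aggregation-across-subsamples idea with a simulated predictor-criterion correlation; the sample sizes, number of subsamples, and variable names are arbitrary choices made for illustration.

```python
# Sketch: estimate a predictor-criterion correlation by aggregating results
# across many smaller random subsamples of a large (simulated) dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
big = pd.DataFrame({"test_score": rng.normal(size=1_000_000)})
big["criterion"] = 0.3 * big["test_score"] + rng.normal(size=len(big))

estimates = []
for i in range(200):                                  # 200 subsamples of 10,000 records
    sub = big.sample(n=10_000, random_state=i)
    estimates.append(sub["test_score"].corr(sub["criterion"]))

print(f"mean r = {np.mean(estimates):.3f}, SD across subsamples = {np.std(estimates):.3f}")
```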
Handle multiple records and duplicates
Besides the aforementioned errors, multiple records may exist for an individual, for both legitimate and erroneous reasons. The easiest case to handle is one in which there is an exact duplicate of data due to a technical issue, where the exact duplicate record can simply be deleted. Another case is when there are multiple records for an individual, each representing a new data update. Based on available timestamps for each record, the decision regarding whether the earliest, the most recent, or possibly even all records should be retained would depend on the research question. For example, whether an individual failed on their first attempt at a course would be reflected in the earliest record, but whether they eventually passed would be reflected in the most recent record.
Perhaps most problematic is when duplicates exist where the temporal order cannot be confidently established. This may occur if data were originally entered incorrectly, and a new record was created with the correct data. In such cases, the duplicate records should each be examined for completeness and compared against other data sources, when available, to determine which is most trustworthy. A flagging variable should also be created to identify these cases as potential data points to exclude from analyses if the data quality cannot be guaranteed.
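The pandas sketch below walks through the three cases just described: dropping exact duplicates, retaining the most recent record per person based on an update timestamp, and flagging records whose temporal order cannot be established. The identifier and timestamp column names are hypothetical.

```python
# Sketch: handle exact duplicates, multiple updates, and ambiguous duplicates.
# Column names ("person_id", "updated_on") are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "person_id": [1, 1, 2, 2, 2],
    "course_result": ["fail", "pass", "pass", "pass", "fail"],
    "updated_on": pd.to_datetime(
        ["2021-01-05", "2021-03-01", "2021-02-01", "2021-02-01", "2021-02-01"]),
})

df = df.drop_duplicates()                                  # case 1: exact duplicates
latest = (df.sort_values("updated_on")
            .groupby("person_id", as_index=False)
            .last())                                       # case 2: keep most recent update

# Case 3: flag records that share a person and timestamp (order unknown).
df["order_unknown_flag"] = df.duplicated(subset=["person_id", "updated_on"], keep=False)
print(latest)
print(df[df["order_unknown_flag"]])
```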
Construct analysis variables
In general, researchers may be faced with administrative data that are not immediately applicable to the research question at hand. In some cases, the requisite data are simply not available (meaning that another data source will need to be obtained), but in other cases the raw data may simply need to be transformed into new data elements that are more suitable for analysis (Jones, 2010). One example is with attrition or separation, where administrative data often contain an entry date and a separation date. If the research question pertains to individuals who had separated after 6, 12, 18, or 24 months of service, four new data elements will need to be created. This can take the form of dichotomous indicators (1 = separated, 0 = not separated) for whether the difference between separation and entry dates is less than 6, 12, 18, or 24 months of service. Individuals with censored data should have those indicators coded as missing (e.g., an individual currently active with 12 months of service would be coded 0 for 6- and 12-month separation status, and “NA” or missing for 18- and 24-month separation status).
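The following sketch constructs the four separation indicators just described from hypothetical entry and separation dates, coding censored cases as missing; the 30-day month approximation and the field names are simplifying assumptions.

```python
# Sketch: build 6-, 12-, 18-, and 24-month separation indicators, treating
# censored cases as missing. Field names and the 30-day month are simplifications.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "entry_date": pd.to_datetime(["2021-01-01", "2021-06-01", "2022-01-01"]),
    "separation_date": pd.to_datetime(["2021-08-15", pd.NaT, pd.NaT]),
})
as_of = pd.Timestamp("2022-12-31")                  # date the data were last updated

months_observed = ((df["separation_date"].fillna(as_of) - df["entry_date"])
                   / np.timedelta64(30, "D"))       # approximate months

for m in (6, 12, 18, 24):
    separated = (df["separation_date"].notna() & (months_observed < m)).astype(float)
    censored = df["separation_date"].isna() & (months_observed < m)
    separated[censored] = np.nan                    # outcome not yet observable
    df[f"sep_{m}mo"] = separated

print(df)
```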
Merge datasets
Once individual datasets have been cleaned, multiple datasets may need to be merged for analyses as administrative datasets are often specific to a particular function (e.g., personnel management records, training records, and disciplinary records). The simplest case for merging is when there is a clear, deterministic linking variable such as social security number, e-mail address, or other unique identifier (e.g., employee ID number). Assuming these are free of the aforementioned data quality errors, they can be used to directly link different datasets.
However, in many cases a deterministic link on a single identifying variable cannot be made (Oswald et al., 2020). A common example is name, where multiple individuals could have the same name or there are name variations (e.g., due to nicknames). In such cases, it will be necessary to conduct a probabilistic linkage procedure instead. A variety of software packages are available to conduct probabilistic linkages (e.g., Enamorado et al., 2019; Ferguson et al., 2018; Hejblum et al., 2019; Wright, n.d.). A complete discussion of these methods is beyond the scope of the current paper, but in general they apply matching or similarity algorithms that yield a probability estimate that records from different datasets belong to the same individual based on multiple pieces of demographic information. Therefore, the more identifying information that is available (e.g., date of birth, rank, or occupational code), the more likely the probabilistic linkage will approach the accuracy of a deterministic linkage.
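To illustrate the general logic (not the specific algorithms implemented in the packages cited above), the sketch below blocks candidate pairs on exact date of birth and then scores name similarity with a simple string-similarity ratio from the Python standard library. The 0.75 threshold and all field values are arbitrary placeholders, and dedicated record-linkage software should be preferred in practice.

```python
# Simplified illustration of probabilistic-style linkage: block on exact date
# of birth, then score name similarity. Threshold and fields are placeholders;
# dedicated record-linkage packages are preferable for real analyses.
from difflib import SequenceMatcher

import pandas as pd

file_a = pd.DataFrame({"name": ["Jonathan Smith", "Maria Garcia"],
                       "dob": ["1990-04-12", "1988-11-02"]})
file_b = pd.DataFrame({"name": ["Jon Smith", "Maria Garcia-Lopez"],
                       "dob": ["1990-04-12", "1988-11-02"]})

candidates = file_a.merge(file_b, on="dob", suffixes=("_a", "_b"))   # blocking step
candidates["name_similarity"] = candidates.apply(
    lambda r: SequenceMatcher(None, r["name_a"].lower(), r["name_b"].lower()).ratio(),
    axis=1)

links = candidates[candidates["name_similarity"] >= 0.75]            # arbitrary cutoff
print(candidates[["name_a", "name_b", "name_similarity"]])
print(f"{len(links)} probable link(s) retained")
```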
Create data documentation
While this is the last point made in our Best Practices section, the creation of data documentation should in practice be a continual process during data preparation. All prior steps should be recorded as they occur. This includes describing (a) what prior documentation and informational sources were available; (b) the population and sample present in the data; (c) data quality issues that were identified; (d) steps taken to rectify data errors, multiple records, and duplicates; (e) procedures for constructing new analysis variables; (f) what datasets were merged and the linkage process used to merge them; and (g) a complete data dictionary or codebook for the final, complete analysis file. This documentation will help to ensure reproducibility with future iterations or updates of the data, as well as introduce and orient other potential users to the data (e.g., other researchers conducting analysis).
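Parts of this documentation can be generated programmatically. The sketch below builds a minimal codebook (variable names, types, missingness, example values) for a final analysis file and writes it out alongside a free-text decision log; the file names, columns, and log entry are illustrative only.

```python
# Sketch: auto-generate a basic codebook for the final analysis file and keep a
# running decision log. File names, columns, and log text are illustrative.
import pandas as pd

analysis = pd.DataFrame({"person_id": [1, 2, 3],
                         "sep_6mo": [0.0, None, 1.0],
                         "gender": ["M", "F", "F"]})

codebook = pd.DataFrame({
    "variable": analysis.columns,
    "dtype": [str(t) for t in analysis.dtypes],
    "n_missing": analysis.isna().sum().values,
    "example_values": [analysis[c].dropna().unique()[:5].tolist() for c in analysis.columns],
})
codebook.to_csv("codebook.csv", index=False)

with open("decision_log.txt", "w") as log:
    log.write("Data preparation decisions\n"
              "- Recoded out-of-range item1 values to -9 (see codebook for codes).\n")
```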
Modern methods in managing administrative data (future)
As highlighted throughout the previous section, administrative data often arrive “messier” than we would like. As a result, data analysts spend a large amount of time cleaning, wrangling, and simply making sense of such data. Data often arrive in this state because many administrative systems lack robust data management approaches. Thus, in this section, we provide researchers with an understanding of how to improve data management. To accomplish this, we discuss how there has been a sea change in thinking about data management, and we introduce the idea of modern data management, which offers ways to gather, maintain, and use administrative data to greater effect. Such data management approaches are one facet of modern human capital analytics, which goes beyond thinking about administrative records as static and instead views data as embedded in larger data ecosystems. This view is consistent with thoughts about applications of “big data” within human resources systems (Oswald et al., 2020), and many of the relevant lessons learned apply to small data as well as big data.
We begin our discussion with some definitions of terms, starting with a consideration of what exactly “modern data management” (MDM, also sometimes called “data ecosystems”) entails. Although there are many different approaches, most MDM systems feature the following layers of activities (Python Predictions, 2021):
Data capture and storage – how are data captured, input, and made available in a timely, organized, and structured manner?
Data governance – how do we ensure that data are useful, well documented, and of high quality?
Data usage – how do users extract knowledge from data, and what enterprise tools, platforms, and capabilities facilitate data use?
Data impact – how do stakeholders ensure that data are used effectively for operational decision making?
As mentioned above, these definitional elements are similar to those used in discussions of “data ecosystems,” which Oliveira et al. (2019, p. 16) define as “a loose set of interacting actors that directly or indirectly consume, produce, or provide data and other related resources.” It is noteworthy that in its recently drafted “DoD Data Strategy,” the U.S. Department of Defense recognizes the importance of each of these various elements.1 Finally, we recognize that researchers may or may not have control over these activities, depending on how they are functionally situated in relation to those data. Despite this, we believe it is critical for researchers to have an understanding of MDM as a field and general MDM tenets as they apply to organizational research so they can communicate best practices to key organizational stakeholders. Furthermore, in our experience, applied researchers often do engage in projects that end up generating requirements for data management systems. Thus, raising awareness of MDM practices may assist in the development of systems that aid future researchers in their quest for insights from administrative data.
Data capture and storage
Beginning with the first element, data capture and storage, researchers should consider the following important questions and issues when building MDM systems or communicating with stakeholders:
Data sources – where do data originate in terms of end point data producers and initial users? What IT systems are involved in its initial capture and storage (e.g., CRM systems such as Salesforce, enterprise resource planning [ERP] systems such as PeopleSoft, digital marketing solutions)? Without knowing the who, what, where, and when of data sources, data stakeholders will have problems mapping out downstream systems and uses. Increasingly, such data include not just structured data (e.g., performance ratings) but unstructured data such as e-mails, videos, memoranda, and so forth. In the case of collecting and using military administrative data, it is critical to figure out these early capture elements, as the relevant data can be so varied. Take the previous example of studying military attrition. Here the relevant systems and possible sources of data are diverse (including elements like screening interventions, support interventions, variates/covariates such as demographic data, and various types of outcome data including type of separation, attrition rates, time of separation; see White et al., 2014 for an example) – all potentially captured and stored in different parts of a military data ecosystem.
Data platforms – What are the primary platforms on which users interact with data? Such systems can be “on prem,” meaning located in an organization’s physical location or, more frequently these days, on either private or public cloud platforms, such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud. All these environments hold advantages and disadvantages, but again for long term administrative data health and use, it is critical that organizations plan the use of platforms thoughtfully.
Data architecture – It is not sufficient to only plan for data systems and platforms; instead, we must also consider how data are organized in underlying databases and data models. Even the best data can be rendered useless if thrown into a disorganized database model or schema. Getting into the details of modern data models is well beyond the scope of the current paper. It suffices for our current purposes to note (a) the importance of data models and (b) that it is critical to know and document the ways in which data are structured for storage and use, documentation that ideally includes a data inventory or catalog that references data location and other facets such as “metadata” (frequently described as “data that describes other data”). The days in which we only had access to RDBMS (Relational Database Management System) tools to structure data are past, and we now have access to various database and data model technologies that afford advances in scalability, flexibility around data types, and real-time access.
Data pipelines – In today’s complex IT architectures, it is increasingly important to know how to move data from one system to another. These processes are often referred to as “data pipelines.” Creating such pipelines and related workflows is critical for effective use, and there are even specialized jobs/roles for individuals who focus on such data flows, namely, data engineers.
Data governance
Moving on to a second major feature of MDM, let us briefly reflect upon data governance. Of the many aspects of MDM, in our experience, data governance is often one of the least emphasized elements in organizations, yet one of the most important. Communicating the criticality of data governance to stakeholders is crucial to the success of future organizational research. To begin, let us be clear that all data systems have issues. Yet organizations that have dedicated data quality standards, processes, and roles seem to have fewer of them. Often the best first step in managing data quality is to document data systems. A good place to begin such documentation is with a data inventory, which addresses where and how data are generated and stored. Moreover, in the increasingly important world of cybersecurity, data inventories and asset inventories form the foundation of a security posture. It is hard to protect data if you do not really know where they are.
Next, data quality is not just a what but is also a who. Specifically, organizations are increasingly making use of “data stewards,” as local agents embedded throughout an organization that play specific roles in monitoring and assuring data quality (Ghavami, 2020, see Chapter 9). Data stewards often act as liaisons between organizational units and data governance authorities, and a variety of models for data stewardship exist (Plotkin, 2020).
Finally, good data governance incorporates the evolving rules, regulations, and laws around data protection. For example, the previously mentioned GDPR is a legal framework that sets guidelines for the collection and processing of personal information from individuals who live in the European Union, but these guidelines are increasingly influencing data practices in countries around the world. Specific to military data collection, there are of course also evolving rules and regulations around data sensitivity (e.g., security classifications) and protections, such as those for personally identifiable information (PII). Data managers should not treat such rules and regulations as one-off guidelines but should instead integrate them into a comprehensive data governance system. Increasingly, organizations are appointing Data Privacy or Data Protection Officers to deal with such issues.
Data usage and impact
Regarding data usage and impact, organizations are now using MDM frameworks to identify business process owners, data owners, data stewards, and various types of data scientists and analysts, in addition to other end-users, in an overall ecosystem view. As such, each participant in a data management lifecycle plays specific roles and delivers unique benefits to the organization. It is incumbent on researchers to convey these benefits to organizational stakeholders as they provide the utility case for the infrastructure investments described above. The benefits of such integrated and coordinated systems include:
Improved decision making – data-informed decision making should benefit organizations across the board. Senior data scientists and other data stakeholders can specify use cases to show the benefits of MDM in terms of specific operational decision-making improvements. Given the topics mentioned throughout this paper, readers can imagine what such use cases might look like. One example is the increasing use of data-sophisticated methods for military recruiting (Lim et al., 2019).
Performance and Innovation – Research shows that organizations with mature data-informed decision-processes outperform peers across a number of organizational metrics. For example, according to McKinsey research, “organizations that leverage customer behavioral insights outperform peers by 85% in sales growth and more than 25% in gross margin” (Brown et al., 2017). Such companies also tend to generate useful innovation initiatives more quickly and more effectively. One might think that such results are confined only to private sector companies, but similar studies of government organizations suggest comparable results and benefits (e.g., Wiseman, 2018).
For organizations to execute the processes and systems involved in MDM, some degree of culture change is often required. Staff accustomed to local methods may find it difficult to switch to modern data processes and an increased use of algorithms and automated systems. Resistance to MDM is common as organizations upgrade processes. This is where an ecosystem view is helpful; staff at all levels of organizations should be invited into better use and management of data through concrete business cases and demonstrated results. Researchers have a key role to play as end users of these data to demonstrate the return on investment (ROI) of sound data management principles. The U.S. Department of Defense clearly recognizes this cultural aspect within the DoD Data Strategy, stating that “Moving the Department to a data-centric organization requires a cultural transformation with the DoD workforce at its heart. DoD will continue to evolve its decision-making culture to one soundly based upon data and analytics enabled by technology” (p. 6).
Conclusion
The purpose of the current paper was to extend the Allen et al. (2022) paper by examining one criterion category – administrative records – in additional detail. The current paper assists military psychology researchers attempting to use administrative records-based criterion measures in their research in a few ways. First, we reviewed extant research on the use of objective/administrative data for criterion-related validation to identify potential limitations. These limitations are not meant to dissuade researchers from using administrative records, but to help them properly contextualize results based on those data when considering validation evidence. Second, we provided a set of best practices for researchers to use in preparing administrative data. These guidelines are designed to help researchers systematically evaluate, clean, transform, document, and maintain their datasets to maximize the utility of the administrative records, whatever their state, for criterion-related validation. Finally, as described early in this paper, a common limiting factor in using administrative records-based criterion measures is data quality. However, this does not have to be the case – MDM frameworks can be deployed to improve data quality and support organizational decision-making. We hope this brief overview of MDM frameworks can be used by researchers to make the case for improved data management procedures in their organizations.
Implicit in this last section is the observation that several trends are impacting the use of administrative records for organizational decision-making (to include criterion-related validation), including the emergence of “big data” and human capital analytics in organizational decision-making, and the role of data science and other disciplines in bringing new techniques for collecting, storing, and analyzing those data. Beyond the scope of the current paper are other frontiers in the use of administrative records for criterion-related validation, such as potential uses of unstructured data sources (e.g., performance review narratives; Oswald et al., 2020). While we hope this paper serves as a useful starting point, future researchers should also monitor those trends to identify new potential uses of administrative records and evaluate their efficacy for criterion-related validation in military contexts.
Correction Statement
This article has been corrected with minor changes. These changes do not impact the academic content of the article.
Note
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
No datasets were used in this study.
References
- Allen, M. T., Russell, T., Ford, L., Carretta, T., Lee, A., & Kirkendall, C. (2022). Identification and evaluation of criterion measurement methods. Military Psychology, 35(4), 308–320. 10.1080/08995605.2022.2050165
- Austin, J. T., & Villanova, P. (1992). The criterion problem: 1917–1992. Journal of Applied Psychology, 77(6), 836–874. 10.1037/0021-9010.77.6.836
- Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1–26. 10.1111/j.1744-6570.1991.tb00688.x
- Bass, A. R., & Turner, J. N. (1973). Ethnic group differences in relationships among criteria of job performance. Journal of Applied Psychology, 57(2), 101–109. 10.1037/h0037125
- Bommer, W. H., Johnson, J. L., Rich, G. A., Podsakoff, P. M., & Mackenzie, S. B. (1995). On the interchangeability of objective and subjective measures of employee performance: A meta-analysis. Personnel Psychology, 48(3), 587–605. 10.1111/j.1744-6570.1995.tb01772.x
- Brown, B., Kanagasabai, K., Pant, P., & Pinto, G. S. (2017). Capturing value from your customer data. McKinsey Quarterly. Retrieved May 20, 2022, from https://www.mckinsey.com/business-functions/quantumblack/our-insights/capturing-value-from-your-customer-data
- Cascio, W. F., & Valenzi, E. R. (1978). Relations among criteria of police performance. Journal of Applied Psychology, 63(1), 22–28. 10.1037/0021-9010.63.1.22
- Connelly, R., Playford, C. J., Gayle, V., & Dibben, C. (2016). The role of administrative data in the big data revolution in social science research. Social Science Research, 59, 1–12. 10.1016/j.ssresearch.2016.04.015
- Deadrick, D. L., & Madigan, R. M. (1990). Dynamic criteria revisited: A longitudinal study of performance stability and predictive validity. Personnel Psychology, 43(4), 717–744. 10.1111/j.1744-6570.1990.tb00680.x
- Deadrick, D. L., Bennett, N., & Russell, C. J. (1997). Using hierarchical linear modeling to examine dynamic performance criteria over time. Journal of Management, 23(6), 745–757. 10.1177/014920639702300603
- Enamorado, T., Fifield, B., & Imai, K. (2019). Using a probabilistic model to assist merging of large-scale administrative records. American Political Science Review, 113(2), 353–371. 10.1017/S0003055418000783
- Ferguson, J., Hannigan, A., & Stack, A. (2018). A new computationally efficient algorithm for record linkage with field dependency and missing data imputation. International Journal of Medical Informatics, 109, 70–75. 10.1016/j.ijmedinf.2017.10.021
- Ford, J. K., Kraiger, K., & Schechtman, S. L. (1986). Study of race effects in objective indices and subjective evaluations of performance: A meta-analysis of performance criteria. Psychological Bulletin, 99(3), 330–337. 10.1037/0033-2909.99.3.330
- George, R. M., & Lee, B. (2001). Matching and cleaning administrative data. In Ver Ploeg M., Moffitt R. A., & Citro C. (Eds.), Studies of welfare population: Data collection and research issues (pp. 197–219). National Academy Press. [Google Scholar]
- Ghavami, P. (2020). Big data management: Data governance principles for big data analytics. De Gruyter.
- Ghiselli, E. E., & Haire, M. (1960). The validation of selection tests in the light of the dynamic character of criteria. Personnel Psychology, 13(3), 225–231. 10.1111/j.1744-6570.1960.tb01352.x
- Groen, J. A. (2012). Sources of error in survey and administrative data: The importance of reporting procedures. Journal of Official Statistics, 28(2), 173–198.
- Groves, R. M., Fowler, F. J., Jr., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2009). Survey methodology (2nd ed.). John Wiley & Sons.
- Harrison, D. A., & Hulin, C. L. (1989). Investigations of absenteeism: Using event history models to study the absence-taking process. Journal of Applied Psychology, 74(2), 300–316. 10.1037/0021-9010.74.2.300
- Hejblum, B. P., Weber, G. M., Liao, K. P., Palmer, N. P., Churchill, S., Shadick, N. A., Szolovits, P., Murphy, S. N., Kohane, I. S., & Cai, T. (2019). Probabilistic record linkage of de-identified research datasets with discrepancies using diagnosis codes. Scientific Data, 6(1), 1–11. 10.1038/sdata.2018.298
- Heneman, R. L. (1986). The relationship between supervisory ratings and results-oriented measures of performance: A meta-analysis. Personnel Psychology, 39(4), 811–826. 10.1111/j.1744-6570.1986.tb00596.x
- Henry, R. A., & Hulin, C. L. (1987). Stability of skilled performance across time: Some generalizations and limitations on utilities. Journal of Applied Psychology, 72(3), 457–462. 10.1037/0021-9010.72.3.457
- Hoffman, C. C., Nathan, B. R., & Holden, L. M. (1991). A comparison of validation criteria: Objective versus subjective performance measures and self- versus supervisor ratings. Personnel Psychology, 44(3), 601–619. 10.1111/j.1744-6570.1991.tb02405.x
- Hofmann, D. A., Jacobs, R., & Baratta, J. E. (1993). Dynamic criteria and the measurement of change. Journal of Applied Psychology, 78(2), 194–204. 10.1037/0021-9010.78.2.194
- Hogan, J., & Hogan, R. (1989). How to measure employee reliability. Journal of Applied Psychology, 74(2), 273–279. 10.1037/0021-9010.74.2.273
- Hughes, M. G., O’Brien, E. L., Reeder, M. C., & Purl, J. (2020). Attrition and reenlistment in the army: Using the Tailored Adaptive Personality Assessment System (TAPAS) to improve retention. Military Psychology, 32(1), 36–50. 10.1080/08995605.2019.1652487
- Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96(1), 72–98. 10.1037/0033-2909.96.1.72
- Ilgen, D. R., & Hollenback, J. H. (1977). The role of satisfaction in absence behavior. Organizational Behavior and Human Performance, 19(1), 148–161. 10.1016/0030-5073(77)90059-9
- Jones, C. (2010). Archival data: Advantages and disadvantages for research in psychology. Social and Personality Psychology Compass, 4(11), 1008–1017. 10.1111/j.1751-9004.2010.00317.x
- Knapp, D. J., & Campbell, J. P. (1993). Building a joint-service classification research roadmap: Criterion-related issues (AL/HR-TP-1993-0028). Armstrong Laboratory.
- Lim, N., Orvis, B. R., & Hall, K. C. (2019). Leveraging big data analytics to improve military recruiting. RAND.
- McEvoy, G. M., & Cascio, W. F. (1989). Cumulative evidence of the relationship between employee age and job performance. Journal of Applied Psychology, 74(1), 11–17. 10.1037/0021-9010.74.1.11
- McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: The case of r and d. Psychological Methods, 11(4), 386–401. 10.1037/1082-989X.11.4.386
- Nathan, B. R., & Alexander, R. A. (1988). A comparison of criteria for test validation: A meta-analytic investigation. Personnel Psychology, 41(3), 517–535. 10.1111/j.1744-6570.1988.tb00642.x
- Neal, A., & Griffin, M. A. (2006). A study of the lagged relationships among safety climate, safety motivation, safety behavior, and accidents at the individual and group levels. Journal of Applied Psychology, 91(4), 946–953. 10.1037/0021-9010.91.4.946
- Newnam, S., Griffin, M. A., & Mason, C. (2008). Safety in work vehicles: A multilevel study linking safety values and individual predictors to work-related driving crashes. Journal of Applied Psychology, 93(3), 632–644. 10.1037/0021-9010.93.3.632
- Newman, D. A. (2014). Missing data: Five practical guidelines. Organizational Research Methods, 17(4), 372–411. 10.1177/1094428114548590
- Oliveira, M. I. S., Lima, G. D. F. B., & Lóscio, B. F. (2019). Investigations into data ecosystems: A systematic mapping study. Knowledge and Information Systems, 61(2), 589–630. 10.1007/s10115-018-1323-6
- Oswald, F. L., Behrend, T. S., Putka, D. J., & Sinar, E. (2020). Big data in industrial-organizational psychology and human resource management: Forward progress for organizational research and practice. Annual Review of Organizational Psychology and Organizational Behavior, 7(1), 505–533. 10.1146/annurev-orgpsych-032117-104553
- Plotkin, D. (2020). Data stewardship: An actionable guide to effective data management and data governance. Academic Press.
- Ployhart, R. E., & Hakel, M. D. (1998). The substantive nature of performance variability: Predicting interindividual differences in intraindividual performance. Personnel Psychology, 51(4), 859–901. 10.1111/j.1744-6570.1998.tb00744.x
- Python Predictions. (2021, January 7). A brief introduction to the data ecosystem (and the crucial role of data strategy). Retrieved July 5, 2021, from https://www.pythonpredictions.com/news/a-brief-introduction-to-the-data-ecosystem-and-the-crucial-role-of-data-strategy/
- Rhodes, S. R. (1983). Age-related differences in work attitudes and behavior: A review and conceptual analysis. Psychological Bulletin, 93(2), 328–367. 10.1037/0033-2909.93.2.328
- Roth, P. L., Huffcutt, A. I., & Bobko, P. (2003). Ethnic group differences in measures of job performance: A new meta-analysis. Journal of Applied Psychology, 88(4), 694–706. 10.1037/0021-9010.88.4.694
- Rush, C. H. (1953). A factorial study of sales criteria. Personnel Psychology, 6(1), 9–24. 10.1111/j.1744-6570.1953.tb01027.x
- Ryman, D. H., & Biersner, R. J. (1975). Attitudes predictive of diving training success. Personnel Psychology, 28(2), 181–188. 10.1111/j.1744-6570.1975.tb01379.x
- Sackett, P. R., Zedeck, S., & Fogli, L. (1988). Relations between measures of typical and maximum job performance. Journal of Applied Psychology, 73(3), 482–486. 10.1037/0021-9010.73.3.482
- Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274. 10.1037/0033-2909.124.2.262
- Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta-analyses of validity studies published between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37(3), 407–422. 10.1111/j.1744-6570.1984.tb00519.x
- Seashore, S. E., Indik, B. P., & Georgopoulos, B. S. (1960). Relationships among criteria of job performance. Journal of Applied Psychology, 44(3), 195–202. 10.1037/h0044267
- Steel, R. P., & Mento, A. J. (1986). Impact of situational constraints on subjective and objective criteria of managerial job performance. Organizational Behavior and Human Decision Processes, 37(2), 254–265. 10.1016/0749-5978(86)90054-3
- Steel, R. P., Mento, A. J., & Hendrix, W. H. (1987). Constraining forces and the work performance of finance company cashiers. Journal of Management, 13(3), 473–482. 10.1177/014920638701300304
- Steel, R. P., & Griffeth, R. W. (1989). The elusive relationship between perceived employment opportunity and turnover behavior: A methodological or conceptual artifact? Journal of Applied Psychology, 74(6), 846–854. 10.1037/0021-9010.74.6.846
- Sturman, M. C. (2003). Searching for the inverted U-shaped relationship between time and performance: Meta-analyses of the experience/performance, tenure/performance, and age/performance relationships. Journal of Management, 29(5), 609–640. 10.1016/S0149-2063(03)00028-X
- Sturman, M. C., Cheramie, R. A., & Cashen, L. H. (2005). The impact of job complexity and performance measurement on the temporal consistency, stability, and test-retest reliability of employee job performance ratings. Journal of Applied Psychology, 90(2), 269–283. 10.1037/0021-9010.90.2.269
- Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44(4), 703–742. 10.1111/j.1744-6570.1991.tb00696.x
- Trevor, C. O., Gerhart, B., & Boudreau, J. W. (1997). Voluntary turnover and job performance: Curvilinearity and the moderating influences of salary growth and promotions. Journal of Applied Psychology, 82(1), 44–61. 10.1037/0021-9010.82.1.44
- Truxillo, D. M., Bennett, S. R., & Collins, M. L. (1998). College education and police job performance: A ten-year study. Public Personnel Management, 27(2), 269–280. 10.1177/009102609802700211
- Villanova, P., & Roman, M. A. (1993). A meta-analytic review of situational constraints and work-related outcomes: Alternative approaches to conceptualization. Human Resource Management Review, 3(2), 147–175. 10.1016/1053-4822(93)90021-U
- Vinchur, A. J., Schippmann, J. S., Switzer, F. S., III, & Roth, P. L. (1998). A meta-analytic review of predictors of job performance for salespeople. Journal of Applied Psychology, 83(4), 586–597. 10.1037/0021-9010.83.4.586
- White, L. A., Rumsey, M. G., Mullins, H. M., Nye, C. D., & LaPort, K. A. (2014). Toward a new attrition screening paradigm: Latest army advances. Military Psychology, 26(3), 138–152. 10.1037/mil0000047
- Wigdor, A. K., & Green, B. F. (1991). Performance assessment for the workplace (Vol. 1; Vol. 2: Technical issues). National Academy Press.
- Wiseman, J. (2018). Data-driven government: The role of chief data officers. IBM Center for the Business of Government.
- Wright, G. (n.d.). Probabilistic record linkage in SAS. Kaiser Permanente.
- Zyphur, M. J., Bradley, J. C., Landis, R. S., & Thoresen, C. J. (2008). The effects of cognitive ability and conscientiousness on performance over time: A censored latent growth model. Human Performance, 21(1), 1–27. 10.1080/08959280701521967
Data Availability Statement
No datasets were used in this study.