Abstract
The use of profiling by ethnicity or nationality to trigger secondary security screening is a controversial social and political issue. Overlooked is the question of whether such actuarial methods are in fact mathematically justified, even under the most idealized assumptions of completely accurate prior probabilities and secondary screenings concentrated on the highest-probability individuals. We show here that strong profiling (defined as screening at least in proportion to prior probability) is no more efficient than uniform random sampling of the entire population, because resources are wasted on the repeated screening of higher-probability, but innocent, individuals. A mathematically optimal strategy would be “square-root biased sampling,” the geometric mean between strong profiling and uniform sampling, with secondary screenings distributed broadly, although not uniformly, over the population. Square-root biased sampling is a general idea that can be applied whenever a “bell-ringer” event must be found by sampling with replacement, but can be recognized (either with certainty, or with some probability) when seen.
Keywords: screening, square-root biased sampling, rare events
In a large population of individuals labeled j = 1,2,…, N, governments attempt to find the rare malfeasor j = j* (terrorist, for example, refs. 1–3)† by assigning prior probabilities pj to individuals j, in some manner estimating the chance that each is a malfeasor. Societal resources for secondary security screening are then concentrated against individuals with the largest priors. We may call this “strong profiling” if the concentration is at least proportional to pj for the largest values of pj. Secondary screening may take the form of airport luggage search, police investigation, physical search, or other societally sanctioned but personally intrusive actions.
In general, police strategies that use such priors are termed actuarial methods (4). Racial profiling, as commonly defined (5), is one such actuarial method. It occurs when an individual's prior is explicitly conditioned on his or her race, ethnicity, nationality, or religion. What distinguishes racial profiling, and actuarial methods generally, from investigational methods often perceived as more acceptable is that the prior probabilities are associated with the individual a priori, and not associated with evidence of any actual criminal conduct.
This article looks at the first-order efficiency of profiling methods: How much screening must we do, on average, to catch a malfeasor? There are also second-order effects, not addressed here. Groups may change their behavior in response to being profiled (or not). Indeed, it is a matter of debate as to whether second-order effects are net positive or negative (4, 6, 7), because nonprofiled groups may (under lower scrutiny) increase their antisocial behaviors, even as such behaviors by profiled groups may decrease. Given such second-order ambiguities, elucidation of the first-order problem seems useful, especially because (as we will see) it has some nonobvious features.
Authoritarian vs. Democratic Strategies
For simplicity, assume that there is only a single malfeasor j = j*. (Below, we will indicate why this assumption is not actually necessary to what follows.) An omnipotent authoritarian government can enumerate all of its citizens j, and then screen each in turn, that is, by sampling without replacement. If the government knows nothing else about its citizens, then it must simply screen all in an arbitrary order until it finds the malfeasor j*. On average, this will occur after N/2 samples.
But what if the government can assign a meaningful prior probability pj to each individual j = 1,…, N? Then the optimal strategy is to sort the pj's from largest to smallest value, and then screen individuals in the population, visiting each just once in decreasing order of their probability. This “authoritarian” strategy can easily be seen to find the malfeasor with the smallest possible average number of tests, because any other screening order can be improved by the pairwise exchange of any 2 out-of-order individuals. The smallest possible average number of tests is thus
\[
\mu_A = \sum_{i=1}^{N} i\, p_{(i)}
\tag{1}
\]
where p_{(i)} is the order statistic; that is, p_{(i)} is the i-th largest value among the p_j's.
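For illustration (this sketch and its power-law priors are ours, not part of the original analysis), Eq. 1 can be evaluated directly in Python:

```python
import numpy as np

def authoritarian_mean_tests(p):
    """Mean number of screenings for the 'authoritarian' strategy (Eq. 1):
    screen individuals once each, in decreasing order of prior probability."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                      # normalize the priors
    p_sorted = np.sort(p)[::-1]          # order statistics p_(1) >= p_(2) >= ...
    ranks = np.arange(1, len(p) + 1)     # position at which each individual is screened
    return np.sum(ranks * p_sorted)

# Hypothetical example: 10,000 individuals with priors p_j proportional to 1/j
priors = 1.0 / np.arange(1, 10_001)
print(authoritarian_mean_tests(priors))
```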
For moral or practical reasons, democratic governments employ strategies not requiring the enumeration of all individuals and the availability of their individual dossiers at every checkpoint. Thus, the only “democratic” strategies available involve sampling with replacement: Individuals may be sampled with some individualized profile sampling probability qj determined by a public policy. But, the sampling process is memory-less in that an individual is liable to be sampled more than once, according to his profile—for example, whenever he goes through an airport security checkpoint.
Square-Root Biased Sampling.
In the democratic case, the probability of not finding the malfeasor on the first m ≥ 0 looks and then finding him on the (m + 1)st look is (1 − q_{j*})^m q_{j*}. So the mean number of looks required is
\[
\sum_{m=0}^{\infty} (m+1)\,(1 - q_{j^*})^{m}\, q_{j^*} = \frac{1}{q_{j^*}}
\tag{2}
\]
(an answer that we could have written down by inspection). We can take the expectation of this over the remaining random variable, namely, which value j is j*. This expectation, which we want to minimize subject to Σ_j q_j = 1, is thus
\[
\mu_D = \sum_{j=1}^{N} \frac{p_j}{q_j}
\tag{3}
\]
A straightforward minimization with a Lagrange multiplier gives the optimal choice for the q_j's
\[
q_j = \frac{\sqrt{p_j}}{\sum_{i=1}^{N} \sqrt{p_i}}
\tag{4}
\]
and the mean number of tests per found malfeasor,
\[
\mu_D = \Bigl(\sum_{j=1}^{N} \sqrt{p_j}\Bigr)^{2}
\tag{5}
\]
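The minimization is brief enough to sketch here. Introducing a Lagrange multiplier λ for the constraint Σ_j q_j = 1,

\[
\frac{\partial}{\partial q_j}\Bigl(\sum_i \frac{p_i}{q_i} + \lambda \sum_i q_i\Bigr) = -\frac{p_j}{q_j^{2}} + \lambda = 0
\quad\Longrightarrow\quad
q_j = \sqrt{p_j/\lambda} ,
\]

and the constraint fixes √λ = Σ_i √p_i, which is Eq. 4; substituting Eq. 4 back into Eq. 3 gives Eq. 5.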
In words, Eq. 4 says that individuals should be selected for screening in proportion to the square root of their prior probability. This does use the priors, but only weakly; it results in secondary screening being distributed over a much larger segment of the population than would be the case with strong profiling. Although Eq. 4 should be a well-known result, we are not aware of any published reference earlier than Abagyan and collaborators in a completely different context (8, 9).
Comparison with Naive Sampling Strategies.
It is informative to compare Eq. 5, the optimal result, with the corresponding results for 2 naive (but still democratic) sampling strategies. First, uniform sampling with replacement (ignoring the pi's):
\[
\mu_{\mathrm{uniform}} = \sum_{j=1}^{N} \frac{p_j}{1/N} = N
\tag{6}
\]
Second, sampling in proportion to p_j [what would be called importance sampling in the context of Monte Carlo integration (10)]. This seems like a natural way to sample likely malfeasors more heavily, and is an example of what we have termed “strong profiling.” However, as long as no p_j's are exactly zero, it gives

\[
\mu_{\mathrm{strong}} = \sum_{j=1}^{N} \frac{p_j}{p_j} = N,
\tag{7}
\]

exactly the same as uniform sampling, Eq. 6. The reason that this strong profiling strategy is inefficient is that, on average, it keeps retesting the same innocent individuals who happen to have large p_j values. The optimal strategy is optimal precisely because it avoids this oversampling.
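As an illustrative check (ours; the population size and power-law priors below are hypothetical), a short Monte Carlo sketch in Python compares the three democratic strategies; the estimated means should approach Eqs. 5, 6, and 7:

```python
import numpy as np

rng = np.random.default_rng(0)

def exact_mean_looks(p, q):
    """Expected number of looks, Eq. 3: sum_j p_j / q_j (with p, q normalized)."""
    p, q = p / p.sum(), q / q.sum()
    return np.sum(p / q)

def simulated_mean_looks(p, q, trials=20_000):
    """Monte Carlo estimate of the same quantity by sampling with replacement."""
    p, q = p / p.sum(), q / q.sum()
    jstar = rng.choice(len(p), size=trials, p=p)   # who the malfeasor is, per trial
    looks = rng.geometric(q[jstar])                # looks needed to sample him once
    return looks.mean()

N = 2000
p = 1.0 / np.arange(1, N + 1) ** 1.5               # hypothetical power-law priors

strategies = {
    "uniform (Eq. 6)":           np.ones(N),
    "strong, q prop. p (Eq. 7)": p.copy(),
    "sqrt-biased (Eq. 5)":       np.sqrt(p),
}
for name, q in strategies.items():
    print(name, exact_mean_looks(p, q), simulated_mean_looks(p, q))
```

Both naive strategies require N looks on average, whereas square-root biased sampling requires only (Σ_j √p_j)² (with the p_j normalized).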
Efficiency of Democratic Strategy for Model Distributions
A figure of merit for the optimal democratic sampling, Eq. 4, is its efficiency with respect to the best authoritarian strategy, Eq. 1. Define the efficiency as ε = μA/μD, a value between 0 (ineffectual) and 1 (as good as best authoritarian). To get a sense of how square root biased sampling performs, we can compute ε for various assumptions about the distribution of pj's. For example, if the prior probability is concentrated uniformly in some number of individuals N0 (out of N), and negligible (but not strictly zero) in the remaining N − N0, then ε ≈ 1/2, independent of N0. That is, the optimal democratic sampling is just a factor of 2 less efficient than the authoritarian strategy; this is entirely due to its repeated sampling of some individuals. In this case, both uniform sampling and importance sampling (above) would have much smaller efficiencies, ε ≈ (1/2)N0/N.
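Explicitly, taking p_j = 1/N0 for the N0 high-prior individuals (and the remaining priors small enough that even their square roots contribute negligibly),

\[
\mu_A \approx \sum_{i=1}^{N_0} \frac{i}{N_0} = \frac{N_0 + 1}{2},
\qquad
\mu_D \approx \Bigl(N_0 \cdot \frac{1}{\sqrt{N_0}}\Bigr)^{2} = N_0 ,
\]

so ε = μ_A/μ_D ≈ 1/2 for large N0, whereas uniform or proportional sampling needs about N looks, giving ε ≈ (1/2) N0/N.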
Another interesting case is the “scale-free” power-law distribution p_j ∝ 1/j^α. This yields (see Appendix)

\[
\varepsilon \approx
\begin{cases}
(2-\alpha)/4, & 0 \le \alpha < 2, \\
\zeta(\alpha-1)\,/\,\zeta(\alpha/2)^{2}, & \alpha > 2,
\end{cases}
\tag{8}
\]

where ζ is the Riemann zeta function. For α ≈ 2, both cases are approximately |α − 2|/4. For large α, ε ≈ 1. In all of these cases, for any fixed value of α not near 2, the democratic strategy is within a constant efficiency factor of the authoritarian strategy. The singular case α → 2 gives an efficiency that goes to zero as N grows, ε → 1/ln N, but only logarithmically slowly.
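As a numerical illustration (ours; the value N = 10,000 matches Fig. 1), the exact efficiency from Eqs. 1 and 5 can be compared with the asymptotic form of Eq. 8:

```python
import numpy as np
from scipy.special import zeta

def efficiency(p):
    """epsilon = mu_A / mu_D for a given (unnormalized) vector of priors p."""
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    mu_A = np.sum(np.arange(1, len(p) + 1) * p) / p.sum()   # Eq. 1
    mu_D = np.sum(np.sqrt(p)) ** 2 / p.sum()                # Eq. 5
    return mu_A / mu_D

N = 10_000
for alpha in (0.5, 1.0, 1.5, 3.0, 4.0):
    p = np.arange(1, N + 1, dtype=float) ** (-alpha)
    asymptotic = (2 - alpha) / 4 if alpha < 2 else zeta(alpha - 1) / zeta(alpha / 2) ** 2
    print(alpha, efficiency(p), asymptotic)                 # finite-N value vs. Eq. 8
```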
The red curves in Fig. 1 show these behaviors, both for the asymptotic case of N → ∞ (Eq. 8), and for a finite case with N = 10,000 (that is, 1/ln(N) ≈ 0.11, solid curve). Shown in green are the corresponding efficiencies for the democratic, but suboptimal, strategies of uniform sampling or importance sampling (i.e., strong profiling), the identical results of Eqs. 6 and 7. Although the efficiency is finite for finite N = 10,000 (solid green curve), as N → ∞ it goes to zero except at α = 0 (dotted green curve).
Fig. 1.
Efficiency of various “democratic” sampling strategies with respect to the “perfect authoritarian” strategy, for power-law distributions of prior probability p_j ∝ j^{−α} with exponent α. Solid curves are calculated for population size N = 10,000; dashed curves show asymptotic results for N → ∞. The red and blue curves are for optimal square-root biased sampling and (except for the special case α ≈ 2) maintain finite efficiency even as N becomes large. The green curves represent both uniform sampling and importance sampling (a type of strong profiling), which give identical results. Their efficiency is always suboptimal, and goes to zero for large N or increasing exponent α. Blue and red curves differ by whether the malfeasor can always be recognized (s = 1.0), or can be recognized only with probabilities s = 0.25 or 0.05. As the recognition probability becomes small, the advantage of an authoritarian strategy over a democratic strategy decreases.
A third test case of interest is p_j ∝ exp(−γ j^{1/β}). This occurs in cases where the j's are ordered by radius from the origin in a (say) high-dimensional space, and the probability decreases away from the origin either exponentially or as a multivariate normal distribution. It also applies to a mixture of such distributions, and thus to Gaussian mixture models, in general. In all such cases β is related to (and increases with) the dimension of the space. One readily calculates (see Appendix),

\[
\varepsilon = \frac{\Gamma(2\beta)}{2^{2\beta}\,\beta\,\Gamma(\beta)^{2}},
\tag{9}
\]

with the limiting cases ε ≈ 1/2 as β → 0 and ε ≈ (1/2)(πβ)^{−1/2} as β becomes large. The required number of screenings thus grows only in proportion to √β, a surprisingly modest penalty for what might have been thought to be a dimensional explosion of volume.
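These limits follow from the Legendre duplication formula, Γ(2β) = Γ(β) Γ(β + 1/2) 2^{2β−1}/√π, which recasts Eq. 9 as

\[
\varepsilon = \frac{\Gamma(\beta + \tfrac{1}{2})}{2\sqrt{\pi}\,\Gamma(\beta + 1)}
\;\longrightarrow\;
\begin{cases}
\tfrac{1}{2}, & \beta \to 0, \\
\tfrac{1}{2}\,(\pi\beta)^{-1/2}, & \beta \to \infty,
\end{cases}
\]

the large-β limit coming from Γ(β + 1/2)/Γ(β + 1) ≈ β^{−1/2}.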
Case of Probabilistic Recognition of Malfeasor
What if we cannot always recognize the malfeasor j*, even when we sample him? This could be because some additional random condition (not within our control) is required for recognition. Suppose that, on each look at individual i, the probability that we recognize him as the malfeasor (if indeed he is one) is s_i, i = 1,…, N, and that (for simplicity) the looks are independent.
Best Authoritarian Strategy.
The authoritarian strategy that led to Eq. 1 is now no longer valid, because it looked at each enumerated person only once. Instead, if we have already looked at person i a total of m_i times, then the probability that he is the malfeasor and has escaped detection on all of those looks is (1 − s_i)^{m_i} p_i. So the total remaining probability in which the malfeasor is to be found is

\[
P = \sum_{i=1}^{N} (1 - s_i)^{m_i} p_i .
\tag{10}
\]
Now suppose we look next at person j. Then the expected decrease in Eq. 10 produced by that look is

\[
u_{m_j,\,j} \equiv s_j (1 - s_j)^{m_j} p_j .
\tag{11}
\]
Thus, the greedy strategy, which can easily be seen to also be the optimal strategy, is to visit the j's according to the order statistic of u_{m,j} on the 2-dimensional lattice with j = 1,…, N and integer m ≥ 0. Denoting that order statistic by u_{(i)}, we have

\[
\mu_A' = \sum_{i=1}^{\infty} i\, u_{(i)} ,
\tag{12}
\]
because one easily checks that

\[
\sum_{j=1}^{N} \sum_{m=0}^{\infty} u_{m,j} = \sum_{j=1}^{N} p_j \sum_{m=0}^{\infty} s_j (1 - s_j)^{m} = \sum_{j=1}^{N} p_j = 1,
\tag{13}
\]

so that the u_{(i)}'s play exactly the role that the p_{(i)}'s played in Eq. 1.
Best Democratic Strategy.
We derive the best democratic strategy as before. The mean number of looks to success is now (s_{j*} q_{j*})^{−1}, so we want to minimize (again subject to Σ_j q_j = 1)

\[
\mu_D' = \sum_{j=1}^{N} \frac{p_j}{s_j q_j} .
\tag{14}
\]
Now the same calculation as before gives,
\[
q_j = \frac{\sqrt{p_j / s_j}}{\sum_{i=1}^{N} \sqrt{p_i / s_i}}
\tag{15}
\]
and
\[
\mu_D' = \Bigl(\sum_{j=1}^{N} \sqrt{p_j / s_j}\Bigr)^{2}
\tag{16}
\]
The optimal sampling, still square-root biased as before, is seen to expend relatively more samples on the less-likely-to-recognize cases (smaller values si). It is the opposite of the proverbial “looking under the lamppost.” In rough terms, if you do not spend quite a lot of time “not under the lamppost,” then you provide excessive sanctuary for the malfeasor who might be there.
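A two-person example (with hypothetical numbers) makes the point. If p_1 = p_2 but s_1 = 1 and s_2 = 1/4, Eq. 15 gives

\[
\frac{q_2}{q_1} = \sqrt{\frac{p_2 / s_2}{p_1 / s_1}} = \sqrt{4} = 2 ,
\]

so the individual who is harder to recognize receives twice the screening effort, even though the two are equally likely to be the malfeasor.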
We have calculated the efficiency of the best democratic strategy, now ε = μ_A′/μ_D′, for the same kinds of test distributions for the p_j's as above, with various assumptions about the s_j's (for example, constant values 0 < s ≤ 1). Eq. 16 is evaluated straightforwardly, whereas Eq. 12 benefits from the use of a heap data structure to iterate efficiently over the (m, j) lattice, as sketched below. Specifically, because u_{m,j} decreases monotonically with m, we can store the next-to-use value for each j on the heap, and then efficiently retrieve the largest one (and store the next). In all cases tried, the efficiency ε increases monotonically as s (or any of the s_j's) decreases from the original case of perfect recognition, s_j = 1. Intuitively, the authoritarian advantage of being able to sample without replacement becomes less important when the malfeasor is more difficult to recognize. The blue curves in Fig. 1 show the efficiency ε in the case of a power-law distribution for the p_j's with N = 10,000, for 2 assumed values of s_j = s (0.25 and 0.05), computed as described.
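A minimal Python sketch of this heap-based evaluation of Eq. 12 (our own outline, with hypothetical priors and recognition probabilities; it assumes all s_j > 0):

```python
import heapq
import numpy as np

def authoritarian_mean_tests_probabilistic(p, s, tol=1e-12):
    """Evaluate mu_A' = sum_i i * u_(i) (Eq. 12), with u_{m,j} = s_j (1-s_j)^m p_j,
    by visiting the (m, j) lattice in decreasing order of u using a heap."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    s = np.asarray(s, dtype=float)

    # Heap of (-u, m, j): the next-to-use (largest remaining) u for each individual j.
    heap = [(-(s[j] * p[j]), 0, j) for j in range(len(p))]
    heapq.heapify(heap)

    mu, i, remaining = 0.0, 0, 1.0
    while remaining > tol:            # stop once nearly all probability is accounted for
        neg_u, m, j = heapq.heappop(heap)
        u = -neg_u
        i += 1
        mu += i * u                   # contribution i * u_(i) to Eq. 12
        remaining -= u
        # Next look at the same individual: u_{m+1, j} = (1 - s_j) * u_{m, j}
        heapq.heappush(heap, (-(u * (1.0 - s[j])), m + 1, j))
    return mu

# Hypothetical example: power-law priors, constant recognition probability s = 0.25
N = 1000
p = 1.0 / np.arange(1, N + 1) ** 2
s = np.full(N, 0.25)
print(authoritarian_mean_tests_probabilistic(p, s))
```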
Discussion
None of the results in the article actually depend on our original assumption of a single malfeasor. To see this, note that all results apply separately to each of multiple malfeasors as if he were the only one. Because the identical sampling prescription is obtained in each case, it is optimal for minimizing the mean number of tests for each malfeasor. That is, there is no better strategy for any malfeasor.
The idea of sampling by square-root probabilities is quite general and can have many other applications. It applies whenever a “bell-ringer” event must be found by sampling with replacement, but can be recognized when seen. For example, one can thus sample paths through a trellis or hidden Markov model when their number is too large to enumerate explicitly, but one path can be recognized (e.g., by secondary testing) as the desired bell ringer. It seems peculiar that the method is not better known.
Acknowledgments.
I thank Martha Minow, Nozer Singpurwalla, Sallie Keller-McNulty, Richard Garwin, and an anonymous referee for helpful comments and discussion.
Appendix
We here calculate the approximations for ε in Eqs. 8 and 9. Because ε is invariant under scaling all of the p_i's by a constant, we may use nonnormalized p_i's. For 0 ≤ α < 2, we approximate the sums by integrals, which is accurate for large N,

\[
\sum_{i=1}^{N} i\, p_{(i)} \approx \int_{0}^{N} x^{\,1-\alpha}\, dx = \frac{N^{2-\alpha}}{2-\alpha},
\]

\[
\Bigl(\sum_{i=1}^{N} p_{(i)}^{1/2}\Bigr)^{2} \approx \Bigl(\int_{0}^{N} x^{-\alpha/2}\, dx\Bigr)^{2} = \frac{N^{2-\alpha}}{(1-\alpha/2)^{2}},
\]
yielding one case of Eq. 8. For α > 2, the sums are now dominated by small values of i, so we can approximate them by extending the sums to infinity. If ζ(·) is the Riemann zeta function, we have p_{(i)} ≈ i^{−α}/ζ(α) and

\[
\sum_{i=1}^{\infty} i\, p_{(i)} \approx \frac{\zeta(\alpha-1)}{\zeta(\alpha)},
\]

\[
\Bigl(\sum_{i=1}^{\infty} p_{(i)}^{1/2}\Bigr)^{2} \approx \frac{\zeta(\alpha/2)^{2}}{\zeta(\alpha)},
\]
which implies the second case of Eq. 8.
To obtain Eq. 9, we approximate the sums by integrals,

\[
\sum_{i} i\, p_{(i)} \approx \int_{0}^{\infty} x\, e^{-\gamma x^{1/\beta}}\, dx = \beta\, \gamma^{-2\beta}\, \Gamma(2\beta),
\]

\[
\Bigl(\sum_{i} p_{(i)}^{1/2}\Bigr)^{2} \approx \Bigl(\int_{0}^{\infty} e^{-\gamma x^{1/\beta}/2}\, dx\Bigr)^{2} = \beta^{2}\, (2/\gamma)^{2\beta}\, \Gamma(\beta)^{2},
\]

whose ratio gives Eq. 9.
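As a numerical cross-check (ours; the β values are arbitrary), the two integrals above and the closed form of Eq. 9 can be compared directly:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def eps_integral(beta, gam=1.0):
    """epsilon = mu_A / mu_D with the sums replaced by the integrals above."""
    num, _ = quad(lambda x: x * np.exp(-gam * x ** (1.0 / beta)), 0, np.inf, limit=200)
    den, _ = quad(lambda x: np.exp(-gam * x ** (1.0 / beta) / 2.0), 0, np.inf, limit=200)
    return num / den ** 2                 # the gamma-dependence cancels in the ratio

def eps_closed_form(beta):
    """Eq. 9: Gamma(2*beta) / (2**(2*beta) * beta * Gamma(beta)**2)."""
    return gamma(2 * beta) / (2 ** (2 * beta) * beta * gamma(beta) ** 2)

for beta in (0.5, 1.0, 2.0, 4.0):
    print(beta, eps_integral(beta), eps_closed_form(beta))
```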
Footnotes
The author declares no conflict of interest.
†Also see Siggins P, Racial profiling in an age of terrorism. Presentation at Markkula Center for Applied Ethics, March 20, 2002, Santa Clara, CA. Available at http://www.scu.edu/ethics/publications/ethicalperspectives/profiling.html.
References
- 1. Ellmann SJ. Racial profiling and terrorism. New York Law School Law Rev. 2003;46:675–730.
- 2. Lund N. The conservative case against racial profiling in the war on terrorism. Albany Law Rev. 2003;66:329–342.
- 3. London H. Profiling as needed. Albany Law Rev. 2003;66:343–347.
- 4. Harcourt BE. Against Prediction: Profiling, Policing, and Punishing in an Actuarial Age. Chicago: Univ of Chicago Press; 2007.
- 5. American Civil Liberties Union. Campaign Against Racial Profiling. 2008. Available at http://www.aclu.org. Accessed January 12, 2009.
- 6. Knowles J, Persico N, Todd P. Racial bias in motor vehicle searches: Theory and evidence. J Political Econ. 2001;109:203–229.
- 7. Dominitz J, Knowles J. Crime minimization and racial bias: What can we learn from police search data? Econ J. 2006;116:F368–F384.
- 8. Abagyan RA, Totrov M. Ab initio folding of peptides by the optimal-bias Monte Carlo minimization procedure. J Comput Phys. 1999;151:402–421.
- 9. Zhou Y, Abagyan R. Efficient stochastic global optimization for protein structure prediction. In: Thorpe MF, Duxbury PM, editors. Rigidity Theory and Applications. New York: Springer; 2002.
- 10. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes: The Art of Scientific Computing. 3rd Ed. New York: Cambridge Univ Press; 2007. Section 7.9.1.