Abstract
Audit correspondence studies are field experiments that test for discriminatory behavior in active markets. Researchers measure discrimination by comparing how responsive individuals (“audited units”) are to correspondence from different types of people. This paper elaborates on the tradeoffs researchers face between sending audited units only one piece of correspondence and sending them multiple pieces, especially when the correspondence includes less common identity signals. We argue that when researchers use audit correspondence studies to measure discrimination against individuals who infrequently interact with audited units, they raise the risk that audited units become aware they are being studied or otherwise act differently. We also argue that sending multiple pieces of correspondence can increase detection risk. We present the results of an audit correspondence study that demonstrates how detection can occur for these reasons, leading to significantly attenuated (biased toward zero) estimates of discrimination.
Audit correspondence studies are a popular type of field experiment used to detect and measure discrimination. They are considered the gold standard for detecting discriminatory behavior, which often goes undetected in more traditional methods, such as surveys or in-person interviews, because such behavior is frequently illegal or concealed due to social desirability bias.
In audit correspondence studies, the researcher contacts individuals (e.g., politicians, landlords) or businesses, most frequently by email, to inquire about a service, seek employment, or ask a question. These individuals or businesses are the “audited unit.” In these correspondences, the researcher randomly assigns characteristics, such as names or other self-disclosed details, that signal a specific identity (e.g., race, ethnicity, gender) or attribute (e.g., education level, occupation) of the inquirer. If audited units respond less to correspondence containing one type of signal (e.g., African American) than to correspondence containing a different signal (e.g., white), this provides evidence that the audited units are discriminating against individuals possessing that signal. In a famous example, economists Bertrand and Mullainathan (2004) sent resumes containing stereotypically white names (e.g., Emily, Greg) and stereotypically black names (e.g., Lakisha, Jamal) to job openings. They found that resumes containing white names received 50% more callbacks for interviews.
In recent years, there has been a proliferation in the use of audit studies to study discrimination in new and emerging markets (e.g., the “Sharing Economy”), discrimination against smaller minority groups (Weichselbaumer 2015), and based on multiple treatments such as intersectional identities (e.g., Edelman et al. 2017).
In these new contexts, the proper design of audit correspondence studies is even more critical. Sending correspondence that (1) signals that the inquirer belongs to a group whose members are infrequent customers, constituents, or clients of the audited unit (e.g., same-gender married couples), or (2) contains specific attributes that are unlikely to be disclosed (e.g., unsolicited information) increases the risk of detection by audited units.1 Detection is also more likely when researchers send multiple pieces of correspondence, when audited units do not receive correspondence often, when the pieces are similar to each other, or when they are sent with little time delay between them. This increased risk of detection must be weighed against the benefits of sending more pieces of correspondence, such as increased statistical power, which matter most in contexts where data collection costs are high. However, it can be difficult for researchers to assess the extent of detection risk.
Regardless of its source or cause, detection leads to biased estimates of discrimination: audited units may change their behavior if they think they are being monitored (i.e., an observer effect) or find the correspondence they receive unusual.
To show how exposing audited units to multiple infrequent identity signals and multiple pieces of correspondence raises detection risk, we present the results of an email-based audit correspondence study of sexual orientation discrimination by mortgage loan originators (MLOs) in the United States. We find evidence that exposure to more than one infrequent identity signal and to multiple (three or more) pieces of correspondence significantly increases the likelihood of detection. This evidence stems from our discrimination estimates differing significantly between our “full sample,” where MLOs received four emails total in a month (two from same-gender couples), and “restricted samples,” where MLOs were not exposed to multiple infrequent identities (i.e., they received an email from only one same-gender couple) or to more than two pieces of correspondence.
While we do not know for certain the extent to which each MLO detected the experiment, the drastic changes in our discrimination estimates across samples with and without multiple exposures to infrequent identities strongly suggest that detection occurred. Our discrimination estimates in the restricted samples are two to three times larger than in the full sample. Thus, the attenuation bias in our case study, likely due to detection, was severe.
Our finding that detection appears to have occurred and led to significant bias is critical for researchers to consider when designing correspondence studies. Deciding how many pieces of correspondence to send requires researchers to weigh the benefits of increased statistical power (and the ability to make within-unit comparisons) against increased detection risk (and thus bias). While detection risk is often unclear, we argue that when researchers send two or more pieces of correspondence from groups that audited units infrequently interact with or that are otherwise atypical, they significantly increase the risk of detection and thus bias. We further posit that detection risk also exists when researchers send more than two pieces of correspondence, especially within a short period and when the correspondence is otherwise similar.
How Many Pieces of Correspondence Should Researchers Send?
Researchers face tradeoffs in how many pieces of correspondence they send to each audited unit. There are several benefits to sending more correspondence, including increased statistical power, reduced data collection costs, and the ability to make within-audited-unit comparisons. However, sending more than one piece of correspondence increases the risks of detection and spillovers, both of which can bias estimates of discrimination and invalidate the study. It also imposes additional time and other costs on audited units. Below, we summarize these costs and benefits and the existing research on them.2 This paper’s contribution is to illuminate the risks and costs of detection, a concern about which little is known.
Statistical Power
The primary benefit of sending more pieces of correspondence is that it can significantly increase statistical power. Sufficient statistical power is required to detect meaningful levels of discrimination. Sending, for example, two correspondences per audited unit rather than one doubles the sample size (although it does not quite double the power; see Lahey and Beasley 2018).
Including multiple correspondences per audited unit can be critical in study contexts where there is a finite sample of audited units (Vuolo, Uggen, and Lageson 2018) or in experiments that include multiple groups or treatment arms. The benefits of sending additional pieces of correspondence are higher when correspondence sent to the same audited unit tends to be treated similarly (i.e., there is sufficient “concordance”; see Vuolo, Uggen, and Lageson 2018). A simple power calculation that accounts for this within-unit correlation appears below.
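To make the tradeoff concrete, the following sketch (with purely illustrative response rates and an assumed within-unit correlation, not figures from any study) approximates power for a two-proportion test of response rates, using the standard design-effect adjustment 1 + (m − 1)ρ to discount correlated responses within an audited unit:

```python
# Illustrative power sketch (hypothetical numbers): one vs. two pieces of
# correspondence per audited unit, for a two-proportion response-rate test.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_control, p_treated = 0.70, 0.60  # assumed response rates by signal
n_units = 400                      # audited units per arm (assumed)
rho = 0.3                          # assumed within-unit response correlation

effect = proportion_effectsize(p_control, p_treated)
solver = NormalIndPower()

for m in (1, 2):  # pieces of correspondence per audited unit
    # design effect 1 + (m - 1) * rho shrinks the effective sample size
    n_effective = n_units * m / (1 + (m - 1) * rho)
    power = solver.power(effect_size=effect, nobs1=n_effective, alpha=0.05)
    print(f"{m} piece(s)/unit: effective n = {n_effective:.0f}, power = {power:.2f}")
```

Under these assumptions, the second piece of correspondence raises power from roughly 0.85 to 0.96 rather than doubling the information, consistent with Lahey and Beasley’s (2018) point.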
Reduced Data Collection Costs
Assuming a fixed cost of collecting information on each audited unit (e.g., finding an email), sending multiple pieces of correspondence to each audited unit often decreases data collection costs (i.e., cost per observation) by reducing the total number of audited units required in the study to measure a given level of discrimination.
Within-Audited-Unit Comparisons
Sending multiple pieces of correspondence to each audited unit allows for within-audited-unit comparisons. If audited units receive more than one piece of correspondence, the researcher can include audited-unit fixed effects to control for important factors such as organizational culture, community demographics, local laws, and other audited-unit characteristics. Because the correspondence sent to audited units is randomized, most studies do not require these fixed effects; including them, however, allows the researcher to absorb fixed differences across audited units, which can improve the precision of discrimination estimates.
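As a concrete illustration, the short simulation below (entirely synthetic data and our own construction, not code from any study) shows audited-unit fixed effects absorbing unit-level response propensities while recovering the treatment gap, with standard errors clustered by audited unit:

```python
# Minimal fixed-effects sketch on simulated data (illustrative only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
units = np.repeat(np.arange(300), 2)          # two emails per audited unit
treat = rng.integers(0, 2, units.size)        # 1 = minority-signal email
baseline = rng.uniform(0.5, 0.9, 300)[units]  # unit-specific response propensity
y = (rng.random(units.size) < baseline - 0.10 * treat).astype(int)
df = pd.DataFrame({"unit": units, "treat": treat, "y": y})

# C(unit) adds an indicator per audited unit (the fixed effect)
fe = smf.ols("y ~ treat + C(unit)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]})
print(round(fe.params["treat"], 3))           # close to the true -0.10 gap
```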
Spillovers Within an Audited Unit: Detection Risk and Induced Competition
Sending additional correspondence to each audited unit, however, can lead to spillovers, where pieces of correspondence sent to the same audited unit affect each other. Spillovers violate an assumption required for an experiment to produce an unbiased estimated treatment effect: the Stable Unit Treatment Value Assumption (SUTVA). Simply put, SUTVA requires that each piece of correspondence be unaffected by the other pieces. Spillovers can occur for a few reasons, some of which are relevant in the context of this study (an audit of mortgage loan originators), while others are more relevant in other contexts, such as labor market audits.
Detection Risk:
A common concern in audit studies is that the experiment is detected, i.e., audited units think something odd is happening, such as some enforcement action, a bot sending emails, etc. Detection is a source of spillovers and can similarly bias discrimination estimates.
Detection risks are high if the correspondence is unusual, which is why researchers typically invest significant effort in making their correspondence appear typical.3 However, even if the researcher sends normal-looking correspondence, it may seem odd to receive multiple very similar pieces (e.g., containing similar language) too close together or from members of groups with whom the audited unit rarely interacts. Consider the audited unit in this study: mortgage loan originators (MLOs), who help customers obtain mortgage loans. It may be relatively infrequent, but not unusual, for an MLO to interact with a same-gender married couple. However, it is improbable for an MLO to receive multiple emails in a short period from multiple same-gender couples, and it may be odd to receive emails from frequent customers (e.g., different-gender couples) and infrequent customers (e.g., same-gender couples) that contain the same attributes.4
Audited units could alter their behavior if they detect the experiment in some way. A priori, it is unclear how audited units would react if they knew the inquiries were fake, believed someone was monitoring their behavior, or simply thought something was unusual. They could act in a way they believe is socially desirable, such as responding to all correspondence. Alternatively, they could view all pieces of correspondence as suspect and ignore them. Either reaction adds statistical noise that attenuates estimates of discrimination (biases them toward zero). In a resume correspondence study of discrimination against lesbians, Weichselbaumer (2015) found that discrimination estimates were attenuated due to detection in a paired design (sending an application from both a lesbian and a straight woman) compared to when only one (randomized) job application was submitted.
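This attenuation logic can be made concrete with a small simulation (our own illustration, with made-up parameters): if some share of audited units detect the study and then respond at a common rate regardless of the signal, the measured response-rate gap shrinks toward zero as that share grows.

```python
# Illustrative simulation of detection-driven attenuation (made-up numbers).
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
treated = rng.integers(0, 2, n)        # 1 = infrequent-identity signal
true_gap = 0.15                        # true discrimination: 15 pp
p_respond = 0.70 - true_gap * treated  # response probabilities absent detection

for detect_rate in (0.0, 0.3, 0.6):
    detected = rng.random(n) < detect_rate
    # detected units respond with the same probability regardless of signal
    p = np.where(detected, 0.55, p_respond)
    y = rng.random(n) < p
    gap = y[treated == 0].mean() - y[treated == 1].mean()
    print(f"detection rate {detect_rate:.0%}: estimated gap = {gap:.3f}")
```

With no detection, the simulation recovers the full 15-point gap; at a 60% detection rate, the estimated gap falls by more than half.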
Induced Competition:
Spillovers can also occur when the pieces of correspondence compete, which Larsen (2020) referred to as “induced competition.” This could occur in audits of rental housing and, most notably, labor market audits. Phillips (2019) finds additional spillovers in many audit correspondence studies of hiring discrimination, but these types of spillovers are not relevant to our context of mortgage loan applications.5 Larsen (2020) shows that spillovers are more common when the pool of other applications (or similar correspondence received) is smaller, which appears relevant to this study.
In the following sections, we detail an audit correspondence study that faced this detection issue. The detection issues were severe: the estimated level of discrimination in the full sample is approximately 50% lower than in the samples less affected by detection. Our results demonstrate that detection issues can severely undermine the internal validity of a study. Thus, it may not be advisable to send too many pieces of correspondence, as the costs can outweigh the benefits.
Experimental Design
To test the likelihood of detection and spillover effects from sending multiple emails signaling infrequent identities, we used data from our audit correspondence study on sexual orientation discrimination in access to mortgage loans. Specifically, we conducted an email correspondence study testing if mortgage loan originators discriminate against same-gender couples requesting assistance obtaining a mortgage loan. We initially designed our experiment as a pilot study. However, it serves the additional, unexpected purpose of being a case study in how the number and type of correspondences sent can cause detection and spillover issues.
We conducted our small-scale study with a convenience sample of 118 MLOs across all regions of the United States. To generate our sample, we collected publicly posted MLO email addresses through various websites (individual bank websites, Yellow Pages, Better Business Bureau, and LinkedIn).
In our experiment, each MLO received four emails, one week apart, in random order: one from a same-gender male couple, one from a same-gender female couple, one from a different-gender couple with a male email sender, and one from a different-gender couple with a female email sender. We conducted our experiment between November and December 2018 (see the notes to Figure 1 for the exact dates).
Our experiment’s general design follows similar email-correspondence studies (Ahmed and Hammarstedt 2009; Hanson et al. 2016; Schwegman 2019). In each email, following Ahmed and Hammarstedt (2009), we signaled sexual orientation by having each fictitious applicant introduce themselves as well as their wife or husband. Our Online Appendix presents the templates for our emails and the email components, and additional details about the experimental design.
Our primary measure of discrimination is whether the MLO responded to the original email. We considered a response to be any reply that appeared to be written by a human; out-of-office and other automated replies were not counted. We only count responses sent within two weeks of the original inquiry email (following Hanson et al. 2016).
In this experiment, we consider same-gender couples to be infrequent customers. Because it is not unusual for an MLO to interact with a same-gender couple, a single inquiry from one is unlikely to raise concern. However, sending multiple somewhat similar pieces of correspondence and, most notably, multiple similar pieces from same-gender couples, especially in a short period, could increase detection risks. Unless the MLO operates in specific urban markets with a large population of same-gender couples, it is unlikely that an MLO would receive inquiries from multiple same-gender couples in the same month.
Statistical Estimation Methodology
Before presenting how detection affected our estimates of discrimination, we first describe how we estimate discrimination in our experiment. Our primary outcome is whether there was a non-automated response to our inquiry. To quantify differences in response rates by couple type, we use a linear regression model of the form:
Response_i = β0 + β1 MM_i + β2 FF_i + β3 FM_i + Controls_i γ + ε_i    [1]
where i indexes each email, MM_i (FF_i) is an indicator variable for same-gender male (female) spouses, and FM_i is an indicator variable for different-gender spouses where the wife sent the email, with the excluded category being different-gender spouses where the husband sent the email. Controls_i is a vector of control variables, including randomized credit scores for the couple and fixed effects for the different randomized email components and email order. We also estimate a regression like [1] but pool all same-gender couples together rather than splitting by gender.
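For concreteness, the sketch below shows one way equation [1] could be estimated, on a simulated dataset with variable names of our own choosing (this is not the authors’ code or data); standard errors are clustered by MLO, as in our tables:

```python
# Sketch of estimating equation [1] on simulated data (names are ours).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_mlo = 118
emails = pd.DataFrame({
    "mlo_id": np.repeat(np.arange(n_mlo), 4),
    "order": np.tile([1, 2, 3, 4], n_mlo),  # email round
    "couple": np.concatenate(
        [rng.permutation(["MM", "FF", "FM", "MF"]) for _ in range(n_mlo)]),
    "credit_score": rng.choice([680, 720, 760], n_mlo * 4),
})
p = 0.70 - 0.12 * emails["couple"].isin(["MM", "FF"])  # illustrative 12 pp gap
emails["response"] = (rng.random(len(emails)) < p).astype(int)

model = smf.ols(
    "response ~ C(couple, Treatment(reference='MF')) + credit_score + C(order)",
    data=emails,
).fit(cov_type="cluster", cov_kwds={"groups": emails["mlo_id"]})
print(model.params.filter(like="couple"))  # beta_1 (MM), beta_2 (FF), beta_3 (FM)
```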
To test how detection could have impacted our estimates of discrimination (the regression coefficients β1 and β2), we run our regressions on the four samples of the data described in Table 1: the full sample, restricted sample #1, restricted sample #2, and restricted sample #3. Each row in Table 1 shows one of the eight possible email orders based on sending two same-gender (“same”) and two different-gender (“different”) emails.
Table 1:
Comparison of Full and Restricted Samples
[Table body: the eight possible orderings of same- and different-gender emails across the four rounds, with the emails dropped (“DROPPED”) from each restricted sample marked.]
Notes: See the notes to Figure 1. Different/Diff. = different-gender couple; Same = same-gender couple. Sample sizes: 469 (full sample), 258 (Restricted Sample #1), 204 (Restricted Sample #2), and 118 (Restricted Sample #3).
To test if MLOs change their behavior when receiving (and after receiving) a second same-gender email, we compare the full sample to restricted sample #1. In restricted sample #1, we drop all second same-gender emails and any emails we sent after the second same-gender email (“DROPPED”). Comparing discrimination estimates between the full sample and restricted sample #1 allows us to test if discrimination estimates differ if the sample includes cases where the audited unit (MLO) received more than one piece of correspondence from an uncommon group, i.e., same-gender married couples.
We also consider restricted sample #2, where we restrict the sample further by removing any round 3 emails not already excluded from restricted sample #1, and restricted sample #3, where we restrict to only the first-round emails. Comparing discrimination estimates among restricted samples #1, #2, and #3 allows us to determine if sending additional emails, conditional on only sending one email total from a same-gender married couple, causes any detection issues compared to just sending up to two emails.6
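To make these restrictions precise, the sketch below (column names are ours, assuming a dataframe with one row per email and columns for the MLO identifier, the round number, and a same-gender indicator) constructs the three restricted samples from the full sample:

```python
# Sketch of the sample restrictions (hypothetical column names).
import pandas as pd

def restrict(emails: pd.DataFrame) -> dict:
    df = emails.sort_values(["mlo_id", "round"]).copy()
    # locate each MLO's second same-gender email (NaN if fewer than two)
    cum_same = df.groupby("mlo_id")["same_gender"].cumsum()
    second_same = df["round"].where((df["same_gender"] == 1) & (cum_same == 2))
    cutoff = second_same.groupby(df["mlo_id"]).transform("min")
    # Restricted #1: drop the second same-gender email and everything after it
    r1 = df[cutoff.isna() | (df["round"] < cutoff)]
    # Restricted #2: additionally drop round-3 emails (#1 never keeps round 4)
    r2 = r1[r1["round"] <= 2]
    # Restricted #3: first-round emails only
    r3 = df[df["round"] == 1]
    return {"restricted_1": r1, "restricted_2": r2, "restricted_3": r3}
```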
Results
Before presenting our regression results for how discrimination estimates vary by sample, we first present response rates by round. Figure 1 presents the response rate to emails from round 1 (the first email the MLO received) to round 4 (the fourth email the MLO received, three weeks later). Figure 1 shows that many MLOs changed their behavior between rounds and did so differently depending on whether the email was from a same-gender or a different-gender couple. The MLOs’ response rate to different-gender couples was relatively constant in the first three rounds: 70.8%, 73.7%, and 72.1%. For same-gender couples, however, the response rate gradually increased from 42.9% to 50.8% to 60.4%. In round 4, response rates changed dramatically. The response rate for same-gender couples dropped to 42.9%, while the response rate for different-gender couples dropped far below this, to 26.2%, the lowest response rate for either group in any round. Thus, discrimination appears to decrease rapidly by round, to the point that same-gender couples appear preferred in round 4.
Figure 1:
Response Rates by Email Round and Sexual Orientation, Full Sample
Notes: Each round was about one week apart. Round 1 was sent at 1 pm CST on November 18, 2018; round 2 at 2 pm CST on November 26, 2018; round 3 at 11 am CST on December 2, 2018; and round 4 at 9 am CST on December 10, 2018. We present 95% confidence intervals for each mean response rate. Response rates by couple type within the same round are significantly different from each other at the 1-percent level (***), 5-percent level (**), or 10-percent level (*), using a two-sided Fisher’s Exact Test.
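As a sketch of the within-round comparison described in the notes above (with hypothetical counts standing in for the study’s actual cell sizes), a two-sided Fisher’s Exact Test could be run as follows:

```python
# Two-sided Fisher's Exact Test on hypothetical round-level counts.
from scipy.stats import fisher_exact

# Rows: different-gender, same-gender; columns: [responded, did not respond]
table = [[42, 17],
         [25, 33]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
```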
Table 2 presents our regression results from two specifications: one combining same-gender female and male couples into a single indicator variable (odd columns) and one analyzing same-gender female and male couples separately (even columns). Table 3 presents p-values for tests of equality between coefficients across samples.
Table 2:
Regression Estimates for MLO Response, by Type of Married Couple
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
|---|---|---|---|---|---|---|---|---|
| Same-Gender (both) | −0.114*** (0.042) | … | −0.156 (0.124) | … | −0.272*** (0.067) | … | −0.188* (0.103) | … |
| Same-Gender Male | … | −0.086 (0.070) | … | −0.438*** (0.157) | … | −0.312*** (0.096) | … | −0.254* (0.140) |
| Same-Gender Female | … | −0.111 (0.074) | … | −0.018 (0.151) | … | −0.291** (0.115) | … | −0.335* (0.179) |
| Different-Gender, Female Emailer | … | 0.029 (0.086) | … | −0.159 (0.114) | … | −0.061 (0.124) | … | −0.189 (0.214) |
| Sample | Full Sample | Full Sample | Restricted #1 | Restricted #1 | Restricted #2 | Restricted #2 | Restricted #3 | Restricted #3 |
| N | 469 | 469 | 258 | 258 | 204 | 204 | 118 | 118 |
Notes: The response rate in the full sample for the excluded group in column (2) (different-gender, male email sender) is 59.1%. We cluster our standard errors by MLO. All regressions include fixed effects for the round and for the email template, indicator variables for whether either spouse holds each occupation, fixed effects for occupational tenure, and fixed effects for the number of children or whether the couple is expecting a child. See the Online Appendix for more details on the experimental design and these control variables. *** Significantly different from zero at the 1-percent level; ** 5-percent level; * 10-percent level.
Table 3:
P-Values for Tests for Differences in Same-Gender Coefficients Across Samples
| Sample | Full Sample | Restricted Sample #1 | Restricted Sample #2 |
|---|---|---|---|
| Res. #1 | 0.016** (0.009***) [0.432] | … | … |
| Res. #2 | 0.003*** (0.005***) [0.049**] | 0.092* (0.340) [0.024**] | … |
| Res. #3 | 0.390 (0.167) [0.134] | 0.810 (0.243) [0.074*] | 0.244 (0.571) [0.744] |
Notes: These p-values are from tests of differences across the samples in Table 2 between the same-gender (both) coefficients, same-gender male coefficients (in parentheses), and same-gender female coefficients [in brackets]. *** Significantly different from each other at the 1-percent level; ** 5-percent level; * 10-percent level.
Starting with the full sample (Table 2, Column 1), we find that same-gender couples were 11.4 percentage points less likely to receive a response than different-gender couples (significant at the 1% level). In Column 2, we split the same-gender indicator into separate indicator variables for same-gender male couples and same-gender female couples. While both coefficients are negative, neither is statistically significantly different from zero.
When we restrict the sample so that MLOs have received only one same-gender email (Restricted Sample #1, Columns 3 and 4), we generally find larger estimates of discrimination, although the estimates are noisy. The estimated level of discrimination faced by same-gender couples (Column 3) increases from 11.4 to 15.6 percentage points. This estimate has a large standard error, however, and is therefore not statistically significant. We do find that the difference between these two coefficients is statistically significant at the 5% level (Table 3). Same-gender male couples are 43.8 percentage points less likely (significant at the 1% level) to receive a response than different-gender couples with a male emailer. This change in coefficients is significant at the 1% level (Table 3). For same-gender female couples, the response rate in Restricted Sample #1 is the same as for different-gender couples, and the coefficient does not differ from that in the full sample.
When we further restrict our sample to Restricted Sample #2 (Columns 5 and 6), the results show even more discrimination against same-gender couples, including same-gender female couples. Column 5 shows that the response rate was 27.2 percentage points lower for same-gender couples than for different-gender couples (significant at the 1% level), with discrimination estimates being similar by gender (Column 6). As shown in Table 3, the Restricted Sample #2 coefficients are statistically significantly different from those in both Restricted Sample #1 and the full sample.
Finally, for Restricted Sample #3 (Columns 7 and 8), the estimates are much noisier due to the very small sample size. They are similar to the estimates using Restricted Sample #2. Table 3 shows that the coefficients from Restricted Sample #3 do not differ from Restricted Sample #2.
Discussion and Conclusion
The results from our audit correspondence study show that some MLOs may have become aware of our study and changed their behavior, for two reasons. First, we find that MLOs discriminate less after receiving more than one email from infrequent customers (i.e., same-gender couples). This can be seen by comparing discrimination estimates between the full sample and Restricted Sample #1, which only includes cases where the MLO was exposed to one same-gender email. We hypothesize that the decrease in measured discrimination between these two samples comes from MLOs responding to same-gender couples more often (social desirability bias) and, by email round 4, responding less often to everyone (causing additional attenuation bias). Both lead us to underestimate discrimination.
Second, there is also evidence that receiving multiple similar pieces of correspondence in a short period, even if only one comes from an infrequent customer, led to attenuation bias. This can be seen by comparing estimates between Restricted Samples #1 and #2, where #2 drops additional different-gender emails sent to the MLO. Discrimination estimates generally increase when restricting the sample further, suggesting that these additional emails caused detection (and thus attenuation bias). Overall, once we remove observations likely tainted by detection (using Restricted Sample #2 or #3), our estimates of discrimination increase by around two to three times.
Our results suggest that researchers should reduce an audited unit’s exposure to multiple infrequent signals to minimize detection risks and the resulting spillovers and bias. We suggest sending either one piece of correspondence, randomized between treatment and control, or one treatment and one control piece of correspondence.
Our results also suggest that researchers should be careful when sending multiple pieces of correspondence in general, particularly beyond two pieces. Detection risk could be high in situations such as ours, where audited units do not receive correspondence very often, the pieces of correspondence are similar to one another, and they are not sent with a long enough time delay between them. While it may be tempting to send additional pieces of correspondence to each audited unit to increase the sample size and shore up limited statistical power, the risk of bias from detection or spillovers could be very high. Sending three or four pieces of correspondence could perhaps avoid these high costs of detection or spillovers only if all of the following apply: the audited units frequently receive similar correspondence, the pieces of correspondence are distinct from each other, there are sufficient time delays between pieces, and the spillover concerns detailed in Phillips (2019), where correspondence such as job applications compete with each other, do not apply.
Acknowledgments
We thank seminar participants at Tulane University and the APPAM, AEA, and AREUEA conferences. We also thank Lei Gao, Andrew Hanson, Joanna Lahey, and Mike Martell for helpful comments, guidance, and discussions. We thank Eva Dils and Ilan Gressel for their excellent research assistance. This study was approved by Syracuse University’s IRB (#18-176).
Footnotes
1. Detection is a form of “spillover,” in which audited units do not treat the different pieces of correspondence they receive independently. Spillovers can occur for a few reasons, which we detail in the next section, but detection is the only source of spillovers in our case study.
2. We point the reader to Larsen (2020) for a more detailed discussion of these benefits and costs.
3. If researchers determine that they need to send multiple pieces of correspondence, they must decide how much to vary the content of each piece and how much time to leave between them (order should be randomized). While researchers should not vary the correspondence in ways that introduce unintended signals, randomized differences in templates, phrasing, etc., do not bias discrimination estimates, although large differences could reduce statistical power.
4. Weichselbaumer (2015) shows that detection, and thus attenuation bias, can occur even when the audited unit receives one application from the frequent group and one from the infrequent group.
5. Phillips (2019) discusses two reasons, other than detection, why spillovers could occur in audit correspondence studies of hiring discrimination. First, resumes could be directly compared rather than evaluated independently. Second, businesses may decide to hire more applicants because the applicant pool appears stronger due to the additional resumes sent by the researcher. Phillips (2019) shows that some prior resume correspondence studies that sent four resumes to each job opening, such as Bertrand and Mullainathan (2004), had spillover issues, violating SUTVA and understating discrimination estimates by about 20% on average. These sources of spillovers are not relevant to our study of mortgage loans because lending decisions are made for each applicant independently, so applicants do not directly compete.
6. We follow Hanson et al. (2016) in using a two-week window for a response. While most responses occur within a week, before the next email is sent, there may be a few cases where the MLO reads or responds to an email after having received an email from a later round.
Contributor Information
Catherine Balfe, Tulane University.
Patrick Button, Tulane University.
Mary Penn, Tulane University.
David J. Schwegman, American University.
References
- Ahmed AM, and Hammarstedt M. 2009. Detecting Discrimination against Homosexuals: Evidence from a Field Experiment on the Internet. Economica 76: 588–97.
- Bertrand M, and Duflo E. 2017. Field Experiments on Discrimination. In Handbook of Economic Field Experiments, edited by Banerjee A and Duflo E, 309–93. New York: Elsevier.
- Bertrand M, and Mullainathan S. 2004. Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review 94: 991–1013.
- Button P, Dils E, Harrell B, Fumarco L, and Schwegman DJ. 2021. Gender Identity, Race, and Ethnicity Discrimination in Access to Mental Health Care: Preliminary Evidence from a Multi-Wave Audit Field Experiment. National Bureau of Economic Research Working Paper No. 28164.
- Edelman B, Luca M, and Svirsky D. 2017. Racial Discrimination in the Sharing Economy: Evidence from a Field Experiment. American Economic Journal: Applied Economics 9: 1–22.
- Hanson A, Hawley Z, Martin H, and Liu B. 2016. Discrimination in Mortgage Lending: Evidence from a Correspondence Experiment. Journal of Urban Economics 92: 48–65.
- Lahey J, and Beasley R. 2018. Technical Aspects of Correspondence Studies. In Audit Studies: Behind the Scenes with Theory, Method, and Nuance, edited by Gaddis SM, 81–101. New York: Springer.
- Larsen EN. 2020. Induced Competition in Matched Correspondence Tests: Conceptual and Methodological Considerations. Research in Social Stratification and Mobility 65: 100475.
- Phillips DC. 2019. Do Comparisons of Fictional Applicants Measure Discrimination When Search Externalities Are Present? Evidence from Existing Experiments. Economic Journal 129: 2240–64.
- Schwegman D. 2019. Rental Market Discrimination Against Same-Sex Couples: Evidence From a Pairwise-Matched Email Correspondence Test. Housing Policy Debate 29: 250–72.
- Vuolo M, Uggen C, and Lageson S. 2018. To Match or Not to Match? Statistical and Substantive Considerations in Audit Design and Analysis. In Audit Studies: Behind the Scenes with Theory, Method, and Nuance, edited by Gaddis SM, 119–42. New York: Springer.
- Weichselbaumer D. 2015. Testing for Discrimination against Lesbians of Different Marital Status: A Field Experiment. Industrial Relations 54: 131–61.