Fighting bias with bias: How same-race endorsements reduce racial discrimination on Airbnb

Minsu Park; Chao Yu; Michael Macy

doi:10.1126/sciadv.add2315

2023 Feb 8;9(6):eadd2315. doi: 10.1126/sciadv.add2315

PMCID: PMC9908016 PMID: 36753550

Abstract

Recent studies have documented racial discrimination in online interactions, mirroring the historic bias observed offline. The sharing economy is especially vulnerable due to greater dependence on mutual trust in sharing a ride, residence, or date with a stranger. These services rely on user recommendations to build trust, but the effects of these peer evaluations on racial bias are only beginning to be explored. Using data from Airbnb, we examine in-group preference for same-race hosts as well as same-race recommendations. The unexpected result is that these two manifestations of racial bias are offsetting, not reinforcing. White guests largely overcame their racial bias in host selection when hosts were endorsed by previous white guests. Moreover, we found no evidence of racial bias in the affective enthusiasm of endorsements, which suggests that the preference for same-race endorsements is motivated by the race of the recommender, not the content of the recommendation.

White Airbnb guests favor white hosts and white-authored recommendations, but the two types of racial bias are offsetting.

INTRODUCTION

Over the past decade, the online sharing economy has flourished in several domains, including lodging, transportation, and even dating (1–3). This success is unexpected, given the risk to personal safety of sharing a home, accepting a ride, or going on a date with a complete stranger (4). Sharing economy platforms have addressed this challenge by designing user-driven reputation systems (e.g., reviews, badges, and ratings) that discourage misbehavior and encourage mutual trust (1, 4). In addition, most platforms also display users’ names and photos, thereby reducing anonymity and conveying authenticity and accountability (1).

However, recent studies have documented an unintended consequence. Names and photos also reveal sociodemographic characteristics that can enable racial discrimination in users’ choices of individuals with whom to interact (1–3, 5), thereby reinforcing racial barriers to participation in the sharing economy (5–9). These barriers are well documented. Black users wait longer for shared rides (10) and are more likely to have the driver cancel (3). Listings by Black hosts on Airbnb have 20% less demand than similar white-owned properties (11) and earn 12% less (12), while lodging requests from Black guests are 16% less likely to be accepted (2).

In response, Uber, a popular ridesharing platform, discontinued the display of racial identities of drivers and passengers when algorithmically optimizing matches (3). The design change was intended to combat racial bias, but the strategy backfired. Although racial discrimination was avoided at the ride-request stage, it resurfaced in higher driver cancellation rates once the match was made and users’ demographic information was revealed (3). Worse still, removing racial identities at the ride-request stage may have unintentionally strengthened the normative legitimacy of racial discrimination.

Previous research shows that reputation systems can reduce discrimination in the sharing economy (5, 13, 14). A study using 1 million Airbnb requests for accommodation found that the age and gender diversity of guests increased with the number of reviews a host received (13). Another study found that a single positive review of an Airbnb guest was sufficient to eliminate host reluctance to rent to guests with names that sounded African American (14).

Other studies show that users pay more attention to and place greater trust in third-party recommendations from in-group members. For example, Racherla et al. (15) found that “perceived similarity” based on the reviewer’s photo, name, and bio-sketch increased the willingness to stay in a reviewer-recommended hotel, while inconsistent effects were reported in a similar study of digital camera recommendations (16). An experiment involving third-party racial similarity found that racially biased white participants were more responsive to criticism of racist language on Twitter when the criticism came from a user believed to be white (17). These findings are consistent with a half-century of social identity research on in-group bias that suggests greater responsiveness to same-race peer recommendations (18–20).

Growing evidence of in-group bias in third-party recommendations poses the central question that motivates our study: Does racial bias in the response to peer endorsements promote or attenuate racial bias in the selection of exchange partners in the sharing economy? The preference of whites for same-race recommendations, combined with racial bias in the strength of reviewers’ endorsements, could reinforce racial discrimination against Black providers with few white customers and even fewer white endorsements. If so, previous studies may have overstated the effects of racial discrimination against Black providers by failing to control for the confounding effects of racial bias in third-party endorsements.

To find out, we analyzed data on host selection and peer recommendations in New York City using “instant bookings” collected from Airbnb, the leading online marketplace for shared accommodations. New York City has the booking density needed to control for neighborhood racial composition, and instant bookings are needed to isolate guests’ racial preferences since they allow guests to select a host but do not permit hosts to choose a preferred guest. We used the Face++ facial recognition algorithm to classify users’ race based on their profile photos (www.faceplusplus.com/). Visually revealed racial identities may differ from the user’s self-identification, but Airbnb does not include race in user profiles and it is the racial identity perceived by guests that influences their behavior (see Materials and Methods for details on the data collection, facial recognition, and measurement of racial bias).

Our analysis focused on two patterns of possible racial bias among white users: guest preference for same-race hosts and guest responsiveness to same-race recommendations. We then tested whether racial bias in peer recommendations reinforces or attenuates racial bias in the selection of a host. We measured racial bias in host selection by comparing the probability that a white guest will select an Airbnb host, broken down by the race of the host. We measured host preferences among Black and Asian guests as a benchmark for the assessment of racial bias among white users. As an additional benchmark, we report the probability of choosing a host whose race is unidentified (i.e., the host does not have a profile photo or their photo could not be algorithmically labeled by Face++).

Our focus on racial discrimination among white guests is not because they exhibit more pronounced same-race preferences compared to Black or Asian guests. On the contrary, all three racial groups share a similar preference for same-race hosts. We focus on white discrimination because of racial inequality in access to the sharing economy in general and the underrepresentation of disadvantaged minorities on Airbnb in particular. Same-race preferences are important for all three groups but for very different reasons and with very different policy implications. In-group bias among white guests reinforces racial inequality, while in-group bias among Black and Asian guests promotes more equal opportunity for hosts from underrepresented groups.

RESULTS

Preference for same-race hosts

In Fig. 1, the x axis represents the host’s race and the y axis reports the probability of choosing the host relative to the expected probability if the choice were random. The expected probability corrects for the overrepresentation of white users on Airbnb (see Materials and Methods for additional details). The x axis also includes hosts that are racially unidentified, as a comparative baseline. The probability of choosing a racially unidentified host is close to the expected probability for each of the three guest groups, which confirms the randomization procedure used to account for the effects of the racial composition of hosts on Airbnb.

The results show that white guests are nearly three percentage points more likely to choose a white host (M = 0.027, P < 0.001) than would be expected by chance (given the racial distribution of hosts in the Airbnb population). White guests are also below chance in their likelihood to stay with Black (M = −0.046, P < 0.001) and Asian hosts (M = −0.016, P < 0.001).

The same-race preference is not limited to white guests. Asian (M = 0.030, P < 0.001) and Black (M = 0.052, P < 0.001) guests are also more likely to choose same-race hosts. Although white and Asian guests have roughly equal same-group preferences, white guests are more reluctant to stay with Black hosts than are Asian guests (D = −0.041, P < 0.001).

Same-race preference for a host does not necessarily reflect racial bias among white guests. An alternative explanation is racial inequality in the distribution of wealth such that white hosts can afford more premium amenities and better locations. An obvious problem with this explanation is that a racial disparity in property attributes cannot explain same-race preferences among Black and Asian guests. Nevertheless, we confirmed the robustness of the results reported in Fig. 1 controlling for a set of measures for the intrinsic appeal of the property, including price, amenities, privacy, and neighborhood racial composition (see tables S3, S5, and S7). We found little between-race difference in listing attributes and little change in the preference for same-race hosts when controlling for intrinsic appeal.

Preference for same-race endorsements

The results in Fig. 1 are consistent with previous studies on racial bias in the selection of exchange partners in the sharing economy. Same-race preferences in host selection could reflect racial bias in how guests write and respond to endorsements, but this possibility has received comparatively little attention in previous research. It is the problem to which we now turn.

Racial bias could affect how endorsements are written as well as how readers respond. We used VADER (Valence Aware Dictionary and sEntiment Reasoner) (21) to compare the level of enthusiasm between endorsements from same- and other-race guests (see the “Sentiment analysis” section in Materials and Methods for additional details). The sentiment score can range from −1 (most negative) to 1 (most positive), with scores between −0.05 and 0.05 considered to be neutral. The mean sentiment scores for endorsements written by same-race (0.968) and other-race guests (0.970) were close to 1.0 (the upper limit) and nearly identical (P = 0.305), indicating that there were almost no negative reviews and that other-race endorsements were no less positive than endorsements for hosts of the same race.

Using the Linguistic Inquiry and Word Count (LIWC), we also tested seven additional affective measures, including certitude, moralization, politeness, prosocial, risk, and dissatisfaction (see fig. S7). For Black hosts, all three groups of guests used polite words more frequently than they did for other hosts, Black and Asian guests used prosocial language more often, Black guests referred to risk more often, and white guests expressed lower certitude. For white hosts, Asian guests expressed higher certitude. For Asian hosts, there were no statistically significant differences in content compared to other hosts. The magnitude of these differences was very small and only marginally significant, and there were no significant differences for any of the other three LIWC measures that we tested. In short, using eight affective measures in two lexical sentiment analysis tools, we found no evidence of higher enthusiasm in same-race endorsements. If there is racial bias in peer recommendations, it is mainly in the response to endorsements, not in their content.

We isolated guest responses to endorsements from responses to the host’s race by using bookings with racially unidentified hosts. Airbnb displays up to six endorsements on the listing’s front page, without the need for the guest to click through to see earlier reviews (see Materials and Methods for details on the measurement of guest responses). In Fig. 2A, the x axis reports the number of front-page endorsements from previous guests of the same race as the guest, among hosts with six front-page endorsements. Note that, in all cases, the hosts had six endorsements on their front pages; all that varies is the number of endorsements written by a previous guest of the same race as the current guest. We pooled five or more same-race endorsements together due to the smaller number of same-race reviews by Black guests (given the underrepresentation of Black users on Airbnb). The y axis is the probability of choosing a racially unidentified host relative to the expected probability if the choice were random, broken down by the race of the guest (identical to the measure used in Fig. 1 except that the host is racially unidentified).

Fig. 2. The x axis is the number of same-race endorsements for a racially unidentified host (A) and a randomly chosen host (B), out of six front-page reviews, with five or more same-race endorsements pooled. The y axis is the probability that a white (top), an Asian (middle), or a Black guest (bottom) chooses a host, compared to chance. In (A), as the number of same-race endorsements increases from zero to five or more, the normed probability of booking with a racially unidentified host (relative to chance) increases from −0.240 to 0.056 for white guests, from −0.025 to 0.189 for Asian guests, and from −0.033 to 0.296 for Black guests (P < 0.001 for all three groups). In (B), the probability of booking with a randomly chosen host of known race, with equal weighting for hosts regardless of race, shows almost identical results. As the number of same-race endorsements increases from zero to five or more, the normed probability increases from −0.195 to 0.045 points for white guests, from −0.024 to 0.157 points for Asian guests, and from −0.025 to 0.248 points for Black guests (P < 0.001 for all three groups).

The results show that the impact of same-race endorsement is greater than the impact of same-race host. The probability that a white guest chooses the host increases markedly with the proportion of endorsements that were from previous guests who were also white, from −0.240 to 0.056 points (P < 0.001) as the number of same-race endorsements increases from zero to five or more.

As with the same-race preference for hosts, the effect of same-race endorsements is not limited to white guests. The booking probability increases from −0.033 to 0.296 points (P < 0.001) for Black guests and from −0.025 to 0.189 points (P < 0.001) for Asian guests. Figure 2A also shows that the overall booking probabilities for white guests are much lower, compared to Black and Asian guests. This indicates that white guests are more reluctant than other guests to book with hosts whose race is unknown.

Given that the race of the host is unknown, a plausible interpretation is that guests use the racial composition of endorsements as a proxy for the unknown race of the host. We therefore replicated Fig. 2A using bookings where the race of the host is known, so that guests do not need same-race endorsements to infer the race of the host. The results in Fig. 2B are nearly identical to Fig. 2A even though the race of the host is known to the guest. The y axis measures the probability of booking with a randomly chosen host of known race, with equal weighting for hosts regardless of race. The probability that a white guest chooses a host of known but random race increases from −0.195 to 0.045 (P < 0.001) as the number of same-race endorsements increases from zero to five or more. Same-race endorsements also increase the probability of a booking for Black (from −0.025 to 0.248, P < 0.001) and Asian guests (from −0.024 to 0.157, P < 0.001).

Racial bias in host selection (Fig. 1) and in the response to endorsements (Fig. 2) is consistent with decades of social identity research on in-group demographic preferences (18–20). Figure 3 addresses the central question that motivates this investigation: Are these two manifestations of racial bias mutually reinforcing? The results show the opposite. Figure 3 is identical to Fig. 2, except that the y axis now measures the booking probability when the race of the host is known, broken down by race. The figure reports results for hosts of the same race (orange) and different race (navy). The choices of other-race hosts were sufficiently similar that we simplified the analysis by combining both other-race groups (see fig. S1 for results with all nine combinations of the race of guest and host).

Fig. 3. The x axis is the number of same-race endorsements of the host (out of six front-page reviews, with five or more pooled). The y axis is the probability that a white (top), an Asian (middle), or a Black guest (bottom) chooses a host, compared to chance. Color indicates whether the race of the host is the same as that of the guest. As the number of same-race endorsements increases from zero to five or more, the normed probability of booking with the host increases, regardless of the race of the guest or host. However, among white guests, the increase in booking probability is larger for other-race hosts (D = 0.254) than for same-race hosts (D = 0.222), while among Black and Asian guests, the increase is larger for same-race hosts (D = 0.361 and D = 0.253, respectively) than for other-race hosts (D = 0.269 and D = 0.158, respectively).

The results show that same-race endorsements markedly increase not only the probability that white guests will choose a host of the same race but also the probability that they will choose a host of a different race. Among white guests (upper panel), the probability of choosing a same-race host relative to chance increases from −0.155 to 0.067 (P < 0.001) and the probability for a host of a different race increases from −0.213 to 0.041 (P < 0.001) as the number of same-race endorsements increases from zero (all endorsements were other race) to five or more (all endorsements were the same race). The gap between the orange and navy lines diminishes as the number of same-race endorsements increases. This gap measures the difference in the effect of white endorsements, depending on the race of the host. The reduction in the size of the gap with the number of same-race endorsements indicates that the effect of white endorsements on white guests is stronger for Black and Asian hosts.

The middle (Asian guests) and lower (Black guests) panels show similar main effects of same-race endorsement but the opposite interaction with the race of the host. Among Asian guests, the booking probability with a host of the same race increases from −0.024 to 0.229 (P < 0.001) as the number of same-race endorsements increases from zero to five or more. The booking probability for a host of another race also increases from −0.027 to 0.131 (P < 0.001). Among Black guests, the booking probability increases from −0.011 to 0.350 (P < 0.001) and from −0.036 to 0.233 (P < 0.001) for a host of the same race and of another race, respectively.

Although the effect of same-race endorsements is similar for all three groups of guests, the interaction with the race of the host is very different. As the number of same-race endorsements increases, the gap between the orange and navy lines gets larger for Black and Asian guests, while it gets smaller for white guests (see fig. S2 for additional analysis). This suggests that, compared to Black and Asian guests, white guests are less affected by the race of the host as the number of same-race endorsements increases.
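The interaction described above can be checked with simple arithmetic on the endpoint probabilities reported for Fig. 3. The following is a minimal sketch, not the authors' code: the dictionary layout and the function name `summarize` are hypothetical, and the gap values it derives are computed here from the reported endpoints rather than quoted from the paper.

```python
# Endpoint probabilities reported in the text for Fig. 3:
# (zero same-race endorsements, five or more same-race endorsements).
reported = {
    "white": {"same": (-0.155, 0.067), "other": (-0.213, 0.041)},
    "Asian": {"same": (-0.024, 0.229), "other": (-0.027, 0.131)},
    "Black": {"same": (-0.011, 0.350), "other": (-0.036, 0.233)},
}


def summarize(values):
    """Derive the increase (D) and the same-race minus other-race gap."""
    summary = {}
    for guest, hosts in values.items():
        (same_0, same_5), (other_0, other_5) = hosts["same"], hosts["other"]
        summary[guest] = {
            "increase_same": round(same_5 - same_0, 3),
            "increase_other": round(other_5 - other_0, 3),
            "gap_at_zero": round(same_0 - other_0, 3),
            "gap_at_five_plus": round(same_5 - other_5, 3),
        }
    return summary


summary = summarize(reported)
for guest, row in summary.items():
    direction = "narrows" if row["gap_at_five_plus"] < row["gap_at_zero"] else "widens"
    print(guest, row, "gap", direction)
```

Under these inputs the gap between same-race and other-race hosts narrows for white guests and widens for Asian and Black guests, which is the pattern the text describes.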

As with Fig. 1, we tested the robustness of the results reported in Fig. 3 controlling for a set of measures for the intrinsic appeal of the property (see tables S4, S6, and S8). Including the controls had no qualitative effect. We also replicated Fig. 3 using the total number of same-race endorsements out of all reviews, not just those on the host’s front page (see fig. S3).

DISCUSSION

Although racial discrimination is against the law in the United States (22), in-group preferences still affect many daily behaviors, such as choosing friends and neighborhoods (23–26). These in-group preferences are also evident in the sharing economy, from rides to dating (2, 3). Using data from Airbnb in New York City, we confirmed previous studies showing that racial bias exists in online lodging markets as well.

We also tested for racial bias in peer recommendations. We found no difference in the affective strength of endorsements for same- and other-race hosts. Instead, racial bias in endorsements occurs in the response that they elicit. The race of endorsers had even more impact than the race of hosts. Yet, unexpectedly, the higher responsiveness to same-race endorsements had a net positive effect on guests’ willingness to choose a host from a race other than their own.

The effect of white endorsements on the willingness of white guests to book other-race hosts is not because white guests write more positive reviews. On the contrary, nearly all reviews are positive, regardless of the race of the host or the reviewer (27, 28). Moreover, guests of all races are more responsive to same-race endorsements, which suggests that it is the race of the reviewer, and not racial differences in review content, that accounts for the same-race responsiveness.

Nevertheless, there is an important difference in the effect of same-race endorsements on the willingness to book with other-race hosts. Because the market is white dominated, a Black or Asian host is more likely to be endorsed by white guests than is a white host to be endorsed by Black or Asian guests. Thus, the net effect of same-race endorsement is to reduce racial discrimination on Airbnb.

These results have policy implications for reducing racial inequality in access to the sharing economy. Complaints about this inequality have led some companies to obscure the racial identity of users by removing profile photos (3). However, this strategy can also backfire, by shifting discrimination to the point where racial identities become apparent and by undermining trust when transactions are between anonymous users (3, 29).

Instead, our study suggests that it may be possible to algorithmically harness racial bias to combat racial inequality. Same-race booking preferences limit opportunities for Black hosts to be booked by, and thus reviewed by, guests of other races. Rather than hiding profile photos, platforms could selectively display the same racial composition of front-page endorsements for all providers, creating a level playing field in the positive effects of same-race endorsements on the willingness of white guests to book with hosts of a different race. Increasing the exposure of white guests to white-authored endorsements of Black hosts may lead to more white bookings, thereby making the algorithmic correction less necessary over time.

Our study has several limitations. First, despite the high (68%) reviewing rate (30), guests who did not leave reviews are missing from the data. This imposes a selection bias in favor of previous guests who were motivated to write reviews. Although this makes the endorsements unrepresentative of all user experiences, what matters for host selection is the racial distribution observed by the guest, not the true underlying distribution which the guest has no way to see.

Second, hosts also review guests, but we were unable to obtain these data. Future research is needed that uses host-authored endorsement of their guests to test for racial bias among hosts in the endorsement of guests and how these endorsements might affect racial diversity of reviews written by guests.

Third, we cannot rule out the possibility that our measure for same-race preference is confounded by unmeasured listing attributes that make guests more likely to book with and recommend the host. We tested the robustness of the results when controlling for privacy, amenities, price, and neighborhood racial composition, but a property can have special attributes (e.g., proximity to the beach and view of the mountains) that we cannot measure and therefore cannot control.

Fourth, we used time slices (before and after the start of 2017) to confirm the robustness of our temporally aggregated results (see the “Robustness check for temporal differences” section in the Supplementary Materials). However, our analyses were static and do not examine trends in racial bias. Changes to the Airbnb booking system and user interface may have altered the behavior and racial composition of users during the time period for which we have data (2009 to 2018). Future research could leverage these changes to open up opportunities for causal inference that go beyond the descriptive account presented here.

Fifth, the racial composition and distribution of guest-host pairings could differ markedly from one neighborhood to another. We confirmed the robustness of our results by replicating the analyses using neighborhood-specific measures (see the “Robustness check for spatial differences” section in the Supplementary Materials), but future research could go further by using neighborhood variation to investigate ecological and contextual constraints on same-race preferences.

Sixth, our data are limited to users in New York City, which may understate the bias that might be observed in less cosmopolitan locations. However, New York City afforded the population density and diversity needed for the neighborhood-specific measures used to confirm the robustness of the spatially aggregated results.

Last, we did not investigate gender differences in racial bias. Although gender distributions do not vary by race in the larger population, gender is known to strongly influence pairing probabilities in the sharing economy (31). We controlled for gender to confirm the robustness of the reported results (see the “Robustness check for intrinsic appeal” section in the Supplementary Materials), but future research should focus on the intersection of gender and race in guests’ booking behavior, reviews, and responses to endorsements.

MATERIALS AND METHODS

Dataset description

All data used in this study were obtained from Inside Airbnb (http://insideairbnb.com/), an independent third-party platform that specializes in the collection and distribution of Airbnb data. The data were collected for bookings in New York City between 12 March 2009 and 6 August 2018. We only collected listings that were “instant bookable,” i.e., a listing that a guest can book immediately without needing to send a reservation request to a host for approval. Instant bookings are needed to isolate racial bias in the selection of a host from bias by hosts in approving of a guest. The dataset contains 13,663 instant bookable listings from 11,633 hosts and 407,295 time-stamped reviews from 377,086 guests. These data represent 34.6% of total listings and 40.4% of total reviews before excluding those that allowed hosts to reject guest applications. The rates of instant bookings are nearly identical across racial groups. After removing guests who could not be racially identified, the data used for the reported analyses consisted of 150,570 time-stamped guest reviews of 7147 hosts across 7903 instant bookable listings. To confirm the robustness of our results based on instant bookings, we replicated the analyses only with requested bookings and the results are almost identical (see the “Robustness check for booking type” section in the Supplementary Materials).

Racial classification using facial recognition

Airbnb guests cannot look up a host’s racial identity in their user profile, nor can guests see the host’s last name, which previous studies have used for labeling the race of online users (31, 32). However, guests can infer the race of a host or reviewer from their profile photo. (Airbnb recommends that users upload their real profile photos and informs users that many hosts require that guests have a profile photo.) We have no way to know if the host’s race, as perceived by the guest, matches the host’s own racial identity. However, it is the perceived race of the host that should be used to measure a guest’s racial bias. For example, we regard the choice of a host as racially biased if a white guest books with an other-race host mistakenly perceived to be white. It is the guest’s perception that matters, not the host’s own racial identity.

Like the guests, we also do not know the race with which the host self-identifies, and like the guests, we used profile photos to ascertain a user’s race. The photos were labeled on Face++ (www.faceplusplus.com/), an online facial recognition service, because the number of users in our dataset was too large to use human coders. Face++ provides a reliable pretrained facial recognition algorithm that identifies demographic attributes including age and gender as well as race. Face++ classifies users into three racial categories: white, Black, and Asian. The racial distribution among the profile photos in our dataset was as follows: 59.82% of guests and 57.24% of hosts were white; 16.21% of guests and 22.01% of hosts were Black; and 23.97% of guests and 20.75% of hosts were Asian. As a reference point, the racial composition of New York City is 41.3% white, 23.8% Black, and 14.3% Asian in 2021 and 44.6% white, 25.1% Black, and 11.8% Asian in 2010, according to U.S. Census Bureau QuickFacts (www.census.gov/quickfacts/). The comparison with Census data indicates that white users are greatly overrepresented on Airbnb, while Black users are underrepresented (assuming random error in the visual classifications).

Face++ racial classification has been validated using the Chicago Face Database (82.8% accuracy) and the 10k US Adult Faces Database (72.9% accuracy) (33). However, those tests did not use Airbnb profile photos. We therefore tested for accuracy using a stratified random sample of 3000 profile photos matched with user IDs in our dataset. The photos were obtained using Airbnb URLs and were not locally stored. Each profile picture was evaluated by three independent coders recruited from Amazon Mechanical Turk. Each coder was asked to assign a perceived race with an option of “Other/None of the Above.” Each profile picture was labeled with the race identified by at least two coders (N = 2872) and otherwise labeled as ambiguous (N = 128). The average agreement between Face++ and human coders was 83.3%. Although the agreement is less than ideal, the errors appear to be random, which means that the racial bias that we report is, if anything, underestimated.
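The validation procedure can be illustrated with a short sketch, assuming each photo has one algorithmic label and three coder labels. The function names and the toy labels are hypothetical. How ambiguous photos entered the reported 83.3% agreement is not stated, so leaving them out of the denominator is an assumption made only for this illustration.

```python
from collections import Counter


def majority_label(codes):
    """Label chosen by at least two of the three coders, else 'ambiguous'."""
    label, votes = Counter(codes).most_common(1)[0]
    return label if votes >= 2 else "ambiguous"


def agreement_rate(algorithm_labels, coder_labels):
    """Share of unambiguous photos where the algorithm matches the human majority.

    Assumption: ambiguous photos are left out of the denominator; the text
    does not say how they were handled.
    """
    matches = total = 0
    for algo, codes in zip(algorithm_labels, coder_labels):
        human = majority_label(codes)
        if human == "ambiguous":
            continue
        total += 1
        matches += (algo == human)
    return matches / total


# Hypothetical labels for four photos (not from the study).
face_labels = ["white", "Black", "Asian", "white"]
coder_votes = [
    ("white", "white", "Other/None of the Above"),
    ("Black", "Black", "Black"),
    ("white", "Asian", "white"),
    ("white", "Black", "Asian"),
]
rate = agreement_rate(face_labels, coder_votes)
print(f"agreement on unambiguous photos: {rate:.3f}")
```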

We labeled users as racially unidentified if they did not have a profile photo or their photo did not contain an algorithmically detected individual human face; 89.69% of guests and 80.97% of hosts in our data used images of people with identifiable human faces as their profile photos. Among users with algorithmically identifiable photos, 20.62% of guests and 11.79% of hosts had profile photos with more than one face, nearly all of which show the faces of a same-race couple or family. We excluded users with more than one face in their profile photos because multiple faces are susceptible to greater error in algorithmic labeling, for three reasons: the inability to distinguish between background and foreground faces, reduced facial resolution in the background, and more opportunities for error (e.g., if the algorithm has probability p of correctly labeling a single face, the probability of correctly labeling all n faces is pⁿ). In excluding these users, we assume that there are no qualitative differences in how guests respond to a host or reviewer between a single individual and a couple. Photos with multiple faces suggest an opportunity for future research to test whether guest preferences for interracial hosts more closely match their preferences for same-race or other-race individual hosts.

Front-page reviews

Airbnb posts up to six of a host’s most recent reviews on the host’s front page, without the need for the guest to click through to see the others. Previous studies have shown that front-page reviews are the most influential (34, 35). However, the data we obtained do not identify which reviews are front page. We therefore used the review date to label up to six of the most recent reviews as front page. We validated the labeling method using 100 manually coded front-page reviews. Ninety-seven of 100 were correctly labeled using recency of the review as a proxy.

Our analysis of racial preference for same-race reviews is based on an equal number of reviews for all hosts, regardless of race—the six most recent reviews posted on the host’s front page. Hosts with fewer than six reviews were excluded from the analysis and we pooled five or more same-race endorsements together to ensure sufficiently large numbers for statistical reliability. All analyses in the main text used front-page reviews, but replication using all endorsements confirms the robustness of the reported results (see fig. S3).
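The recency proxy and the pooling rule can be sketched as below. This is an illustration under stated assumptions: the record layout (dicts with `date` and `reviewer_race` keys) and the function names are hypothetical and do not come from the released code.

```python
from datetime import date

FRONT_PAGE_SIZE = 6  # reviews shown on the host's front page
POOL_AT = 5          # five or more same-race endorsements are pooled


def front_page_reviews(reviews, k=FRONT_PAGE_SIZE):
    """Return the k most recent reviews as a proxy for the front page."""
    return sorted(reviews, key=lambda r: r["date"], reverse=True)[:k]


def same_race_endorsements(reviews, guest_race, k=FRONT_PAGE_SIZE, pool_at=POOL_AT):
    """Count front-page reviews written by reviewers of `guest_race`.

    Returns None for hosts with fewer than k reviews (excluded from analysis);
    counts of `pool_at` or more are pooled into a single category.
    """
    if len(reviews) < k:
        return None
    count = sum(r["reviewer_race"] == guest_race for r in front_page_reviews(reviews, k))
    return min(count, pool_at)


# Hypothetical host with seven reviews: the two oldest by white guests,
# the five most recent by Black guests.
example = [
    {"date": date(2020, 1, d), "reviewer_race": "white" if d <= 2 else "Black"}
    for d in range(1, 8)
]
white_count = same_race_endorsements(example, "white")
black_count = same_race_endorsements(example, "Black")
```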

Observed and expected host-selection probability

We measured racial bias by comparing the observed probability of host selection with the probability expected if race were randomly assigned. Random assignment was implemented by randomly shuffling the guests to produce a set of random pairings while preserving the original racial distribution. The difference between the observed and expected probabilities then captures the effects of racial preference over and above the distributional effects. Statistical significance was assessed by expressing this difference in units of standard error, where the standard error is the SD of the expected probability over 1000 reshuffled samples.
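A minimal sketch of this reshuffling procedure follows. It assumes the data are parallel lists of guest and host races, one entry per booking; the function names and the toy data are hypothetical, and the sketch is not the analysis code released with the paper.

```python
import random
import statistics


def selection_prob(guest_races, host_races, guest_race, host_race):
    """P(host is `host_race` | guest is `guest_race`) over guest-host pairs."""
    hosts = [h for g, h in zip(guest_races, host_races) if g == guest_race]
    return sum(h == host_race for h in hosts) / len(hosts)


def reshuffle_test(guest_races, host_races, guest_race, host_race,
                   n_shuffles=1000, seed=0):
    """Compare the observed probability with its distribution under reshuffling.

    Shuffling guest labels across pairs keeps both racial distributions fixed.
    Returns (observed, expected, z), where z is the difference in units of the
    SD of the reshuffled probabilities. The SD must be nonzero.
    """
    rng = random.Random(seed)
    observed = selection_prob(guest_races, host_races, guest_race, host_race)
    shuffled = list(guest_races)
    reshuffled_probs = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        reshuffled_probs.append(
            selection_prob(shuffled, host_races, guest_race, host_race))
    expected = statistics.mean(reshuffled_probs)
    se = statistics.stdev(reshuffled_probs)
    return observed, expected, (observed - expected) / se


# Toy data with perfect same-race pairing (hypothetical, for illustration only).
guests = ["white"] * 50 + ["Black"] * 50
hosts = ["white"] * 50 + ["Black"] * 50
observed, expected, z = reshuffle_test(guests, hosts, "white", "white", n_shuffles=200)
```

In the toy data, the observed probability is 1.0 while the reshuffled probabilities centre near 0.5, so the difference is many standard errors above zero.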

Sentiment analysis

The number of same-race endorsements fails to capture differences in the level of enthusiasm expressed in each endorsement and may include reviews that are critical of the host. We used VADER to measure the difference in mean review sentiment between endorsements from same- and other-race guests. VADER is a lexical sentiment analysis tool that is tuned for short texts in online social media (21). We validated the lexical measures by randomly selecting 50 negative reviews and 50 positive reviews as coded by VADER. Two human coders then independently classified the reviews as positive or negative. The intercoder reliability measured by Krippendorff’s alpha was 0.94 (P < 0.001) and the average agreement between human coders and VADER classification was 0.93, where 1.0 indicates perfect agreement. We excluded the small number of negative reviews (1.44%) and labeled the remaining reviews as endorsements.

VADER’s limited number of sentiment measures led us to also measure enthusiasm using LIWC (the 2022 edition). LIWC includes 131 “certitude” words (including “really,” “actually,” and “of course”), 356 “moralization” words (e.g., “wrong,” “honor*,” “deserv*,” and “judge”) that are used to convey moral judgment, 142 “politeness” words (e.g., “please” and “thank you”), 242 “prosocial” words (e.g., “care” and “help”), 128 “risk” words (e.g., “secur*,” “protect*,” “pain,” and “risk*”), 89 “lack” words (e.g., “want,” “hope,” “wanted,” “wish,” and “didn’t have”), 56 “want” words, and, lastly, 49 “fulfill” words (e.g., “enough,” “full,” “complete,” and “extra”); the “lack,” “want,” and “fulfill” lists together capture dissatisfaction. We binned all the reviews for each of the nine host-by-guest combinations of white, Black, and Asian users. For each bin, we counted the number of times any word in a given dictionary appeared and divided this count by the total number of words in the bin. Each of the nine bins was analyzed using the following word lists: certitude, moralization, politeness, prosocial, risk, and dissatisfaction (combining “want,” “lack,” and “fulfill,” aligned for similar direction). We calculated the standard errors by bootstrapping 1000 randomly drawn samples of the same size (with replacement).
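The dictionary-rate calculation and its bootstrapped standard error can be sketched as below. LIWC itself is proprietary, so the three-entry word list here is a hypothetical stand-in built from the “risk” examples quoted above. Resampling whole reviews, counting words by whitespace splitting, and treating a trailing “*” as a wildcard for any word ending are all assumptions made for this illustration.

```python
import random
import re
import statistics


def compile_dictionary(entries):
    """Compile dictionary entries into one regex; a trailing '*' matches any ending."""
    parts = []
    for entry in entries:
        if entry.endswith("*"):
            parts.append(re.escape(entry[:-1]) + r"\w*")
        else:
            parts.append(re.escape(entry))
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


def dictionary_rate(reviews, pattern):
    """Dictionary hits in a bin of reviews divided by the bin's total word count."""
    hits = sum(len(pattern.findall(text)) for text in reviews)
    words = sum(len(text.split()) for text in reviews)
    return hits / words if words else 0.0


def bootstrap_se(reviews, pattern, n_boot=1000, seed=0):
    """SD of the rate over samples of the same size drawn with replacement."""
    rng = random.Random(seed)
    rates = [
        dictionary_rate(rng.choices(reviews, k=len(reviews)), pattern)
        for _ in range(n_boot)
    ]
    return statistics.stdev(rates)


# Hypothetical mini-dictionary and one bin of three made-up reviews.
risk_pattern = compile_dictionary(["secur*", "risk*", "pain"])
bin_reviews = ["Very secure place", "No risk at all", "Lovely host"]
rate = dictionary_rate(bin_reviews, risk_pattern)
se = bootstrap_se(bin_reviews, risk_pattern, n_boot=200)
```

In practice, the same two functions would be applied to each of the nine host-by-guest bins and to each word list in turn.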

Ethical considerations

This research was reviewed by the host university’s Institutional Review Board and classified as exempt because the data were publicly available (under CC BY 4.0), although the research involved the use of automated face classification (Face++). Face recognition algorithms have lower accuracy for members of underrepresented and marginalized groups that can exacerbate racial inequalities in applications that rely on face recognition as inputs for active user decision-making. However, our study only relies on face recognition to label data passively and therefore is not susceptible to the ethical problems that arise when the algorithms are used in user-driven applications.

Acknowledgments

We greatly appreciate members of Cornell’s Social Dynamics Laboratory and New York University Abu Dhabi’s Social Research and Public Policy Writing Group for their helpful suggestions.

Funding: M.M. acknowledges the U.S. National Science Foundation (SES 2049207 and SES 1756822) and the DARPA Ground Truth Program for their support during the time the research for this project was conducted.

Author contributions: M.P. and C.Y. conceived the project, collected data, and performed empirical analyses. M.M., M.P., and C.Y. designed the analyses, interpreted results, and wrote the manuscript. M.M. and M.P. edited the manuscript.

Competing interests: The authors declare that they have no competing interests.

Data and materials availability: All data and code used in the analysis are available at https://osf.io/743vd/.

Supplementary Materials

This PDF file includes:

Supplementary Text

Figs. S1 to S9

Tables S1 to S8

References


REFERENCES AND NOTES

1. K. Levy, S. Barocas, Designing against discrimination in online markets. Berkeley Technol. Law J. 32, 1183 (2017).
2. B. Edelman, M. Luca, D. Svirsky, Racial discrimination in the sharing economy: Evidence from a field experiment. Am. Econ. J. Appl. Econ. 9, 1–22 (2017).
3. J. Mejia, C. Parker, When transparency fails: Bias and financial incentives in ridesharing platforms. Manage. Sci. 67, 166–184 (2021).
4. J. Santana, P. Parigi, Risk aversion and engagement in the sharing economy. Games 6, 560–573 (2015).
5. J. D. Tjaden, C. Schwemmer, M. Khadjavi, Ride with me—Ethnic discrimination, social markets, and the sharing economy. Eur. Sociol. Rev. 34, 418–432 (2018).
6. J. Kleinberg, J. Ludwig, S. Mullainathan, C. R. Sunstein, Discrimination in the age of algorithms. J. Leg. Anal. 10, 113–174 (2018).
7. J. Kleinberg, J. Ludwig, S. Mullainathan, C. R. Sunstein, Algorithms as discrimination detectors. Proc. Natl. Acad. Sci. U.S.A. 117, 30096–30100 (2020).
8. Z. Obermeyer, B. Powers, C. Vogeli, S. Mullainathan, Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
9. P. Törnberg, How sharing is the “sharing economy”? Evidence from 97 Airbnb markets. PLOS ONE 17, e0266998 (2022).
10. Y. Ge, C. R. Knittel, D. MacKenzie, S. Zoepf, Racial discrimination in transportation network companies. J. Public Econ. 190, 104205 (2020).
11. A. Marchenko, The impact of host race and gender on prices on Airbnb. J. Hous. Econ. 46, 101635 (2019).
12. B. G. Edelman, M. Luca, “Digital discrimination: The case of Airbnb.com,” Harvard Bus. Sch. NOM Unit Work. Pap. (no. 14-054) (2014).
13. B. Abrahao, P. Parigi, A. Gupta, K. S. Cook, Reputation offsets trust judgments based on social biases among Airbnb users. Proc. Natl. Acad. Sci. U.S.A. 114, 9848–9853 (2017).
14. R. Cui, J. Li, D. J. Zhang, Reducing discrimination with reviews in the sharing economy: Evidence from field experiments on airbnb. Manage. Sci. 66, 1071–1094 (2020).
15. P. Racherla, M. Mandviwalla, D. J. Connolly, Factors affecting consumers’ trust in online product reviews. J. Consum. Behav. 11, 94–104 (2012).
16. M. Zhao, J. Xie, Effects of social and temporal distance on consumers’ responses to peer recommendations. J. Market. Res. 48, 486–496 (2011).
17. K. Munger, Tweetment effects on the tweeted: Experimentally reducing racist harassment. Polit. Behav. 39, 629–649 (2017).
18. J. C. Turner, in Rediscovering Social Identity, T. Postmes, N. R. Branscombe, Eds. (Psychology Press, 2010), pp. 243–272.
19. H. Tajfel, J. C. Turner, W. G. Austin, S. Worchel, An integrative theory of intergroup conflict. Organ. identity A Read. 56, 9780203505984--16 (1979).
20. H. Tajfel, Social identity and intergroup behaviour. Soc. Sci. Inf. 13, 65–93 (1974).
21. C. J. Hutto, E. Gilbert, Vader: A parsimonious rule-based model for sentiment analysis of social media text, in Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (vol. 8, 2014), pp. 216–225.
22. D. A. Strauss, Discriminatory intent and the taming of brown. Univ. Chicago Law Rev. 56, 935 (1989).
23. J. De Houwer, Implicit bias is behavior: A functional-cognitive perspective on implicit bias. Perspect. Psychol. Sci. 14, 835–840 (2019).
24. L. Cosmides, J. Tooby, R. Kurzban, Perceptions of race. Trends Cogn. Sci. 7, 173–179 (2003).
25. M. R. Banaji, A. G. Greenwald, Blindspot: Hidden Biases of Good People (Delacorte Press, 2013).
26. B. Hofstra, R. Corten, F. van Tubergen, N. B. Ellison, Sources of segregation in social networks: A novel approach using Facebook. Am. Sociol. Rev. 82, 625–656 (2017).
27. G. Zervas, D. Proserpio, J. W. Byers, A first look at online reputation on Airbnb, where every stay is above average. Mark. Lett. 32, 1–16 (2021).
28. J. Bridges, C. Vásquez, If nearly all Airbnb reviews are positive, does that make them meaningless? Curr. Issues Tour. 21, 2057–2075 (2018).
29. A. C. Morales, Giving firms an “E” for effort: Consumer responses to high-effort firms. J. Consum. Res. 31, 806–812 (2005).
30. A. Fradkin, E. Grewal, D. Holtz, Reciprocity and unveiling in two-sided reputation systems: Evidence from an experiment on Airbnb. Mark. Sci. 40, 1013–1029 (2021).
31. G. M. Castex, Providing services to hispanic/latino populations: Profiles in diversity. Soc. Work 39, 288–296 (1994).
32. N. Garg, L. Schiebinger, D. Jurafsky, J. Zou, Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl. Acad. Sci. U.S.A. 115, E3635–E3644 (2018).
33. B. Jaeger, W. W. A. Sleegers, A. M. Evans, Automated classification of demographics from face images: A tutorial and validation. Soc. Personal. Psychol. Compass 14, e12520 (2020).
34. B. Pan, H. Hembrooke, T. Joachims, L. Lorigo, G. Gay, L. Granka, In Google we trust: Users’ decisions on rank, position, and relevance. J. Comput. Commun. 12, 801–823 (2007).
35. T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, G. Gay, Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM Trans. Inf. Syst. 25, 7 (2007).
36. M. J. Salganik, K. E. C. Levy, Wiki surveys: Open and quantifiable social data collection. PLOS ONE 10, e0123483 (2015).
