. 2017 Aug 28;114(37):9848–9853. doi: 10.1073/pnas.1604234114

Reputation offsets trust judgments based on social biases among Airbnb users

Bruno Abrahao a,1, Paolo Parigi b, Alok Gupta c, Karen S Cook a,1
PMCID: PMC5603987  PMID: 28847948

Significance

We investigate the extent to which artificial features engineered by sharing-economy platforms, such as reputation systems, can be used to override people’s tendency to base judgments of trustworthiness on social biases, such as the tendency to trust others who are similar (i.e., homophily). To this end, we engaged 8,906 users of Airbnb as volunteers in an online experiment. We demonstrate that homophily based on several demographic characteristics is a relatively weak driver of trust. In fact, a high reputation is enough to counteract homophily. Using Airbnb data, we present evidence that the effects we found experimentally are at work on the actual platform. Lastly, we found an inverse relationship between risk aversion and trust in those with positive reputations.

Keywords: online trust, reputation systems, sharing economy, social biases, risk

Abstract

To provide social exchange on a global level, sharing-economy companies leverage interpersonal trust between their members on a scale unimaginable even a few years ago. A challenge to this mission is the presence of social biases among a large heterogeneous and independent population of users, a factor that hinders the growth of these services. We investigate whether and to what extent a sharing-economy platform can design artificially engineered features, such as reputation systems, to override people’s natural tendency to base judgments of trustworthiness on social biases. We focus on the common tendency to trust others who are similar (i.e., homophily) as a source of bias. We test this argument through an online experiment with 8,906 users of Airbnb, a leading hospitality company in the sharing economy. The experiment is based on an interpersonal investment game, in which we vary the characteristics of recipients to study trust through the interplay between homophily and reputation. Our findings show that reputation systems can significantly increase the trust between dissimilar users and that risk aversion has an inverse relationship with trust given high reputation. We also present evidence that our experimental findings are confirmed by analyses of 1 million actual hospitality interactions among users of Airbnb.


A new wave of companies, emerging under the banner of the sharing economy (1), is profoundly altering the way we interact and exchange with one another. These Internet-based services are driving a major change in our cultural and technological landscapes and have achieved astounding success, enabling users to share their own personal resources, such as their vehicles, real estate properties, time, or skills. A growing number of individuals trust the sharing economy with a variety of services to satisfy their needs, to generate income, or, more simply, to meet new people. Examples of sharing-economy transactions include hiring a “tasker” from Task Rabbit to run errands, sharing a “couch” with a perfect stranger through CouchSurfing, hiring a “driver” on Uber, or staying in someone’s home while traveling using Airbnb.

Users in the sharing economy seek to connect with others engaged in activities on the same platform. Compared with exchanges via traditional e-commerce companies, where transactions are relatively anonymous, the sharing economy exposes us to the more personal character of such interactions. This inevitably prompts attention to the users’ sociodemographic characteristics as factors that drive selection.

As a consequence, social biases figure as major hurdles to the growth of sharing-economy services, as they influence users’ perceptions of trust and risk. To enable trust between strangers so that everyone can exchange with anyone, beyond cultural and social boundaries, these companies face daunting obstacles in their attempts to minimize these biases.

In this study, we investigate whether and to what extent a sharing-economy platform can design technological features to counteract natural behavioral tendencies that may lead to social biases. This question is of central importance in the social sciences more broadly, but also in the engineering of platforms that aim to enable trust.

Social biases are a result of a number of mechanisms that are difficult to measure. In this work, we make social biases amenable to investigation by focusing on a form of social bias that naturally maps into a quantifiable interpretation and that we expect to be at work in these environments. At the same time, this source of bias is well understood in the social sciences so that we can rely on previous literature, instead of opening up a new dimension of complexity. To this end, we focus on homophily (2–6), the higher likelihood that people trust others who are similar to themselves.

McPherson (4) proposed a theory of how homophily structures modern societies using a construct of social space defined in Blau’s theory of preferences (6). Each individual occupies a position in the social space whose coordinates are a function of his or her sociodemographic characteristics. The more features two individuals share in common, the more likely they are to form relationships based on mutual trust.

To operationalize homophily in a structured way, we use Blau’s construct of social space to induce and measure the effect of homophily in an experimental setting whose volunteers are active members of the sharing economy. (At the time of writing, the online experiment is accepting participants for demonstration purposes at stanfordexchange.org.)

Building on this baseline, the heart of our experiment is the measurement of the extent to which another source of information that can be artificially engineered could potentially alter the perception of trust structured by homophily and counteract this natural tendency. To this end, we focus on the reputation system (7, 8), which platforms use to allow users to review and to “rate” the behavior of other members (7).

The premise of reputation systems is that the aggregate rating associated with a person is an indicator of the quality and the risk entailed in potential transactions with that individual. We hypothesize that it is because of these reputation systems that the lack of direct experience in interacting with unknown and distant alters does not generate paralyzing uncertainty.

Nonetheless, previous research presents only weak evidence that reputation systems serve as safeguards against opportunistic behavior, which would result in increased trust (9). Moreover, there has been limited quantification of the extent to which reputation systems have the capacity to increase trust between those with different degrees of dissimilarity in social space. This study directly measures interpersonal trust structured by the interplay between homophily and reputation.

Researchers have extensively studied reputation systems in online platforms in auction markets (10), crowdsourcing (10, 11), and the sharing economy (12–14). The latter case is of particular interest, as the user population, as well as the pool of services they offer, is too large and diverse to be standardized, while users cannot rely directly on preexisting institutional arrangements to inform their decisions.

Measuring trust as a function of social distance or reputation directly on a sharing-economy platform represents a major research challenge, because the live platform is not amenable to direct manipulation by researchers. The platform exposes users to features of the alternatives that are confounded with trust. Features such as color preference, attractiveness, etc., are difficult to isolate, categorize, or quantify. These factors are highly heterogeneous, and users infer them subjectively and indirectly through photos or other signals, as opposed to through structured data that the platform displays. Moreover, when the platform presents users with alternatives, we observe the outcome of the user’s thought process through their selections, but it is difficult to capture and quantify their preferences between every pair of available alternatives without making the selection process unnecessarily complex.

Due to such challenges, we designed a large-scale online laboratory in collaboration with Airbnb, one of the world’s most successful sharing-economy companies, with >2 million hospitality listings in >190 countries. We engaged 8,906 Airbnb users as volunteers to participate in an experiment external to Airbnb’s platform, with the aim of collecting behavioral data on decisions that involve trust, while isolating other confounding factors (15).

Traditionally, buyers and sellers may develop trust and engage in multiple exchanges over time. However, relationships in the sharing economy usually result in one-time transactions. To mimic this scenario while allowing for comparability with previous literature, we drew from research on behavioral economics and social psychology to use standard methods to measure interpersonal trust. To this end, we designed a variation of the widely studied investment game (16, 17).

In the experiment, participants played the role of Investor and started with a number of credits. We showed them five profiles, which we presented as belonging to other randomly selected Airbnb users who had entered their information in a previous round. These other users were playing the role of receivers with incentives to accumulate credits. We displayed the receivers’ demographics and reputation features, resembling the way the platform presents users with potential partners via search results. We then gave incentives for participants to seek favorable investment outcomes in a single-shot interaction with the receivers by offering potential rewards.

The credits participants invested in a profile were multiplied by three and given to that receiver. In the spirit of the prisoner’s dilemma game (18), the receiver could cooperate with the participant and return a good portion of the credits invested in them (to help the participant increase her credits) or defect and keep most or all of these credits (in which case the participant loses credits). As participants put themselves in a vulnerable position through investments, the amount of investment in each receiver served as an indirect proxy for how much trust they placed in the receivers to return credits.
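The payoff structure of one investor-receiver round can be sketched as follows; the function and parameter names are illustrative, not part of the study's materials.

```python
def payoff(invested, returned_fraction, wallet=100.0):
    """One investor-receiver round: the invested credits are tripled and
    passed to the receiver, who returns some fraction of that tripled
    amount. Names are illustrative, not the study's implementation."""
    tripled = 3 * invested
    returned = returned_fraction * tripled
    investor_total = wallet - invested + returned
    receiver_total = tripled - returned
    return investor_total, receiver_total

# A cooperative receiver who returns half of the tripled amount leaves
# the investor with a gain; a defecting receiver leaves the investor
# with a loss equal to the amount invested.
print(payoff(40, 0.5))  # (120.0, 60.0)
print(payoff(40, 0.0))  # (60.0, 120.0)
```

As the sketch makes explicit, the investor's vulnerability comes entirely from the amount invested, which is why the investment serves as a proxy for trust.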

While we presented the profiles to participants as other Airbnb users also participating in the experiment, we actually generated these profiles in advance. Our goal was to understand how participants assign trust and the relative importance of each facet, homophily or reputation, as we varied demographic and reputation features experimentally.

We present evidence, first, that homophily is at play to a significant extent when participants make trust decisions. Second, we show that the reputation system makes possible the construction of expectations that counterbalance the tendency toward homophily. This results in the extension of trust to dissimilar users in social space. Third, we show a strong inverse relationship between risk aversion and trust in the case of high reputations. Lastly, we used insights gained through the experiment to guide our analysis of Airbnb data containing 1 million real-world interactions. We show evidence that the diversity of users who select others with whom to have a hospitality interaction increases as the reputation of the partners gets higher.

Experimental Design

We sent invitations to 100,000 Airbnb users who identified as US residents, of which 8,906 responded and registered to participate (6,714 completed their entire participation). Table S1 presents the univariate distributions of the demographics over the participants, and Tables S2–S4 analyze self-selection bias. Each participant built a short profile by providing us with four pieces of information reflecting their demographics—namely, age, gender, marital status, and home state. We chose these features for their simplicity, allowing us to conveniently operationalize Blau’s construct experimentally while reducing participation attrition.

Table S1.

Univariate distributions of the features describing the experiment’s participants

Characteristic n Proportion
Male 3,081 0.34
Married 3,614 0.40
Host 5,333 0.60
Region
Midwest 1,054 0.13
Northeast 2,282 0.28
Pacific 2,684 0.33
South 2,286 0.28
West 600 0.07
Previous hospitality experience
No experience 2,021 0.25
One experience 3,380 0.42
Two or more experiences 3,446 0.43
Age Mean = 39.70, SD = 14.22
Risk answer Mean = 1,909.37, Median = 500

Table S2.

Logistic regression of binary response that codes whether or not the invitee decided to participate in the experiment

Characteristic Coefficient Pr(>|t|)
(Intercept) 5.546e−02 <2e−16
Male 2.310e−02 <2e−16
Age 6.682e−05 0.22845
No. of reviews as guest 1.419e−02 2.21e−10
Average rating as guest 2.065e−03 0.00592
No. of reviews as host 2.429e−03 <2e−16
Average rating as host 1.243e−02 <2e−16

Table S4.

Variance inflation factors of the predictors used in the logistic regression (Table S2)

Characteristic Variance inflation factor
Male 1.002
Age 1.036
No. of guest reviews 3.157
Avg. guest rating 3.156
No. of host reviews 1.359
Avg. host rating 1.379

The low collinearity of the predictors allows us to compare the characteristics of the participants with those of the invitees. Avg., average.

We required that the experiment reflect, as much as possible, the way users make decisions on the platform, except for factors that are external to trust. In the platform, users are pressed to make the best possible decisions, as it is imperative to eliminate risks associated with critical factors, such as their safety, while maximizing satisfaction and minimizing cost. Thus, a major challenge in the design of our online experiment was to engage users so that they would attempt to make the best use of their judgment when making decisions involving trust. To capture attention and provide incentives for the exertion of good judgment, we offered 100 prizes, each worth 100 US dollars (USD). The chances of winning were proportional to the number of credits accumulated in the investment game.

We generated each of the potential receivers according to prescribed rules. We placed the receivers’ profiles at social distance d from the participant, defined in the context of Blau’s social space as the number of features on which two individuals differ (6). This is equivalent to the mathematical definition of Hamming distance. Accordingly, distance d=0 meant that the receiver matched all of the demographic attributes of the participant (e.g., the same age group, the same gender, the same marital status, and the same US state). In turn, d=1 meant that one randomly selected feature’s category differed from that of the participant. The profile at d=2 was strictly farther from the participant, having one additional randomly selected feature changed to a different category. Lastly, d=4 meant that the profile had all of the demographic features in a different category from those of the participant. The receivers were placed at distances d=0,1,2, and two of them were placed at d=4. We showed the five profiles simultaneously on the participant’s screen in random order.
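Under Blau's construct as used here, social distance reduces to a Hamming distance over the four demographic features. A minimal sketch, with hypothetical feature names and categories:

```python
FEATURES = ("age_group", "gender", "marital_status", "state")

def social_distance(a, b, features=FEATURES):
    """Blau-style social distance: the number of demographic features on
    which two individuals differ (a Hamming distance). Feature names are
    illustrative stand-ins for age, gender, marital status, and home state."""
    return sum(a[f] != b[f] for f in features)

participant = {"age_group": "30-39", "gender": "F",
               "marital_status": "married", "state": "CA"}
receiver = {"age_group": "30-39", "gender": "M",
            "marital_status": "married", "state": "NY"}
print(social_distance(participant, receiver))  # 2 (gender and state differ)
```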

In addition to demographics, the generated profiles included two reputation features—namely, the average star rating and the number of reviews on Airbnb. The star rating is a postinteraction subjective evaluation of an alter. It consists of the assignment of zero to five stars, where the number of stars is proportional to the degree of positivity of the evaluation. The ratings a member receives are averaged over all of their raters, rounded to the half unit, and presented in the member’s profile on the platform. Similarly, an interaction grants the two parties the opportunity to mutually provide free-form written reviews. Due to the difficulty of manipulating the textual content of reviews experimentally, we restricted our attention to the number of reviews a user received.
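The platform-style aggregate rating described above (average over all raters, rounded to the half unit) can be sketched in a few lines; the convention of rounding exact ties upward is our assumption:

```python
import math

def displayed_rating(ratings):
    """Average the star ratings and round to the nearest half star, as in
    the platform-style aggregate described above. Exact ties are rounded
    up here, which is an assumption about the convention."""
    avg = sum(ratings) / len(ratings)
    return math.floor(avg * 2 + 0.5) / 2

print(displayed_rating([5, 4, 4, 5]))  # 4.5
print(displayed_rating([5, 5, 4]))     # 4.5 (mean 4.67 rounds to 4.5)
```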

We manipulated these two dimensions in a structured way to study their effects on trust. Among the five profiles participants saw on the screen, four had reputation features with similar values, chosen independently at random for each participant’s session, which we refer to as the baseline reputation. These were the profiles at social distances d=0,1,2 and one of the profiles at d=4. The other generated profile at distance d=4 had one of the reputation features randomly selected to be switched to either a better or a worse value than baseline (see Game Design Details for how we manipulated the numerical values of reputation). For convenience, we refer to the profile that has a different reputation feature than the baseline as being at distance d=5.
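One way to picture the session structure is the following sketch, which generates receiver profiles at the prescribed distances; the category pools, reputation encoding, and function names are illustrative assumptions, not the study's actual implementation:

```python
import random

FEATURES = ["age_group", "gender", "marital_status", "state"]

# Illustrative category pools (placeholders, not the study's exact coding).
CATEGORIES = {
    "age_group": ["18-29", "30-39", "40-59", "60+"],
    "gender": ["F", "M"],
    "marital_status": ["married", "single"],
    "state": ["CA", "NY", "TX", "OH", "WA"],
}

def flip(profile, feature):
    """Copy the profile with one feature moved to a different category."""
    new = dict(profile)
    new[feature] = random.choice(
        [c for c in CATEGORIES[feature] if c != profile[feature]])
    return new

def make_receivers(participant, baseline_rep, manipulated_rep):
    """Sketch of one session: profiles at d = 0, 1, 2, 4 with the baseline
    reputation, plus a second d = 4 profile (the "d = 5" profile) carrying
    the manipulated reputation (better in world 2, worse in world 1)."""
    order = random.sample(FEATURES, len(FEATURES))
    d0 = dict(participant)
    d1 = flip(d0, order[0])                   # one feature differs
    d2 = flip(d1, order[1])                   # one additional feature differs
    d4 = flip(flip(d2, order[2]), order[3])   # all four features differ
    d5 = dict(participant)
    for f in FEATURES:                        # a second maximally distant profile
        d5 = flip(d5, f)
    profiles = [dict(p, reputation=baseline_rep) for p in (d0, d1, d2, d4)]
    profiles.append(dict(d5, reputation=manipulated_rep))
    random.shuffle(profiles)                  # profiles appear in random order
    return profiles

participant = {"age_group": "30-39", "gender": "F",
               "marital_status": "married", "state": "CA"}
session = make_receivers(participant, {"stars": 4.0, "reviews": 10},
                         {"stars": 5.0, "reviews": 40})
print(sorted(sum(p[f] != participant[f] for f in FEATURES) for p in session))
# [0, 1, 2, 4, 4]
```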

We randomly assigned users to two possible worlds. In world 1, the profile at d=5 not only had the largest distance from the participant, but also a weaker reputation than all other profiles (the baseline reputation). In this case, reputation did not compete with the tendency toward homophily. In world 2, the profile at d=5 had a better reputation than the baseline reputation. This induced a tension between placing trust in the most distant profile with a better reputation or in the other profiles closer to the participant in social space. Fig. S1 shows a partial view of the screen users see in the experiment, and Fig. S2 shows a diagram that exemplifies the structure of a user’s session.

Fig. S1.

Partial view of the screen the participant sees during the experiment. It shows the participant’s profile against one of the synthetic profiles.

Fig. S2.

Example of the structure of a user session. The symbol S in the figure indicates the same value as the participant’s feature; D indicates a different value, which increases distance in the social space; and B indicates the baseline reputation. R1 labels the randomly chosen feature varied to increase distance from d=0 to d=1; R2, the feature varied to reach d=2; and R3, the reputation feature we vary. Other random choices include the profiles’ age and region (whenever they have to differ from those of the player), set to values outside of the player’s own age group and region.

We gave participants a single “wallet” with 100 credits, which they could keep or invest in receivers in whatever way they chose. Therefore, participants could gain or lose credits through their investments. Because this was a one-time game, it was easy to show that the Nash equilibrium was not to invest any amount, since the dominant strategy for receivers was not to return any amount. (Nevertheless, we observed such rational behavior only in rare instances.)
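The backward-induction argument can be made concrete in a few lines: a self-interested receiver keeps everything, so every positive investment strictly loses credits, and investing nothing is the equilibrium. A sketch (names are illustrative):

```python
def best_receiver_return(tripled_credits):
    """A self-interested receiver's dominant strategy in a one-shot game:
    keep everything, return nothing."""
    return 0.0

def investor_payoff(invest, wallet=100.0):
    """Investor's credits when the receiver plays the dominant strategy."""
    tripled = 3 * invest
    return wallet - invest + best_receiver_return(tripled)

# Anticipating a rational receiver, every positive investment strictly
# loses credits, so the equilibrium investment is zero.
best = max(range(0, 101), key=investor_payoff)
print(best)  # 0
```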

It is argued that risk is a component of trust in general, and some definitions of trust include risk (8). Even though previous research has attempted to relate trust and risk, the empirical evidence of the connection between risk attitudes and trust has been weak (17). Moreover, research that has addressed this question has been limited to laboratory experiments or small datasets.

Given the opportunity to study this question using a large population, we introduced a risk-assessment question before the investment game. We worded the question as: “A lottery ticket costs 100 (USD) and people win with 50% chance. How much should the prize be for you to choose to buy a ticket?” Players could enter any numerical value, which corresponded to the minimum reward that would make the participant take the risk of buying a ticket. The prize value 200 (USD) had the expected value of net gain equal to zero (after paying off the ticket) and corresponded to the minimum rational value. Thus, values >200 (USD) measured risk aversion proportional to their magnitude. In Risk Assessment Question, we summarize the distribution of answers (Table S1) and argue that our measure captures risk behaviors in accordance with previous research (Table S5) (19, 20).
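The arithmetic behind the risk-neutral threshold can be written out directly; the function names are illustrative:

```python
def expected_net_gain(prize, ticket_cost=100.0, p_win=0.5):
    """Expected net gain from buying the lottery ticket in the
    risk-assessment question."""
    return p_win * prize - ticket_cost

def risk_premium(answer, neutral_prize=200.0):
    """How far a participant's minimum acceptable prize exceeds the
    risk-neutral threshold of 200 USD; larger values indicate greater
    risk aversion."""
    return answer - neutral_prize

print(expected_net_gain(200))  # 0.0: the risk-neutral threshold
print(risk_premium(500))       # 300.0: e.g., the median answer in Table S1
```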

Table S5.

The outcome of the cross-validation of an L1-regularized regression of the log-transformed answer to the risk assessment question

Dependent variable:
Covariate risk (log)
Age 0.016*** (0.001)
Gender = male 0.712*** (0.039)
z.savings 0.138** (0.065)
gini 0.553*** (0.116)
Experience = 1 0.031 (0.049)
Experience > 2 0.139*** (0.049)
gini:savings 0.011*** (0.003)
Constant 7.232*** (0.070)
Observations 6,686
R2 0.094
Adjusted R2 0.093
Residual std. error 1.492 (df = 6678)
F statistic 98.571*** (df = 7; 6678)

The explanatory variables are the participant's features. *P < 0.1; **P < 0.05; ***P < 0.01. Std., standard.

Multilevel–Multivariate Analysis

We had five measurements (investments) on each observational unit (participant). As a result, the five investments were correlated, which we accounted for by nesting investments within subjects in a multilevel model. We fitted the model using a multivariate regression with 10 independent variables, one for each investment in the combination (d, w) of profile distance d ∈ {0, 1, 2, 4, 5} and world w ∈ {1, 2}. The investments a participant made had different sources of mutual correlation. For instance, the sum of the investments had to be at most 100 credits. We accounted for these by computing the model fit with an unconstrained covariance structure that learned from the data the correlations and independent variances across measurements (21).

As a first-order approximation, we fitted the empty model (i.e., without explanatory variables) with 10 intercepts. The five intercepts for each world corresponded to the average distribution of investments among the five profiles across all participants (complete pooling). Fig. 1 shows a plot of the mean estimates, together with the mean number of credits saved, for worlds 1 and 2. Table S6, model 1 shows the numerical estimates from the model fit.
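Because the empty model has no explanatory variables, its intercepts reduce to complete-pooling cell means over the (d, w) cells. A sketch on simulated (not actual) session data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sessions (not the study's data): one row per investment,
# labeled by profile distance d and world w.
n_subjects = 1000
d_levels = [0, 1, 2, 4, 5]
records = []
for s in range(n_subjects):
    w = rng.integers(1, 3)          # each subject is in world 1 or 2
    for d in d_levels:
        records.append((d, w, rng.uniform(0, 25)))
data = np.array(records)            # columns: d, w, investment

# The empty model's intercept for cell (d, w) is the pooled mean
# investment over all subjects assigned to world w.
for w in (1, 2):
    cell_means = {d: data[(data[:, 0] == d) & (data[:, 1] == w), 2].mean()
                  for d in d_levels}
    print(w, {d: round(m, 1) for d, m in cell_means.items()})
```

The full model in the paper additionally learns an unconstrained covariance across the five measurements; the cell means above are only the first-order starting point.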

Fig. 1.

Empty model estimates of average investment in profile at distance d and average savings. (A) In world 1, the second profile at distance d=4 (here identified as d=5) has a worse reputation than baseline. (B) In world 2, the profile at distance d=5 has a better reputation than the baseline.

Table S6.

The estimates of (i) the empty model, (ii) the model with all explanatory variables, except the interaction terms, and (iii) the full model

Dependent variable
Investment
Covariate Empty No interactions Full
(1) (2) (3)
d0:world1 21.556*** (0.239) 23.956*** (0.880) 21.833*** (1.201)
d1:world1 18.414*** (0.189) 19.488*** (0.694) 17.287*** (1.086)
d2:world1 17.150*** (0.184) 18.613*** (0.681) 16.393*** (1.128)
d4:world1 16.024*** (0.187) 15.060*** (0.712) 12.835*** (1.289)
d5:world1 8.942*** (0.273) 8.851*** (1.027) 7.110*** (1.441)
d0:world2 17.025*** (0.239) 18.383*** (0.931) 17.395*** (0.984)
d1:world2 14.045*** (0.189) 16.973*** (0.735) 16.399*** (0.817)
d2:world2 12.816*** (0.184) 14.086*** (0.724) 14.042*** (0.864)
d4:world2 11.861*** (0.187) 10.978*** (0.750) 12.309*** (1.074)
d5:world2 25.801*** (0.273) 26.501*** (1.091) 27.769*** (1.334)
d0:world1:gender_S_male 1.648*** (0.511) 0.934 (0.711)
d1:world1:gender_S_male 0.708* (0.403) 1.048* (0.564)
d2:world1:gender_S_male 1.320*** (0.393) 2.204*** (0.507)
d4:world1:gender_S_male 3.467*** (0.397) 2.884*** (0.547)
d5:world1:gender_S_male 1.920*** (0.574) 1.253* (0.687)
d0:world2:gender_S_male 1.445*** (0.519) 1.928*** (0.719)
d1:world2:gender_S_male 1.271*** (0.409) 0.961* (0.564)
d2:world2:gender_S_male 0.049 (0.399) 1.243** (0.519)
d4:world2:gender_S_male 1.479*** (0.403) 0.344 (0.554)
d5:world2:gender_S_male 3.575*** (0.584) 2.466*** (0.695)
d0:world1:marital_S_single 2.404*** (0.502) 0.099 (0.640)
d1:world1:marital_S_single 1.650*** (0.397) 0.160 (0.511)
d4:world1:marital_S_single 1.087*** (0.393) 0.840 (0.604)
d5:world1:marital_S_single 1.094* (0.569) 0.866 (0.732)
d0:world2:marital_S_single 1.401*** (0.508) 0.489 (0.641)
d1:world2:marital_S_single 1.244*** (0.399) 0.172 (0.510)
d2:world2:marital_S_single 0.457 (0.390) 0.187 (0.493)
d4:world2:marital_S_single 0.777** (0.396) 0.706 (0.614)
d5:world2:marital_S_single 1.486*** (0.574) 0.002 (0.743)
Observations 33,570 33,570 33,570
Log likelihood −131,187.900 −130,662.200 −130,544.400
Akaike inf. crit. 262,425.900 261,634.400 261,438.900
Bayesian inf. crit. 262,636.400 262,939.700 262,912.600

For illustration purposes, we show how the coefficients of gender and marital status change between the model without interactions and the full model. We omitted the other coefficients due to the large number of covariates. Inf. crit., information criterion. *P < 0.1; **P < 0.05; ***P < 0.01.

We were mainly interested in the additive effect of the number of different coordinates between two individuals’ feature vectors, or their Hamming distance. However, any real-world sociodemographic feature inevitably produces heterogeneous effects on trust (e.g., gender may affect investments more than marital status does), and Hamming distance by itself may not explain all of the variance in the investments. Thus, to take these effects into account, we extended the empty model by including explanatory variables.

Table S7 shows a list of the inputs we used to form these covariates. They can be categorized into three sets structured in a multilevel model as: (i) level 1 variables corresponding to the profile’s characteristics, annotated with “P”; (ii) level 2 variables corresponding to the participant’s (or subject’s) characteristics, annotated with “S”; and (iii) the cross-level interactions between the level 1 and 2 variables.

Table S7.

List of inputs from game data

Feature Description
Level 2: Inputs associated with the participant
(S) Age Integer in the range [18,80]
(S) Gender Male or female
(S) Marital status Married or not married
(S) Region US home region of the participant
(S) Role Guest or host
(S) Average rating Participant’s average rating on Airbnb
(S) No. of reviews Participant’s number of reviews on Airbnb
(S) World World, 1 or 2, to which the participant was assigned in the game
(S) Risk Numerical value entered to answer the lottery question
(S) Experience No. of hospitality interactions of participants on Airbnb: {0,1,2+}
Level 1: Inputs associated with the profile
(P) Rating Star rating of the profiles “on Airbnb”
(P) No. of reviews No. of reviews of the profiles “on Airbnb”
(P) Age Integer in the range [18,80]
(P) Gender Male or female
(P) Marital status Married or not married
(P) Region US home region of the profiles
(P) Profile order Order in which the profile was displayed on the participant’s screen
(P) Profile distance Profile’s social distance category: {0,1,2,4,5}

We expected that the demographic features we used to increase social distance could have resulted in effects rooted in preferences, which are not necessarily biases, such as preferences for “female,” “married,” or “older” as indicators of perceived trustworthiness. Thus, the cross-level interactions aimed to control for these effects.

The multivariate model estimated the effects associated with the covariates specifically for each dependent variable (d,w). This allowed us to show the effects of each explanatory variable on trust in each of the five profiles (in each of the worlds) separately. In the case of cross-level interactions, this was not always possible due to the symmetries in the participant’s session. For example, all profiles at d=0 exactly matched the participant’s gender, marital status, and region. In these cases, we estimated joint effects on the investments in the five profiles simultaneously (by world). In Table S8 we present an alternative analysis of the data based on McFadden’s choice model (22).
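McFadden's choice model treats each participant's session as a discrete choice among the five profiles, with the probability that a profile receives the maximum investment given by a conditional logit. A sketch with made-up covariates and coefficients (not the fitted values in Table S8):

```python
import numpy as np

def choice_probabilities(X, beta):
    """Conditional-logit (McFadden) probabilities for one choice set:
    P(choose j) = exp(x_j . beta) / sum_k exp(x_k . beta), where the
    rows of X are the alternatives' covariate vectors."""
    u = X @ beta
    u = u - u.max()                 # subtract max for numerical stability
    expu = np.exp(u)
    return expu / expu.sum()

# Made-up covariates (5-star dummy, log-reviews, distance indicator) and
# made-up coefficients -- not the fitted values reported in Table S8.
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 3.0, 1.0]])
beta = np.array([1.4, 2.2, -0.4])
p = choice_probabilities(X, beta)
print(p.round(3))                   # probabilities over the three alternatives
```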

Table S8.

Conditional logit analysis by modeling the data as a discrete choice (McFadden’s choice model)

Dependent variable
Covariate Received maximum investment
rating_P_4 0.086 (0.119)
rating_P_5 1.423*** (0.166)
log.reviews_P 2.171*** (0.131)
age_P 0.284*** (0.042)
gender_P_male 0.355*** (0.038)
region_P_MidWest 0.029 (0.061)
region_P_NorthEast 0.294*** (0.059)
region_P_Pacific 0.257*** (0.065)
region_P_West 0.079 (0.079)
marital_S_married:marital_P_married 0.033 (0.091)
marital_S_single:marital_P_married 0.435*** (0.085)
d0:world1 1.156*** (0.164)
d1:world1 0.402** (0.160)
d2:world1 0.159 (0.154)
d4:world1 0.099 (0.148)
d0:world2 0.339** (0.138)
d1:world2 0.536*** (0.138)
d2:world2 0.820*** (0.136)
d4:world2 1.352*** (0.147)
Observations 20,190
R2 0.158
Max. possible R2 0.475
Log likelihood −4,767.618
Wald test 2,464.180*** (df = 19)
Log-rank test 3,462.584*** (df = 19)
Score (log-rank) test 3,880.873*** (df = 19)

*P < 0.1; **P < 0.05; ***P < 0.01. Max., maximum.

Note that the intercepts of the full model are consistent with those in the empty model, up to estimation errors (Table S6, models 1, 2, and 3). Thus, we used this estimate of the distribution of mean investment over the profiles as a starting point and studied how the explanatory variables changed these values.

Fig. 2 presents the effects of these covariates (integer-valued variables were centered and standardized to make all coefficients comparable). Negative values reduce average investments, whereas positive values increase them. Our main goal was to show that the heterogeneity of the features did not significantly alter the main effects we observed on average investments as a function of d in the empty multivariate model.

Fig. 2.

The effects of the covariates associated with the participant (S) and profiles (P) in the multivariate multilevel model. The dashed lines have the values ±1.37, which correspond to the smallest average investment difference between two profiles with baseline reputation, minus two standard errors.

Results

Fig. 1A shows that homophily dominated investment decisions. That is, the farther away a profile was on the demographic dimensions from the participant, the lower the investment it received, on average. Furthermore, the profile at d=5 with worse reputation received less investment on average than the equivalent alternative with respect to social distance (i.e., the profile at d=4). Quite strikingly, Fig. 1B shows that reputation builds trust beyond homophily. The average investment in the profile at d=5, possessing the best reputation, was significantly higher than the average invested in all of the closest profiles. Note that despite the strong influence of the reputation system in world 2, the magnitude of the investments in the profiles with baseline reputation was still driven by homophily.

The explanatory variables exhibited variance beyond that explained by social distance, which implies that there are differences in investment behavior by demographic group and their interactions. However, as we argue next, the changes in the average investments (model intercepts) that these effects produced in the multivariate model were not strong enough to significantly alter the conclusions regarding homophily and reputation that we previously derived from the empty model.

Homophily Is at Work.

The covariate “profile distance” was by far the dominant one with respect to variance explained (F value 5668.8, P<0.001). This was followed by the number of reviews with a much smaller F value (26.1, P<0.001).

The dashed lines in Fig. 2 have the values ±1.37 and correspond, in the most conservative way, to the smallest difference in average investment between two profiles with baseline reputation, minus two standard errors. That is, a coefficient that exceeds these boundaries potentially produces an effect that could alter the conclusions we derived from the empty model. A first glance at Fig. 2 reveals that most of the coefficients are contained within these boundaries.

Fig. 2 shows that participants’ gender, “(S) male,” had small positive effects on all profiles in both worlds. Marital status “(S) single” had effects not significantly different from zero. For age, the older the profile [“(P) age”], the more credits it received: one SD (14 y) above the mean (39.7) had positive effects for all of the profiles, with coefficients ranging from 0.93 (0.44) to 2.29 (0.81). The effects associated with region had small values that varied together across profiles (omitted from Fig. 2 for clarity).

Because these effects shifted investments roughly uniformly across the profiles, they did not significantly change the differences between the investment means.

We note that the preceding effects did not change homophily trends due to the inclusion of interaction effects between participants’ characteristics and those of the profile in the model. In Fig. 2, these variables are labeled with both S and P, such as “(S) female, (P) male” for gender. Recall that we included these interactions to capture preferences that are not necessarily biases. Indeed, in both worlds, male profiles received on average up to 3.38 (0.50) fewer credits than females, while not married profiles received on average up to 2.50 (0.40) fewer credits than married profiles. Age difference exhibited a nonlinear relationship. As the profiles got older than the participant, homophily came into play, and the positive effect of the profile’s age decreased significantly, as indicated by the interaction of profile’s age with the age difference between the profile and the subject.

Without controlling for these preferences (no interaction effects), the model exhibited effects associated with demographic features that canceled out the homophily effects produced by social distance in the case of males or singles. For illustration, in Table S6, we included the effects of gender and marital status for the models that included the interactions (model 3) and that with interactions removed (model 2).

As the group effects of investment behavior were not large enough to alter the trends produced by profile distance, we show evidence that homophily figures as a major driving force, structuring decisions of whom to trust with investments.

Trust via Reputation.

We first focus on the effects of reviews in Fig. 2. In world 1, an increase in the log-transformed number of reviews, “(P) reviews (log)” resulted in a statistically indistinguishable increase in mean investment in profiles with baseline reputation, between 2.03 (0.47) and 3.20 (0.36) credits. Although the profile at d=5 in this world always had fewer reviews than baseline, the variation in its number of reviews did not affect the average investment it received. In contrast, in world 2, an increase in the number of reviews increased the mean investment in the profile at d=5, with the best reputation by 5.42 (0.52) credits. Symmetric to world 1, the change in the number of reviews of the baseline reputation did not affect the average investment in these profiles.

Comparing the effects of number of reviews between the two worlds, we see that high reputation resulted in larger investment increases in cases in which holding the best reputation was an exception among the alternatives (world 2). Surprisingly, these exceptions were the profiles that were the farthest away from the participants in the social space.

The coefficient of the joint effect estimated for variable “(P) rating = 4” represented an increase in mean investment of 1.74 (0.83) and 1.13 (0.37) credits for worlds 1 and 2, respectively (reference “no rating available”). The corresponding increases for profiles with five-star ratings, “(P) rating = 5,” were 2.21 (0.82) and 0.99 (0.36) for worlds 1 and 2, respectively. Thus, varying between 4 and 5 stars did not cause a significant difference in average investment, as participants may have considered both ratings equally high.

In the full model of world 1, the difference between the mean investments comparing the profile with baseline reputation receiving the smallest average investment (d=4) and the profile at d=5 with the lowest reputation was 5.73 (1.89). In world 2, the difference between the profile with baseline reputation receiving the largest mean investment (d=0) and the profile at d=5 with the best reputation was 15.46 (1.68). In Fig. 2, we see that none of the effects were large enough to cancel out the shifts produced by reputation and alter our conclusion with respect to trust increases (world 2) or reductions (world 1). This shows evidence that the reputation system is a strong signal that shifts trust beyond homophily, thereby overriding the effects of assessments of social distance.

Risk.

Fig. 2 shows the effects of answers to the risk-assessment question on the investments in each profile. We grouped responses by ranges, where the higher the range, the more risk-averse we classified a participant to be. These are the covariates with prefix “(S) risk in range,” where the reference level is the range [200,400], the low end of rational values.

In world 1, we saw little or no effect associated with risk attitudes on the investments in any of the profiles, except for small negative effects on the investments in those with baseline reputation and weak similarity with the participant (d=2 and d=4). The effects ranged from a reduction of −1.74 (0.61) to −2.85 (0.48) in average investments, with slightly stronger effects proportional to the level of risk aversion.

The most striking results were related to world 2. In this case, risk attitudes did not correlate with the average investments in any of the profiles, except in the profile at d=5 (with significance P<0.001). These effects were among the strongest we found (Fig. 2, Right, the bottom three items). The decrease in mean investment ranged between 3.91 (0.75) and 8.16 (0.71) and was inversely proportional to the degree of risk aversion. This shows that risk aversion was not correlated with reduced trust in general; restricted to the case of high reputation, trust had a strongly negative correlation with risk aversion. The more risk-averse the participant, the less they trusted the positive information provided by the reputation system. Interestingly, risk aversion did not seem to correlate with distrust in negative reputations.

Real-World Data Analysis.

The intuitions we gained from the experiment suggested that reliance on reputation may reduce the user’s attention to the number of dimensions in which their partner’s demographic characteristics differ when they select a host or a guest. In the experimental data, we had access to every option the participant considered and the degree of preference for every pair of them. Although this is not possible to observe from Airbnb’s internal database of historical interactions, we sought to use the insights gained through the experiment to guide a real-world, large-scale data analysis and extract the same intuition.

On Airbnb, guests are the active participants in the social selection process: they select partners by searching and making a request. We studied 1 million requests to stay made by guests over the same period as the study. For purposes of this analysis, we considered two dimensions of social distance: age and gender. (Airbnb does not collect marital status, and hospitality interactions usually occur between users from distinct locations.)

For each of these demographic features, we coded distance as 0 if values for hosts and guests were available and equal, and 1 otherwise. We considered two ages equal if they were within 10 y of each other. (We repeated the analysis using different age thresholds within which we considered two people as belonging to the same age group, namely 3, 5, 10, and 20 y. Across the different experiments, the absolute values of social distance changed, but the trends did not.)
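The distance coding above can be sketched as follows; the dictionary representation and field names are illustrative, not Airbnb's actual schema:

```python
def social_distance(guest, host, age_threshold=10):
    """Count the demographic dimensions (age, gender) on which a guest-host
    pair differs; a missing value on either side counts as a mismatch."""
    d = 0
    # Age: equal only if both values are available and within the threshold.
    if (guest.get("age") is None or host.get("age") is None
            or abs(guest["age"] - host["age"]) > age_threshold):
        d += 1
    # Gender: equal only if both values are available and identical.
    if guest.get("gender") is None or guest.get("gender") != host.get("gender"):
        d += 1
    return d
```

Varying `age_threshold` (3, 5, 10, or 20 y) reproduces the robustness check described above: the absolute distances change, but the trends should not.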

Fig. 3 shows how the average social distance between guests and hosts varied, conditioned on the number of reviews and on the star rating of the host at the time of booking request. Strikingly, the intuition we derived in the experiment held in the actual platform. We saw a trend for the average social distance to increase, which became clearer as the number of reviews (within each graph in Fig. 3) and ratings (across graphs in Fig. 3) changed. Note that this effect is not simply explained by an increased number of interactions—we did not see a significant increase in social distance for hosts with a large number of reviews and low ratings (Fig. 3, first graph). This shows that our experimental findings were not simply an artifact of our online laboratory, but that our main conclusions generalized to patterns found in real interactions. That is, as high reputation tends to shrink social distance, we saw higher tolerance for individuals at farther social distances between guests and their selected hosts as the reputation of the host got better.

Fig. 3.

Real-world data from Airbnb show that an increased reputation of the host in the form of rating (graph) and number of reviews (x axis) results in greater diversity of guests who selected them (y axis).

Discussion

Companies operating in the sharing economy are predicated on trust, but cannot rely directly on preexisting institutional arrangements. Our work shows evidence that the reputation system of Airbnb, and by extension of sharing-economy sites—the star ratings and the number of reviews—may operate to bridge the gap between institutionally generated trust and the organically grown trust present in social platforms. Although we gathered evidence for the tendency of individuals to trust similar others, by trusting the reputation system, participants in our study were willing to extend trust to those who exhibited a high degree of dissimilarity in the social space.

While we present evidence that these effects are at work in the actual Airbnb platform, our experimental results are limited to the specific population that participated in our study. Moreover, although we found very sizable effects associated with homophily, we emphasize that the inclusion of other demographic characteristics not displayed explicitly by the platform, but that can be inferred indirectly through pictures or other signals—such as nationality, race, class, religion, ethnicity, etc.—could lead to the observation of even greater effects. For example, the literature suggests that racial features play a significant role in determining trust (23, 24).

Materials and Methods

Our experimental methods were reviewed and approved by Stanford University’s Institutional Review Board (protocol 34470, approved on August 11, 2015). We required invitees to consent to participate in the study, whose terms we displayed on the entry page of our experiment’s website.

See Supporting Information for detailed information on our sample, an analysis of self-selection, and more information on our research design and data analysis.

Game Design Details

Fig. S1 shows part of the screen the participants saw in the experiment, while Fig. S2 shows an example of the structure of a game session.

In Fig. S2, the symbol S indicates the same values as the participant’s features; D indicates a different value, which increases distance in the social space; and B indicates baseline reputation. Note that the participant’s demographic features almost completely determine the structure of the session the participants see, up to a few random factors: the decision of which features to vary to increase distance from d=0 to d=1 (labeled R1 in the diagram) and to d=2 (R2), and the reputation feature we vary (R3).

In the construction of the features of the profiles in the receiver population, dichotomous demographic features such as gender and marital status are simple to generate or flip. [The dichotomous classification for gender used in the experiment reflects the need to simplify the number of conditions included in our research design. We understand that a more complex classification system that allows for a broader range of culturally constructed categories would be preferable (25).] The generation of the other synthetic feature values involves some random choices. We collected from Airbnb’s database the distribution of age, number of reviews, and star ratings of all active users. The participant’s age group was defined as an interval of 28 y that included the player’s own age as the midpoint, while the range of possible values was the interval [18, 80] years old (all participants were 18 y old or over). Thus, for the synthetic profiles, we drew values either within or outside the participant’s age group, depending on the intended distance. The age range may be too conservative, and the participants may have considered profiles within the range as being farther away from them compared with people they psychologically identify as belonging to the same age group. However, we accounted for this by including age as an explanatory variable in our multilevel–multivariate model. Lastly, when we selected another state, we picked a random state from a different region (as defined by the US Postal Service) from that of the participant.
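A rough sketch of the age-group construction described above (function names are our own; the uniform draws stand in for the empirical age distribution collected from Airbnb's database):

```python
import random

AGE_MIN, AGE_MAX = 18, 80  # admissible profile ages

def age_group(participant_age, half_width=14):
    """28-y interval with the participant's age as midpoint, clipped to [18, 80]."""
    return (max(AGE_MIN, participant_age - half_width),
            min(AGE_MAX, participant_age + half_width))

def draw_profile_age(participant_age, same_group):
    """Draw a synthetic profile age inside or outside the participant's age
    group, depending on the intended social distance."""
    lo, hi = age_group(participant_age)
    if same_group:
        return random.randint(lo, hi)
    # Outside the group: sample from the remaining admissible ages.
    outside = [a for a in range(AGE_MIN, AGE_MAX + 1) if a < lo or a > hi]
    return random.choice(outside)
```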

The inclusion of the covariate that indicates the order in which the profile was placed on the participant’s screen in the multilevel model produced a nonsignificant effect.

As for choices of the reputation feature values, the two features, star rating and number of reviews, were first randomly selected to be either in the low or high categories. For example, if the baseline reputation had a high number of reviews and a low average star rating, then we changed one feature, say, number of reviews, to low (in the case of world 1) to produce the reputation of the other profile at d=4.

As the distribution of average star ratings is skewed toward high values (26), we used 4 as the low and 5 as the high category. For the number of reviews, we were interested in observing the effect of different deltas in the number of reviews (e.g., the impact of having one review while the baseline reputation had zero, or of having two or three reviews while the alternative had one, and so on). Therefore, the low category consisted of 0, 1, or (2 or 3) reviews. The high category included 1, (2 or 3), or 11 or more reviews. When we randomly selected one of the values from either the low or the high category, we imposed the constraint that, when we changed the category of this feature, the number of reviews in the high category was always greater than that in the low category. When we assigned a number of reviews prescribed to be 11 or more, we drew from the real distribution of the number of reviews in the user population.
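A minimal sketch of the review-count sampling with the ordering constraint (bucket values follow the text; `empirical_11_plus` is a hypothetical stand-in for the real distribution of counts of 11 or more):

```python
import random

# Buckets from the text: low = 0, 1, or 2-3 reviews; high = 1, 2-3, or 11+.
LOW_BUCKETS = [0, 1, (2, 3)]
HIGH_BUCKETS = [1, (2, 3), "11+"]

def draw_review_pair(empirical_11_plus):
    """Draw a (low, high) pair of review counts such that the high-category
    count strictly exceeds the low-category count (rejection sampling)."""
    while True:
        low = random.choice(LOW_BUCKETS)
        high = random.choice(HIGH_BUCKETS)
        low_n = random.choice(low) if isinstance(low, tuple) else low
        high_n = (random.choice(empirical_11_plus) if high == "11+"
                  else random.choice(high) if isinstance(high, tuple) else high)
        if high_n > low_n:  # constraint described in the text
            return low_n, high_n
```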

The rating of the participant and their number of reviews on Airbnb were available to us as inputs. However, the data were very sparse. Even though all participants were registered users of Airbnb, some of them had no experience using the platform at the time of the experiment, and most of those who did have a hospitality experience did not receive ratings or reviews from their partners. Furthermore, we noticed that the inclusion of these predictors in the multilevel model produced negligible, nonsignificant effects.

In traditional forms of the investment game (17), players interact in multiple rounds, each with a single receiver, and are given a new wallet (or stash of credits) each round. These credits can be used to invest only in the specific receiver in that round. Setting aside the fact that players remember past rounds, in the traditional form of the game, investments in different profiles by the same participant are closer to being “independent.” Our reasons for modifying the original version and removing the independence of the investments were twofold. First, research shows that, in experiments that aim to measure preference, presenting complete information on all available options and their attributes, while restricting the decision space within a session, yields more reliable estimates of within-subject effects and mitigates the problem of estimating subjects’ external knowledge of outside options, which can be difficult to measure (9). It also helps hold the participant’s attention, yielding higher-quality behavioral data: a single-shot game with all of the profiles displayed simultaneously made the game faster to play. In this scenario, allocations from a limited common pool of resources (a single wallet) among different alternatives allowed for a cleaner interpretation of between-subject preferences among the alternatives. Second, in phone interviews after the pilot, users reported that binding the credits to specific receivers limited the possible strategies and made investing feel less engaging, or game-like. Moreover, the multiple wallets created detachment from the credits due to a sense of too many credits being available, which caused players to exercise less judgment when they invested.

To assure that each participant played the game at most once, we sent a unique one-time token as an invitation. We established a limit of 30 min for the player to complete the game from the time they registered to play. To make it more difficult to simply skip a profile, we enforced the fact that the amount the players invested in each receiver had to be explicitly declared (even if it was zero).

How We Computed the Game Winners

To make the game closer to what players would observe in real situations, we drew from our experience running a pilot experiment before the present study with 400 users from other sharing-economy platforms (27), where real users played the roles of receivers in addition to that of investor. Thus, for each of the synthetic profiles, the percentage of the amount invested that the receivers automatically returned was drawn independently from the distribution of returns we observed in the pilot, regardless of the receiver’s characteristics. At the end of the game, we told the participant that we would reveal their investment outcome once we heard from all of the receivers they interacted with.

Six weeks after the participants played the game, we contacted them by email, saying that we received the net outcome on their investment from all of the recipients who they interacted with. Access to this information at that particular point in time did not affect the results, because, by the time the participant received their game results, they had already concluded their participation.

We were careful to avoid creating an association between a profile and a particular investment return behavior. Therefore, we reported only the participant’s new balance after the returns, as opposed to the amount that each of the profiles returned.

After the experiment, we contacted 105 participants to give them a prize of 100 (USD) each.

Details on the Population Sample

The sample of 100,000 invitees was drawn from Airbnb’s user population. We aimed to produce diversity in the sample regarding their role on the platform (i.e., guest or host) and their level of experience (reflected by the number of hospitality interactions at the time of the invitation).

We required that users enter their demographics, as the user agreement of Airbnb prohibits the company from sharing this information with partners. We checked the information the players entered against Airbnb’s database and found that 95% of users had ages that matched up to within a year’s difference, and 98% and 85% of the users entered gender and states that matched, respectively. The higher mismatch of states could be partially explained by users who moved since the time of data collection or inaccuracies in Airbnb’s database. Airbnb does not collect marital status.

In Table S1, we present the univariate distributions of the demographics over the 8,902 users who participated in the experiment.

Self-Selection Analysis

To account for self-selection, we contrasted the distribution of characteristics between the invitees and those who participated. We invited 100,000 users who had accounts on Airbnb and identified as US residents to participate. From the population of invitees, 8,902 registered to participate in the experiment. As mentioned above, the invited population was controlled across a number of factors. To check for bias in the subset consisting of the experiment’s participants, we conducted two analyses. First, we evaluated the Kullback–Leibler (KL) divergence of the multivariate distribution of invitees across different buckets of characteristics and multiple subsamples of the invitees (one of them consisting of the actual population of participants). Second, we fitted a logistic regression model whose response variable was whether an invitee participated in the experiment, while the predictors corresponded to the characteristics of the invitees. The former is a standard method for assessing differences in distributions, and the latter is helpful for identifying explanatory factors. In both analyses, we considered the following six factors: gender, age, number of ratings as a guest, average star rating as a guest, number of ratings as a host, and average star rating as a host.

For the KL divergence test, we factored the values of each of the six features to give a distribution over 750 bins (where each bin represents a combination of one bucket from each factor). We calculated the distribution over these bins for (i) the entire 100,000 invitees, (ii) the 8,902 participants, and (iii) 1,000 bootstrapped samples of size 8,902 from the invited population. The latter was used to determine the likelihood of observing a population similar to that of the participants by chance. We smoothed the distributions by adding 1 to each bucket, which prevents division by zero or unintended large ratios in the KL divergence formula due to data sparsity. The KL divergence of ii from i is 0.184, and KL divergences of iii from i lie in the interval [0.034,0.044]. From these observations, we concluded that the population of participants possessed significant selection biases, compared with the population of invitees.
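The smoothed KL computation can be sketched as follows (a generic implementation of divergence from raw bin counts with add-one smoothing; the construction of the 750 bins is omitted):

```python
import math

def kl_divergence(counts_p, counts_q):
    """KL divergence of distribution Q from P, estimated from raw bin counts
    with add-one (Laplace) smoothing to avoid zero bins from data sparsity."""
    sp = [c + 1 for c in counts_p]
    sq = [c + 1 for c in counts_q]
    tp, tq = sum(sp), sum(sq)
    return sum((p / tp) * math.log((p / tp) / (q / tq))
               for p, q in zip(sp, sq))
```

Here `counts_p` would hold the participants' bin counts and `counts_q` the invitees'; identical distributions give a divergence of 0.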

The logistic regression showed all factors except age to be significant in explaining whether an invitee participated. Table S2 gives the regression coefficients and P values. We also included the correlation coefficients between each pair of explanatory factors in Table S3 and the variance inflation factors in Table S4. The low correlation and low collinearity between the predictors (with the exception of the number of reviews and the average rating) allowed us to draw conclusions on the characteristics of the participants against those of the invitees from the logistic regression model.

Table S3.

Correlation matrix of the predictors used in the logistic regression (Table S2)

Characteristic         Male     Age    No. of guest  Avg. guest  No. of host  Avg. host
                                       reviews       rating      reviews      rating
Male                  1.000  −0.037    0.023         0.015       −0.013       −0.020
Age                           1.000   −0.043        −0.051        0.118        0.171
No. of guest reviews                   1.000         0.826        0.045        0.025
Avg. guest rating                                    1.000        0.038        0.013
No. of host reviews                                               1.000        0.512
Avg. host rating                                                               1.000

The low correlation between the predictors allows us to draw conclusions on the characteristics of the participants against those of the invitees. Avg., average.

The results suggest that, of the invited population, women were more likely to respond than men; users with many reviews (as a guest or as a host) were more likely to respond than those with few; and users with a high average rating (as a guest or as a host) were more likely to respond than those with a low average rating. The correlation matrix and variance inflation factors highlighted that users with many reviews tended to have a more favorable average rating, especially in the case of guests.

Risk Assessment Question

Table S5 shows the outcome of the cross-validation of an L1 regularized regression of the log-transformed answer to the risk assessment question. The explanatory variables are the participant’s features. The regularization retains only the predictive variables.

From these estimates, we can see that our risk measure (i.e., the lottery question) captured effects in accordance with what was expected from the previous literature on risk (19, 20). That is, participants who answered with higher values invested fewer credits overall and diversified their investments to a greater extent (quantified by the Gini coefficient). Males were more risk-taking than females, and risk aversion increased with age. Furthermore, participants who had two or more hospitality experiences on Airbnb were less risk-averse. This reassured us that the measure captured risk behavior consistent with previous findings.
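The Gini coefficient used above to quantify diversification can be computed with a standard formula; lower values indicate a more even (i.e., more diversified) allocation of credits across the five profiles:

```python
def gini(values):
    """Gini coefficient of an allocation: 0 for a perfectly even split,
    approaching 1 as the allocation concentrates in one alternative."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # G = (2 * sum_i i * x_(i)) / (n * total) - (n + 1) / n, with x sorted.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n
```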

The risk question also gave us a validity check regarding the participant’s level of attention to the game. Answers with values <100 (USD) were irrational (i.e., the participant would buy a ticket while the possible net gains could only be less than the ticket price). Answers with values <200 (USD) were “stochastically” irrational in that the expected returns were smaller than the ticket price. We noticed that players in these categories invested less time in the game than the rest of the players in the experiment, which suggests that they may not have paid close enough attention to the task. Therefore, we removed the 625 players who fell in these categories from our analysis.

Multilevel–Multivariate Model

The profiles a participant saw were not independent samples. In fact, a participant session encoded structure that resulted in mutual dependency among the within-subject investments. We aimed to capture a large volume of data from each brief participation while minimizing the number of experimental conditions. Thus, we had each participant interact with multiple profiles. We intentionally structured the different profile roles (as determined by sociodemographic distance) as functions of the participant’s demographic characteristics. This resulted in several dependencies in the participant’s data.

Not only are the within-subject measurements dependent given the subject, but they are also mutually dependent. The characteristics of the profiles that a participant saw were almost completely determined by the participant characteristics. For example, the profile at d=1 was dependent on the profile at d=0, as they are identical, except for one feature. The same goes for the profile at d=2. The profiles at d=4 (and similarly d=5) were also completely dependent on that at d=0, as they have all features flipped. That is, each profile was built to have a predefined role. This structure would be reflected in the investments as autocorrelations. Another source of autocorrelation in the data was that the within-subject investments summed to 100, which implies that the investments have negative correlations.

Using a multivariate model, we could capture these dependencies. We treated the five investments as repeated measurements that were indexed by (d,w) and aligned across subjects. This resulted in a separate response variable per distance category and world. With this model, we could set up an unconstrained covariance structure that could flexibly learn a model of autocorrelations directly from the data.

Due to interaction effects, the combined demographic differences may explain more than computing the sum of the effects produced by each demographic difference separately. For example, the effect of gender difference may be greater if the profile already had three other features different from those of the player, compared with a situation in which all other features matched. The multilevel–multivariate model tells us exactly the effect of gender difference separately for each distance category (i.e., on how many other features the profiles differ from the player). Note that our results concerning risk attitudes (Risk) could have only been revealed by using this type of analysis, as the effects of risk attitudes on the different profile roles are significantly different.

The multilevel–multivariate model we built has 10 response variables Y_{wdj}, corresponding to the investment of player j, who was assigned to world w, in the profile at distance d. We hierarchically grouped observations by participant, and the empty model has the form

Y_{wdj} = γ_{00wd} + U_{wdj},

where U_{wdj} is the “between participant” random effect.

Our goal was to operationalize homophily in such a way that the different profiles had different degrees of dissimilarity from the player. We operationalized homophily via the Hamming distance between the participant’s feature vector and those of the profiles, which counts the number of coordinates in which two feature vectors differ, regardless of the magnitude of the difference on any particular feature. For generality, we conceptualize the meaning of each coordinate abstractly; consequently, the effect on trust produced by a particular demographic feature, given the magnitude of its difference between two feature vectors, is not of primary interest here. In an ideal setting, we would want features that are all binary and induce an equal effect on trust (i.e., we would be interested only in the additive effect of the number of coordinates on which two feature vectors differ). In this idealized situation, our regression would include only one predictor, namely, Hamming distance.
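Hamming distance between feature vectors is simply the count of differing coordinates; a minimal sketch (the feature values below are illustrative):

```python
def hamming_distance(u, v):
    """Number of coordinates on which two equal-length feature vectors differ,
    regardless of the magnitude of any single difference."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have equal length")
    return sum(a != b for a, b in zip(u, v))

# Example: a profile differing from the participant in gender and region only.
participant = ("female", "married", "35-49", "CA")
profile = ("male", "married", "35-49", "NY")
```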

However, any real-world sociodemographic feature inevitably produces heterogeneous effects on trust (e.g., gender may affect investments more than marital status does), and Hamming distance by itself may not explain all of the variance in the investments. Thus, to take these effects into account, we extended the empty model by including explanatory variables. A dividend of this model is that we can measure the effects and the relative importance of the different demographic features in their impact on mutual trust.

The inclusion of level 1 and level 2 explanatory variables, together with their interactions, allowed us to estimate their coefficients, γ_{iwd}, where i indexes the independent variables, separately for each response. This allowed us to show how the different explanatory variables affected each of the five profiles (in each of the worlds) separately. In some cases, this was not possible due to the symmetries in the participant’s session. For example, all profiles at d=0 exactly matched the participant’s gender, marital status, and region, and we did not have enough degrees of freedom to estimate the effects of these variables specifically for that profile. In these cases, we estimated the coefficient γ_{iw} of one of the matching variables as a joint effect that changed the investments in the five profiles simultaneously (by world).

We fit the model using generalized least squares with an unconstrained covariance structure so that we could capture the negative covariance among the five investments a participant made (21). The model also allowed for heteroscedasticity by estimating different variances for each dependent variable.

In Table S6, we show the estimates of the empty model, the model with all explanatory variables, except the interaction terms, and the full model. Notice that the estimates of the intercepts (i.e., the average investment in a given profile that occupies a given world) are consistent between the empty and the full model, up to estimation errors. For illustration purposes, we show how the coefficients of gender and marital status change between the model without interactions and the full model. In the former, preferences that are not necessarily biases, such as preferences for “female,” “married,” or “older” as indicators of perceived trustworthiness, cancel out the effects of homophily (i.e., participants invest less in the closest profiles and more in the distant one). These effects wash away when we add to the full model the interactions between the characteristics of the participant and those of the profiles.

McFadden’s Choice Model

Another way to analyze our game data is to model the problem as a discrete choice. Instead of examining how much investment was assigned to each of the alternatives, we can think of the profiles as mutually exclusive alternatives and consider the one that received the bulk of the investment made by a participant as the alternative of choice (or the most trusted).

Here, we fit the data using a conditional logit model, consistent with McFadden’s choice model (22). The relevant comparisons for the choice models are the within-subject comparisons, where the trade-offs between characteristics of the alternatives are made. The model estimates the probability that a participant chooses a particular alternative under changes in the attributes of the alternatives.
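The choice probabilities under the conditional logit model can be sketched directly; the code below is not the authors' estimation code, and the profile attributes and the distance coefficient are hypothetical (the reviews and 5-star coefficients echo the magnitudes reported below from Table S8):

```python
import numpy as np

def choice_probabilities(attributes, beta):
    """McFadden conditional logit: P(choose j) = exp(x_j' beta) / sum_k exp(x_k' beta).

    `attributes` is a (J x p) matrix of alternative-specific covariates
    (one row per profile); `beta` holds taste weights shared across
    alternatives, so only differences in attributes affect the choice.
    """
    utilities = attributes @ beta
    u = utilities - utilities.max()   # shift for numerical stability
    expu = np.exp(u)
    return expu / expu.sum()

# Hypothetical session: 5 profiles described by
# [log(1 + number of reviews), has 5-star rating, social distance d].
profiles = np.array([
    [np.log(1 + 0),  0, 0],
    [np.log(1 + 3),  0, 1],
    [np.log(1 + 10), 1, 2],
    [np.log(1 + 10), 1, 3],
    [np.log(1 + 1),  0, 4],
])
beta = np.array([2.171, 1.423, -0.1])   # distance coefficient illustrative
p = choice_probabilities(profiles, beta)
```

In this toy session the high-reputation profile at the smaller distance receives the highest choice probability, illustrating how reputation attributes trade off against distance within a subject's choice set.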

To operationalize this analysis, we first removed from the data individuals who did not have a clear choice, to guarantee that the remaining data reflect mutually exclusive choices. This group includes those who did not invest, as well as those whose maximum investment was spread over more than one profile. The resulting sample contains 4,038 participants. We verified that excluding this group did not alter the results derived from the multilevel–multivariate model. We also checked whether any particular feature, demographic or related to the experimental session, distinguished this group from the rest of the participants, but found no significant differences.

We coded the response as follows: the profile that received the bulk of the investment was coded as 1, and the other profiles were coded as 0. Because this coding disregards the magnitudes of the investments, we did not need to model the correlations among the investments a participant made. The drawback is that the discrete coding discards information relative to the main model we present in the main text.
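The filtering and coding steps can be sketched together. This is a hypothetical reconstruction under an assumed data layout (one row of five investments per participant), not the authors' code:

```python
import numpy as np

def code_choices(investments):
    """Filter to participants with a clear choice and code responses 0/1.

    `investments` is a (participants x 5) array of investments.
    Rows with zero total investment, or with the maximum investment
    spread over more than one profile, are dropped; in the remaining
    rows the chosen profile is coded 1 and the other four 0.
    """
    inv = np.asarray(investments, dtype=float)
    is_max = inv == inv.max(axis=1, keepdims=True)
    clear = (inv.sum(axis=1) > 0) & (is_max.sum(axis=1) == 1)
    return is_max[clear].astype(int), clear

responses, kept = code_choices([
    [0, 0, 100, 0, 0],   # clear choice -> kept, coded [0, 0, 1, 0, 0]
    [50, 50, 0, 0, 0],   # tied maximum -> dropped
    [0, 0, 0, 0, 0],     # no investment -> dropped
])
```

The returned mask `kept` also allows checking, as described above, whether the dropped group differs systematically from the retained participants.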

Table S8 shows the estimates produced by the model. The qualitative results are consistent with our main findings. Accordingly, the dominant factors that increase the probability of being selected as the most trusted are the log-transformed number of reviews [2.171 (0.13), P<0.01] and having a 5-star rating [1.423 (0.17), P<0.01]. Consistent with homophily, we also observed a decreasing probability of being the most trusted by distance, even in world 2. Preferences also showed up in the estimates. Females were more likely to be the most trusted [0.36 (0.038), P<0.01], as well as married profiles in selections made by single participants [0.435 (0.085), P<0.01]. The older the profile was, the more likely it was to be the selected one [0.284 (0.042), P<0.01]. Lastly, a phenomenon that became more apparent in the discrete choice model was that profiles from the Northeast and Pacific had greater probability of being selected (compared with those in the South), exhibiting the following effects at P<0.01: 0.294 (0.06) and 0.257 (0.07), respectively.

Disclaimer

Our collaboration agreement with Airbnb prevents the company from interfering with the publication of this research, regardless of the results.

Acknowledgments

This work was supported by National Science Foundation Grant 1257138 (to B.A., P.P., and K.S.C.).

Footnotes

Conflict of interest statement: A.G. is a data scientist at Airbnb who performed the experiments that rely on the company’s private data. P.P. began working at Uber after the research design, experiment execution, data analysis, and writing of the study were completed.

Data deposition: The data necessary to replicate our experimental results are available through the Stanford Exchange Project at stanfordexchange.org/project.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1604234114/-/DCSupplemental.

References

  • 1. Stein J. Baby, you can drive my car, and do my errands, and rent my stuff. Time. January 29, 2015. Available at http://time.com/magazine/us/3687285/february-9th-2015-vol-185-no-4-u-s/. Accessed August 9, 2017.
  • 2. Simmel G. Conflict and the Web of Group Affiliation. The Free Press; New York: 1955.
  • 3. Lazarsfeld P, Merton R. Friendship as a social process: A substantive and methodological analysis. In: Berger M, Abel T, Page C, editors. Freedom and Control in Modern Society. Van Nostrand; New York: 1954. pp. 18–66.
  • 4. McPherson M. An ecology of affiliation. Am Sociol Rev. 1983;48:519–532.
  • 5. McPherson M, Smith-Lovin L, Cook JM. Birds of a feather: Homophily in social networks. Annu Rev Sociol. 2001;27:415–444.
  • 6. Blau P. A macrosociological theory of social structure. Am J Sociol. 1977;83:26–54.
  • 7. Resnick P, Kuwabara K, Zeckhauser R, Friedman E. Reputation systems. Commun ACM. 2000;43:45–48.
  • 8. Earle T. Trust in risk management: A model-based review of empirical research. Risk Anal. 2010;30:541–574. doi: 10.1111/j.1539-6924.2010.01398.x.
  • 9. Snijders C, Weesie J. Online programming markets. In: eTrust: Forming Relationships in the Online World. The Russell Sage Foundation Series on Trust. Russell Sage; New York: 2009. pp. 166–186.
  • 10. Houser D, Wooders J. Reputation in auctions: Theory, and evidence from eBay. J Econ Manag Strat. 2006;15:353–369.
  • 11. Lin T, Abrahao B, Kleinberg R, Lui J, Chen W. Combinatorial partial monitoring game with linear feedback and its applications. In: Proceedings of the 31st International Conference on Machine Learning. Proceedings of Machine Learning Research; Beijing: 2014.
  • 12. State B, Abrahao B, Cook K. From power to status in online exchange. In: Proceedings of the 4th ACM International Conference on Web Science. Association for Computing Machinery; New York: 2012.
  • 13. State B, Abrahao B, Cook K. Power imbalance and rating systems. In: Proceedings of the 10th International AAAI Conference on Web and Social Media, Cologne, Germany. AAAI Press; Palo Alto, CA: 2016.
  • 14. Parigi P, State B. Disenchanting the world: The impact of technology on relationships. Social Inform. 2014;1:166–182.
  • 15. Cook K, Cooper R. Experimental studies of cooperation, trust, and social exchange. In: Ostrom E, Walker J, editors. Trust and Reciprocity: Interdisciplinary Lessons for Experimental Research. Russell Sage; New York: 2003. pp. 209–244.
  • 16. Berg J, Dickhaut J, McCabe K. Trust, reciprocity, and social history. Games Econ Behav. 1995;10:122–142.
  • 17. Houser D, Schunk D, Winter J. Distinguishing trust from risk: An anatomy of the investment game. J Econ Behav Organ. 2010;74:72–81.
  • 18. Poundstone W. Prisoner’s Dilemma. Doubleday; New York: 1992.
  • 19. Weber E, Blais A, Betz E. A domain-specific risk-attitude scale: Measuring risk perceptions and risk behaviors. J Behav Decis Making. 2002;15:263–290.
  • 20. Gneezy U, Potters J. An experiment on risk taking and evaluation periods. Q J Econ. 1997;112:631–645.
  • 21. Pinheiro J, Bates DM. Mixed-Effects Models in S and S-PLUS. Springer; New York: 2000.
  • 22. McFadden D. Conditional logit analysis of qualitative choice behavior. In: Zarembka P, editor. Frontiers in Econometrics. Academic; New York: 1974. pp. 105–142.
  • 23. Edelman B, Luca M. Digital discrimination: The case of airbnb.com. Harvard Business School Working Paper 14-054; 2014.
  • 24. Smith S. Race and trust. Annu Rev Sociol. 2010;36:453–475.
  • 25. Fausto-Sterling A. Myths of Gender: Biological Theories About Women and Men. Basic; New York: 1992.
  • 26. Zervas G, Proserpio D, Byers J. A first look at online reputation on Airbnb, where every stay is above average. SSRN; 2015. Available at https://ssrn.com/abstract=2554500. Accessed August 9, 2017.
  • 27. Santana J, Parigi P. Risk aversion and engagement in the sharing economy. Games. 2015;6:560–573.
