Conducting online virtual environment experiments with uncompensated, unsupervised samples

Bernd Huber; Krzysztof Z Gajos

doi:10.1371/journal.pone.0227629

. 2020 Jan 30;15(1):e0227629. doi: 10.1371/journal.pone.0227629

Conducting online virtual environment experiments with uncompensated, unsupervised samples

Bernd Huber ¹, Krzysztof Z Gajos ^1,^*

Editor: Victoria Manning²

PMCID: PMC6992162 PMID: 31999696

Abstract

Web-based experimentation with uncompensated and unsupervised samples allows for a larger and more diverse sample population, more generalizable results, and faster theory to experiment cycle. Given that participants are unsupervised, it is still unknown whether the data collected in such settings would be of sufficiently high quality to support robust conclusions. Therefore, we investigated the feasibility of conducting such experiments online using virtual environment technologies. We conducted a conceptual replication of two prior experiments that have been conducted in virtual environments. Our results replicate findings previously obtained in conventional laboratory settings. These results hold across different device types of participants (ranging from desktop, through mobile devices to immersive virtual reality headsets), suggesting that experiments can be conducted online with uncompensated samples in virtual environments.

Introduction

With more and more human subjects research being conducted through online experiments and crowd work, a novel research direction is to investigate uncompensated samples as a way to conduct large scale studies with the benefit of being cheaper and better representative populations [1, 2]. Previous studies have shown that conducting large scale online experiments with unpaid volunteers has little effect on data quality, while providing potentially much more ecologically valid data compared to paid alternatives [3].

Virtual environment technology holds many promises, among which is the potential to enable new methods for conducting social and psychological experiments [4]. With wider adoption of devices and developer platforms supporting virtual environment technologies, such as WebVR [5] or Google Cardboard [6], there is new promise in the kind of studies and experiments researchers can do in order to get insights that may not have been possible to get before. Despite the promise and scale of adoption of such technology, most experiments in virtual environments, with a few notable exceptions (e.g., [7–9]), are conducted in physical lab spaces or paid settings.

In this work, we study voluntary, unpaid participants of online experiments in virtual environments. Such experiments are designed to be intrinsically motivating (users can often learn about themselves) [10]. We replicate two previously studied phenomena. The first study investigates people’s navigation abilities by letting participants escape virtual mazes [11]. In this study, male participants solved a maze significantly faster compared to female participants. The second study investigates the Proteus Effect, which predicts that people’s confidence changes when their perceived appearance is being manipulated. Specifically, one of the findings of this study is that perceived participant height influences confidence as expressed by negotiation behavior in the ultimatum game [12, 13].

We designed and deployed two experiments for LabintheWild.org [3], an online experiment platform, based on those two phenomena. Since both original studies were conducted in compensated settings, and we studied participants in an uncompensated setting, we recruited users by offering them feedback about their navigation style, and their negotiation skills, respectively. Our experiments suggest that both experiment approaches can be used to study human behavior at similar quality levels that supervised and compensated settings provide, suggesting the value of studies in virtual environments. In summary, this work contributes the following:

Replication of virtual environment studies in uncompensated online settings with the redesign of incentive structures for such settings.
Demonstration of feasibility of embodiment and place illusion as manipulation mechanisms for such experimentation settings.
Study of the effect of device type in uncompensated settings.

Background

In our work, we leverage the advantages of experiments with uncompensated samples, as well as behavioral studies in virtual environments. The following section provides an overview of both of those areas, as well as the conceptual background of our work.

Uncompensated online behavioral experiments

Online experiments have become an acceptable tool in behavioral research [14]. There are many potential benefits for conducting experiments using uncompensated online studies. According to Reinecke and Gajos [3], benefits include (1) subject pool diversity in terms of age, ethnicity, and socioeconomic status; (2) very low cost to running studies; (3) fast theory/experiment cycle; (4) relative stability of the subject pool over time. People on such experiment platforms typically arrive at the experiments through various sources such as referral, news articles, or social media. A major additional advantage is on the potential scale of distribution, depending on the study setup. Together with the negligible costs, online experimentation with uncompensated samples allows for studies with larger numbers of participants with very long duration. The larger scale and diversity provides an opportunity to apply more granular treatments.

Despite the advantages of running studies “in the wild”, there are various challenges to conducting studies out of the lab. These include reliability of data gathering, as well as ensuring the control of conditions. Previous work studies the effect of bringing lab-based findings online, both in compensated and uncompensated settings. [3, 15] The results suggest that while incentive structures may differ between compensated settings and uncompensated settings, collected data does not necessarily suffer in uncompensated settings [3]. Additionally, mechanisms such as a survey question that simply asks participants whether participants cheated while taking a test, has been shown to capture a wide range of noise. In another study, Komarov et al. [15] compare supervised lab-based user studies with unsupervised online studies on Amazon Mechanical Turk, finding that unsupervised settings lead to similar results. Our work extends these findings to experiments in virtual environments.

Behavioral studies in virtual environments

Virtual reality (VR) has become an important tool for studying behavioral and cognitive processes since Blascovich et al.’s call for using VR as a research tool in 2002 [4]. This section provides a short overview of some areas of experimental work.

Large-scale

Internet-based VR studies have recently become more popular. Gehlbach et al. replicated in 2015 an earlier study on perspective taking conducted in VR using Amazon Mechanical Turk (AMT) with a desktop VR [9]. In 2016, Oh et al. proposed the concept of Immersion at Scale, testing out collecting data on mobile VR devices outside of the lab by setting up physical tents at different locations (e.g., at local events, museums) [7]. Researchers also conducted the first ethnographic study in VR with remote participants [16]. More recently, researchers investigated paid crowdsourced VR experiments [8], in which three studies were replicated and the feasibility of using head-mounted VR in crowdsourcing settings was shown. Furthermore, Mottelson et al. [17] find that results quality holds when moving lab-based VR studies to outside-the-lab VR settings, with the complexity of the studied phenomenon governing this effect. Steed et al. replicate test the effect of presence and embodiment in VR “in the wild” by replicating lab studies [18]. We extend previous work on VR studies, by moving towards uncompensated online settings, and including non-head mounted devices in our analyses.

Enhancing mediated experiences

An immersive experience can be described as one in which a person is enveloped in a feeling of isolation from the real world [19]. For example, games in three-dimensional environments and with high degrees of interaction often make gamers feel immersed with the virtual environment [19]. A related aspect of a virtual experience is presence: the extent to which a person’s cognitive and perceptual systems are tricked into believing they are somewhere other than their physical location [20]. The concept of presence is a frequently emphasized factor in immersive mediated environments. Previous research often assumes that greater levels of immersion elicit higher levels of presence, in turn enhancing the effectiveness of a mediated experience. Cummings et al study the effect of levels of immersion on presence, drawing from multiple studies conducted in VR [21]. Their findings suggest that immersion technologies such as stereoscopic visuals, wider fields of view and increased user tracking have medium sized effects on presence, while other technological factors such as visual content quality have less of an effect on presence. The authors also speculate that these effects may change, as technology becomes more and more adopted.

Models of illusion

Gonzales-Franco and Lanier argue that VR is capable of delivering primarily three types of illusions: place illusion, embodiment illusion, and plausibility illusion [22]. Place illusion refers to a user’s feeling of being transported into a rendered environment. Embodiment illusion refers to a user’s feeling of experiencing the virtual world through an avatar. Together, place and embodiment illusions enhance the plausibility illusion, which refers to the feeling that events happening in the virtual world are real. In general, researchers have been leveraging all three types of illusions in their studies to deliver different experimental manipulations. The value of the illusions is that it allows researchers to study phenomena that would have been hard to manipulate or study outside of VR since it is often much harder to manipulate some of these factors in the real world. In this work, we study place and embodiment illusions as mechanisms for such experiments. The following sections describe these factors at greater detail.

Place illusion builds on real-world human behavior where environments interact with how humans behave in a situation. For example, Maani et al. showed that immersion in cooling virtual environments during surgical procedures can reduce perceived pain levels [23]. In another example, researchers put participants in a virtual forest, replicating the behavior of hikers in a previously studied real-world experiment [24].

Embodiment illusion is the experience of a virtual world through a virtual self-representation, often referred to as an avatar. Many studies using VR technologies have demonstrated the influences of embodied experiences on behavior. Research has shown that it is possible in VR to generate perceptual illusions of ownership over a virtual body seen from a first-person perspective, and learn to control the virtual body even when the body appears different from the user’s real body. In addition, different avatar designs have been shown to affect perceived levels of presence and other behaviors. An often studied phenomenon in virtual environments regarding the embodiment illusion is the Proteus effect [13, 25]. The Proteus effect refers to the phenomenon that characteristics of a user’s avatar influence the user’s behavior in a virtual environment. Yee and Bailenson showed for example that participants assigned more attractive avatars behaved more intimately with confederates in self-disclosure and interpersonal distance tasks, and participants assigned taller avatars behaved more confidently in a negotiation task [13, 25]. Additional support of the Proteus effect was exhibited in a study in which the embodiment of sexualized avatars elicited higher reports of self-objectification [26]. While other follow-up studies found opposite effects of less attractive appearance leading to more positive behaviors [25], previous literature agrees on the fact that appearance-related attributes of self-images lead to changes in participant behavior.

Overview

Given the advantages of online experimentation with uncompensated samples and the promising developments of technology supporting virtual environments, we pose the research question of whether experiments in such settings are feasible. We considered virtual environments in both immersive settings such as HMD, as well as non-immersive environments such as Desktop VR [23]. Specifically, we pose the following questions:

RQ1 Is it feasible to study the effect of gender on spatial abilities in virtual environment online experiments with uncompensated samples?
RQ2 Is it feasible to study the effect of appearance on behavior in virtual environment online experiments with uncompensated samples?
RQ3 How do these effects vary between different devices?
RQ4 What is the population and knowledge of VR online experimentation participants?

Study 1: Spatial navigation task

In the first study, we examined people’s navigation abilities with the place illusion. Previous research had shown that in navigation settings without points of interest, gender affects people’s abilities to navigate [11]. In the original study, researchers let participants escape virtual mazes without showing them any map or overview. Participants were using a virtual environment at a desktop computer in a supervised laboratory setting. Gender differences were significant on maze completion times and errors in the maze made by participants. Furthermore, differences in errors by gender varied between trial number: while a significant main effect of gender on error rate was found, no significant effect of gender was found when only looking at the first trial of a maze. The authors conclude from this observation that men and women learn differently, with men learning faster than women. We modeled our first study after this experiment.

Methods

This work was approved by the Harvard University-Area Committee on the Use of Human Subjects (IRB Registration—IRB00000109; Federal Wide Assurance—FWA00004837), with Protocol#: IRB17-1989. Participants were presented with an informed consent form at the beginning of the study.

Tasks and procedures

Following the experimental procedure in [11], participants were given multiple trials to escape a maze, without any other information about the map, their current location in the maze or landmarks. Participants were prompted to remember their way out and to complete the task as quickly and accurately as possible. The exit of a maze is marked with a red box. Participants were asked to walk toward the box to finish the maze. See Fig 1 for the mazes we used and the look of the virtual maze environment.

Fig 1 — (A, B) Participants’ view within the maze, and the view of the goal (red box). (C,D) The layout of the two mazes. The left, simple maze has one decision point on its path to the goal, the difficult maze has three decision points on its path to the goal. Note that this layout was never shown to participants.

We adapted the original study in several ways. Running the study online in unsupervised settings and without compensation required us to keep the maze experience rather short. Instead of five trials per maze as in the original paper [11], we used three trials per maze. Furthermore, in the original study, multiple decision points were put into every maze. We added one and three decision points per maze, respectively.

We used camera tilt to control forward and backward motion, and the viewer’s orientation to control the moving direction. On a mobile device, the camera tilt can be controlled by the user with the angle of the device. On a desktop device, the camera tilt can be controlled by dragging the camera view with the mouse pointer. We designed the test this way to have the least keyboard interactions necessary, and to have consistent controls across devices.

After giving their informed consent, participants were asked to fill out a demographics questionnaire. They then received instructions about the experiment, and we asked them about the current device they used (Head-mounted, mobile or desktop). Users could then test the motion controls in a tutorial environment without maze. As a next step, participants entered the first (easier) maze and were prompted to find the exit. Participants had three trials for this first maze to optimize their time needed to escape the maze. Participants then entered the second maze (difficult), with the same task to optimize their escape time. Instructions were presented in English.

Designing for uncompensated samples

In the original procedure, financial rewards were given to the study participants. In our case, we needed to design the study such that it attracts intrinsically motivated participants, providing non-monetary value. Participant motivation is also important for recruitment in uncompensated settings [27].

In our case, we provided people an assessment of their navigation style, since the original paper [11] hypothesizes that gender differences in the scores of this navigation test come from the fact that female participants have different navigation styles more centered around landmarks. When participants performed below average, their navigation style was classified as landmark-centered. When participants performed above average, their navigation style was classified as view-centered. See Fig 2 for screenshots of the recruitment and results pages.

Fig 2 — (A) The recruitment page of the navigation study. Participants are prompted to participate in order to learn about their navigation style. (B) The results page of the navigation study. Participants are assessed based on their traveled distance and the time they needed. Participants are also provided additional material explaining the test results in greater detail.

Participants

311 participants completed all six maze trials. The participants (51% female) were from 52 different countries. Participants were between 12 and 71 years old (mean = 31, sd = 11 years). Participants were also asked how often they used computers, and how often they used HMD devices.

Design and analysis

To investigate the possibility of gender differences in time-to-completion and error rates, a repeated measures analysis of variance (ANOVA) was performed with gender as a between-subjects factor and maze difficulty, computer usage (how often the participant uses a computer) and device type (Desktop, Mobile, HMD) as control variables, as well as trial number as a repeated measure. We furthermore included an interaction effect between gender and trial number to measure differences in learning rates between men and women.

Completion times were computed in seconds as the time from when the participants entered the maze, to the moment when they escaped. Error counts were computed as the number of times when participants chose a wrong turn at a decision point. Specifically, every time a participant passed a decision point and chose a wrong turn, the error count was increased by 1, which we adopted from the original study.

Results

Fig 3 shows the mean times and error rates of male and female subjects to complete the three maze trials for the two different mazes. The overall average escape times were 62 seconds for the simple maze and 78 seconds for the difficult maze, in the last rounds for each maze (see Table 1 for detailed result statistics). Participants traveled 1.2 and 1.4 times the optimal route (averaged over all trials), for the simple and difficult maze respectively.

Table 1. Mean and standard deviations (in brackets) of error rate and completion times separated by gender and maze difficulty.

Gender	Maze Difficulty	Number of Errors	Completion time (seconds)
Male	Maze Easy	0.06 (0.16)	30 (18)
Male	Maze Difficult	1.06 (0.15)	55 (27)
Female	Maze Easy	0.21 (0.28)	48 (28)
Female	Maze Difficult	1.06 (0.15)	81 (34)

Open in a new tab

The results of the ANOVA analysis did show a statistically significant main effect of gender on completion time (F(1, 518) = 63.2, p =< .0001) when controlled for maze difficulty, computer usage of participants, and device type, with male participants solving the maze significantly faster compared to female participants. As well, there were significant main effects of trial number (F(2, 517) = 5.0, p = 0.0073), with participants getting faster at solving the maze by doing it multiple times (see Fig 3). In contrast to the original study, our data show a significant interaction effect between the trial number and gender (F(2, 517) = 9.2, p = .0020), with female participants improving their completion times faster (6% average improvement rate between trials) than males (3% average improvement rate between trials). No significant effect of device type (HMD, mobile, desktop) on performance was found (F(2, 517) = 3.72, p = 0.066). In addition, our data did not show an interaction effect between device and gender.

For error rates, our ANOVA with the same control variables showed a statistically significant main effect of gender (F(1, 518) = 35.2, p < .0001), with male participants committing significantly fewer errors compared to female participants. Furthermore, there were significant main effects of trial number (F(2, 517) = 7.71, p = .0005), and device type F(2, 517) = 12.3, p = 0.0005, with participants on desktop committing significantly fewer errors. Our data also shows a significant interaction effect between trial number and gender (F(2, 517) = 6.1, p = .0023), with female participants reducing their error rates faster (6% average improvement rate between trials) than males (0.5% average improvement rate between trials). Table 2 summarizes our results and compares them with the results in the original study.

Table 2. Summary of our statistical analyses for the maze study, compared to the analysis results reported in the original study.

		Original Study			Ours
		F	df	p	F	df	p
Completion Time	Gender	20.3	1,71	<.001	63.2	1, 518	<.0001
Completion Time	Trial Number	14.74	4,288	<.001	5.0	2, 517	.0073
Error Rates	Gender	17.41	1,71	<.001	35.2	1, 518	<.0001
Error Rates	Trial Number	13.81	4,288	<.001	7.71	2, 517	.0005

Open in a new tab

Discussion

We replicated results from the original study for both completion time between gender, as well as error rates. Main effects of gender on maze completion times that we observed are aligned with previous work, suggesting a gender difference on spatial navigation tasks without landmarks. In Fig 4, the path heatmap, as divided by gender, shows that decision points were the most important struggle causing this difference. Furthermore, based on our data analyses, participants were able to learn to navigate the mazes since their completion times became lower with an increasing number of trials per maze. This seems to be also reflected in the heatmap of participants’ walking paths. Our data furthermore reveals that especially as the navigation task gets more difficult (more decision points, more turn points), female participants more likely learn the optimal paths with increasing trials.

Fig 4 — The heatmaps visualize participants improvement in escaping the mazes when trying multiple times. Furthermore, the heatmaps show gender differences in walking paths.

One conclusion from the original study was that the difference in error rates between male and female participants can be explained by separating information errors from spatial memory errors. While in the first trial, errors occur due to a lack of knowledge of the correct route (information error), consecutive errors are more likely made due to spatial memory errors. The original study finds an increase in the gap of error rates between male and female participants over the number of trials. Our findings show a different pattern, showing the significant interaction effect between trial number and gender and female participants learning faster than male participants. Our error analysis also revealed that participants often corrected the error at one decision point, while introducing an error at another decision point. It remains an open question what the mechanics are of how learning of the maze impacts performance.

The replication of the results from a lab study in uncompensated online settings shows that it is possible to deliver place illusion in such settings, with the main effect of gender on performance remaining intact. That we were able to replicate the experiment results from laboratory settings and across multiple device categories (RQ3) suggests that such experiments can be used to study people’s navigation behavior in uncompensated settings (RQ1). Even in such settings as uncompensated online experiments, we were able to replicate observations about people’s navigation abilities. While our sample size is still relatively small compared to non-VR experiments in uncompensated online experiment settings, this experiment shows the potential of using virtual environments to conduct immersive studies online without compensation.

Study 2: Negotiation task

In our second study, we examined the Proteus effect. The Proteus effect refers to the phenomenon that individual’s behavior conforms to their digital self-representation independent of how others perceive them. This effect is an often studied topic in VR [13, 25]. The original paper that coined the term showed in one of the experiments that participants assigned taller avatars behaved more confidently in a negotiation task than participants assigned shorter avatars. In the original study [13], the authors tested the effect of appearance in a virtual environment on confidence in the negotiation behavior of 50 undergraduate university students, letting them negotiate in a lab-based virtual environment. In the study, height was manipulated relative to the confederate, allowing participants to infer their own height. Negotiation differences were significant between different heights that were randomly assigned to participants. The original study also looked at other appearance-related manipulations such as facial look. Participants were using an HMD in a supervised laboratory setting. In our replication study, we asked whether the same overall methodology can be effective with less immersive (but more pervasive) devices and with unsupervised online participants.