Journal of Healthcare Informatics Research. 2017 May 16;1(1):52–91. doi: 10.1007/s41666-017-0003-8

Design Factors of Longitudinal Smartphone-based Health Surveys

Sudip Vhaduri 1, Christian Poellabauer 1
PMCID: PMC8981805  PMID: 35415393

Abstract

Phone-based surveys are increasingly being used in healthcare settings to collect data from potentially large numbers of subjects, e.g., to evaluate their levels of satisfaction with medical providers, to study behaviors and trends of specific populations, and to track their health and wellness. Often, subjects respond to such surveys once, but it has become increasingly important to capture their responses multiple times over an extended period to accurately and quickly detect and track changes. With the help of smartphones, it is now possible to automate such longitudinal data collections, e.g., push notifications can be used to alert a subject whenever a new survey is available. This paper investigates various design factors of a longitudinal smartphone-based health survey data collection that contribute to user compliance and the quality of the collected data. This work presents design recommendations based on an analysis of data collected from 17 subjects over a 1-month period.

Keywords: Data collection, Phone surveys, Response ratio, Response delay, Completion ratio, Completion time, Experience sampling, Electronic diary, Health and wellness

Introduction

User surveys have been used as a primary tool for data collection in various user studies on topics such as addictive behavior [32, 48, 49], pain [52], health [3, 39, 45], well-being [25, 61], customer satisfaction [14], and system usability analysis [19]. For example, in the health and well-being area, researchers rely on surveys to check how factors such as mood [62], social interactions, sleeping habits, levels of physical activity [61], life satisfaction levels [9], and spirituality [37, 55] affect the health and well-being of an individual or an entire community. Researchers rely on two primary types of surveys: questionnaires and interviews [50]. Typically, a questionnaire is a paper-and-pencil based method [41, 42, 53], but this does not allow follow-up questions that would let subjects elaborate further on their answers. The interview-based approach removes this limitation by allowing follow-up questions in a personal or face-to-face setting between the subject and the study coordinator. However, both the traditional questionnaire and interview approaches suffer from low response rates and high costs [34]. Therefore, in recent years, survey-based research has moved towards web-based surveys [50], where subjects can respond when it is convenient for them.

An increasing number of longitudinal surveys (especially in the health and well-being domain) require frequent responses, sometimes even multiple times a day. This necessitates a study design where survey requests can be “pushed” to the subject at the most appropriate time (e.g., a sleep quality survey in the morning versus a nutrition/exercise survey at the end of the day). Recently, smartphone-based user surveys using experience sampling method (ESM) [5] have increasingly been employed to conduct user studies and collect subjective as well as objective data. This method brings numerous advantages including the ability to monitor the context within which survey responses are provided. These contexts include location, motion, proximity to landmarks, environmental conditions, and time of day. These data can be obtained using the phone’s built-in sensors [57, 59] and help compensate for data collection inaccuracies and biases, such as recall bias, memory limitations, and inadequate compliance that comes from self-reports [53]. Smartphones also make it easier to change survey design on-the-fly, e.g., to adapt future survey questions based on previous responses, subject characteristics, and subject preferences. Finally, alert mechanisms and push notifications provided by smartphones also make them excellent tools to inform a subject when a new survey is available or when a survey deadline approaches.

These advanced features of smartphones introduce a new dimension in ESM studies, i.e., a mobile-based experience sampling method or mESM [36]. It has many benefits over other user survey methods, such as observation, retrospective reports/diary, and interview-based user surveys. For example, it does not suffer from observation bias and memory recall bias and error [28, 53]. This method also makes it easier to monitor a large number of subjects over longer periods of time and to capture unexpected events or activities. Mobile-based experience sampling allows us to capture longitudinal user surveys using three ESM approaches: signal-contingent sampling, interval-contingent sampling, and event-contingent sampling [5]. Among these three approaches, signal-contingent sampling typically leads to the lowest memory recall bias, because subjects are able to report their immediate (in situ) experience.

At the same time, the use of smartphones has also resulted in higher subject compliance compared to paper-based surveys and diaries [53]. However, little has been done to investigate the human factors of study design, i.e., there is a dearth of knowledge on the impact of various study design parameters on the compliance of subjects and the quality of the collected data. Such design parameters include the timing of survey release, the size of the “response window” (the time frame during which surveys can be answered), the utility of push notifications to alert and remind users of available surveys, the frequency of surveys, and the size of surveys (i.e., the number of questions). As a consequence, we conducted a small-scale, 1-month-long wellness study (Section 3) on 17 college students to collect responses using the WellSense [58, 60] mobile phone survey application. The goal of this study is solely to evaluate the design factors of such smartphone-based longitudinal survey data collection studies in daily life, not to analyze the actual health and well-being data. In this paper, we present various design recommendations from our observations and analysis (Section 4) of the preliminary data from the study. These recommendations (Section 5) can help to improve compliance and quality of future longitudinal survey-based wellness studies. We time-framed the study around the stressful finals week, i.e., the last week of classes as well as the exam period.

Related Work

To maximize the quality of collected data and the accuracy of the information captured, phone-based user surveys must address similar design challenges as non-phone user surveys [12, 23]. These challenges include “acquiescence response bias” and “straight-lining” [22], “wording”, “question form”, and “contexts” [47]. For example, to increase the reliability and validity of information, item-specific questions can be a better choice than general agree/disagree scales [43]. Further, it is also recommended that surveys be designed with either five-point or seven-point scales depending on whether the hidden construct is unipolar or bipolar [44].

Newer phone-based surveys can take advantage of smartphone technologies, which have become constant companions for their users and provide both data collection opportunities and challenges that cannot be found in non-phone data collections. For example, the mobile device platform, reduced screen size, and non-traditional user interfaces require adherence to mobile-specific design choices and standards [46] and the application of usability principles for smartphone user interfaces [19]. Researchers presented 21 major principles for better usability of mobile phone user interfaces grouped into five classes: cognition, information, interaction, performance, and user [19]. Other efforts in the area of mobile user interface design include enhancing specifically survey interfaces [54]. In another study, the interviewers captured the survey data over phone calls to address various human factors (e.g., the type of the population surveyed) [29].

Little work has been done on enhancing the compliance ratio in ESM [16, 28, 36]. In one study, the authors used SMS messages to send reminders and thereby increase the response rates in ESM [16]. In another study, authors showed that personalization can significantly improve the compliance ratio [28]. Using a one-week study on 36 subjects, they found that the time of day does not have an impact on the compliance ratio of the study population. However, we have observed that for sub-groups of the study population, the time of day does indeed impact their likelihood of responding, which emphasizes the need for sub-group level surveys in addition to personalized surveys. The authors in [28] have raised several questions in their future work section (e.g., the impact of frequency of surveys and the number of questions per survey on survey fatigue), which we aim to address in this work. In [36], the authors performed a survey of mESM approaches, including a discussion of various challenges, such as participant recruitment and incentive mechanisms, sampling time and frequency, contextual bias, data privacy, and sensor scheduling. Some of these challenges are addressed in this paper.

Our work takes advantage of push notifications to inform users when new surveys become available. Previous work has addressed various issues with such notifications and other interruptions [18], including the effects of interruptions during phone calls [4], while the subject is engaged in interactive tasks [17], and multitasking [13]. Recently, researchers have developed automated schemes to deliver notifications depending on phone activity [11], a user’s context, notification content, sender identity [31], and types of device interactions or physical activities [25, 33], typically with the goal to reduce cognitive workload.

Another advantage of using smartphones for data collection is their ability to collect contextual information that may help to interpret human factors, such as intent, emotion, and mobility. In one study, researchers conducted a mobile health study on 48 students, where each user’s mobile device continuously captured sensor data in addition to the survey responses [61]. No push notifications were used to alert users about the availability of surveys and the study showed that overall compliance dropped over the course of the study. In another study, researchers collected various contextual data (such as stress, smoking, drinking, location, transportation mode, and physical activity) using smartphone-based surveys [56]. In addition to the survey data, they also collected various phone sensor data and physiological data (such as heart rate and respiration) from wearables, which can also provide insights into human aspects of data collections. While both studies captured a large variety of personal information, they did not include surveys for overall wellness, including life quality [15], life satisfaction levels [8, 9], and spiritual beliefs [37, 55]. These types of surveys can be essential to obtain a comprehensive view of a subject’s wellness. Since previous studies have often excluded these more sensitive topics, the impact on compliance on these types of surveys has received little attention and is therefore also addressed in this paper.

Finally, Apple’s ResearchKit application development framework [1, 2] indicates that large-scale data collections are of increasing interest, especially in the areas of health and wellness, and tools such as the ResearchKit will make widespread subject recruitment easier. Similar to the efforts described above, surveys can then be enriched with contextual information collected by the phone [38].

Study Design and Data Collection

In this paper, we investigate the human factors in longitudinal smartphone-based surveys and data collections using an analysis of data collected at the University of Notre Dame over a 1-month period (referred to as Wellness Study) in late Spring 2015, where the data collection was timed such that it coincided with the final weeks of classes, final exam week, and the time immediately after finals. The reason for the timing of the study is that we expect many students to experience variations in emotional well-being, stress, and sleep quantity and quality during that time period, which may in turn also impact subject compliance.

System Architecture and WellSense App

The goal of the data collection system used in this work is to provide a mechanism to perform large-scale well-being studies using smartphones, where the main component of the system is the WellSense mobile app. However, WellSense can also simultaneously monitor a variety of contextual information (using phone sensors, resources, and usage patterns) and provide mechanisms for remote data collection management, including the ability to redesign and reconfigure an ongoing study “on-the-fly”. Figure 1 shows the high-level system architecture of WellSense, consisting of a mobile survey and monitoring app, a cloud-based check-in server and database, and a management web portal. Study participants receive survey requests via the mobile app (Fig. 2), implemented for both the iOS and Android platforms, and survey responses are transmitted over the network to a check-in server, which is responsible for processing and storing the incoming data in a global database, where each subject or device has a unique identifier and all survey responses are time-stamped. The web portal is the study administrator’s primary tool to manage a study, e.g., to monitor compliance and response rates (via the Fetch query in Fig. 1), but also to modify the study design, including changes to the survey questions, the timing of survey requests, or the frequency of survey requests (via the Update query in Fig. 1). Administrators can also create new surveys (via the Insert query in Fig. 1). Study participants may also use the web portal to track their own progress and compliance.
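To make this data flow concrete, the following is a minimal, illustrative sketch of the three portal operations (Insert, Update, Fetch) and a check-in call, using an in-memory SQLite store. The table layout, column names, and function names are our own assumptions for illustration; the actual WellSense backend (a cloud check-in server and database) is not reproduced here.

```python
import sqlite3
from datetime import datetime, timezone

# Illustrative schema and queries only; not the actual WellSense backend.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE surveys (
    survey_id TEXT, question_id TEXT, text TEXT,
    release_time TEXT, active_hours REAL)""")
conn.execute("""CREATE TABLE responses (
    user_id TEXT, survey_id TEXT, question_id TEXT,
    answer TEXT, submitted_at TEXT)""")

def insert_survey(survey_id, question_id, text, release_time, active_hours):
    """'Insert' query: the administrator creates a new survey question."""
    conn.execute("INSERT INTO surveys VALUES (?, ?, ?, ?, ?)",
                 (survey_id, question_id, text, release_time, active_hours))

def update_schedule(survey_id, release_time, active_hours):
    """'Update' query: modify the timing or active period of an existing survey."""
    conn.execute("UPDATE surveys SET release_time = ?, active_hours = ? "
                 "WHERE survey_id = ?", (release_time, active_hours, survey_id))

def fetch_compliance(user_id):
    """'Fetch' query: count time-stamped responses per survey for one subject."""
    cur = conn.execute("SELECT survey_id, COUNT(*) FROM responses "
                       "WHERE user_id = ? GROUP BY survey_id", (user_id,))
    return dict(cur.fetchall())

def check_in(user_id, survey_id, answers):
    """Check-in server: store one time-stamped row per answered question."""
    now = datetime.now(timezone.utc).isoformat()
    conn.executemany("INSERT INTO responses VALUES (?, ?, ?, ?, ?)",
                     [(user_id, survey_id, qid, ans, now)
                      for qid, ans in answers.items()])

# Example: create a Sleep survey question, receive a response, check compliance.
insert_survey("sleep", "q1", "How many hours did you sleep last night?", "08:00", 3)
check_in("subject-01", "sleep", {"q1": "7.5"})
update_schedule("sleep", "08:30", 3)
print(fetch_compliance("subject-01"))   # {'sleep': 1}
```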

Fig. 1. System diagram of WellSense

Fig. 2. WellSense survey app. (a) Main menu of the survey app, (b) final reminder (i.e., pop-up) for a Mood survey that is going to expire in the next 30 minutes, (c) button-type response layout for a Sleep survey question, and (d) slider-type response layout for a Social Interaction survey question. The main menu presents the list of surveys and their active periods, with bold font for active ones (e.g., the Sleep survey in (a))

Study participants have full control over which surveys they wish to respond to, i.e., they can skip entire surveys or individual questions of a survey (e.g., when a survey may cause emotional distress or privacy concerns). While a survey is “open”, participants can also revise and resubmit their responses. At the end of a survey, the app informs the participant about the number of questions answered and skipped, and at that point the participant can decide to revisit questions or to submit the responses. If the survey fails to upload to the server (e.g., due to a lost network connection), the participant can submit the survey at a later point, without loss of data. Once a survey has been submitted, all responses, their corresponding question identifiers, and timestamps, along with a survey identifier and user identifier, are stored on the cloud server. During our study data collection we used the Parse (www.parse.com) server, primarily due to the ease of integrating the server with mobile applications. However, any similar online server with push notification capability can be used as a back-end server. Each survey response is stored as a new row in the database, together with the identifiers described above. Given the survey ID, question ID, user ID, and timestamps, it is easy for a study administrator to monitor compliance or to detect patterns that may indicate difficulties in the study, such as poor response rates for a specific survey or question, which can be due to the content or the timing of the survey or question. The back-end server’s push notifications allow a study administrator to push alerts to one or more participants, e.g., when their compliance is low.
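The client-side behavior described above (one record per answered question, skipped questions simply omitted, and deferred upload when the network is unavailable) can be sketched as follows. The endpoint URL, payload fields, and retry policy are hypothetical assumptions for illustration, not the actual WellSense app code.

```python
import json, time, urllib.request

# Illustrative client-side submission sketch; endpoint and fields are assumed.
CHECKIN_URL = "https://example.org/checkin"   # hypothetical check-in endpoint
pending = []                                   # submissions that failed to upload

def build_submission(user_id, survey_id, answers):
    """One record per answered question; skipped questions are simply absent."""
    ts = int(time.time())
    return [{"user_id": user_id, "survey_id": survey_id,
             "question_id": qid, "answer": ans, "timestamp": ts}
            for qid, ans in answers.items()]

def try_upload(records):
    """Attempt to post the records; on a network error, queue them for later."""
    try:
        req = urllib.request.Request(
            CHECKIN_URL, data=json.dumps(records).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=10)
        return True
    except OSError:                            # lost connection, timeout, etc.
        pending.append(records)
        return False

def retry_pending():
    """Called later (e.g., at the next app launch) to flush queued submissions."""
    global pending
    queued, pending = pending, []
    for records in queued:
        try_upload(records)                    # failures are re-queued

# Example: a Sleep survey submission that is queued if the upload fails.
records = build_submission("subject-01", "sleep", {"q1": "7.5", "q2": "Fairly good"})
if not try_upload(records):
    print(f"{len(pending)} submission(s) queued for retry")
```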

Wellness Study Design

Using the WellSense app, we designed a 1-month data collection effort that consisted of various components described below. Participants’ demographic information, various deadlines and class schedules, and health issues and concerns (e.g., pre-existing conditions) were collected through an on-line Resource Assessment survey. We also collected their health and well-being baseline information in terms of current health, fitness (PHQ-9) [21], perceived stress (PSS) [7], perceived success and satisfaction [10], loneliness and other social concerns [40], and sleep quality [6] using on-line Pre-study and Post-study surveys.

WellSense Configuration

The purpose of the smartphone-based surveys is to assess the well-being of a subject based on various contexts and activities during different parts of the day. We use multiple types of surveys and relevant question types in our implementation to capture various contexts and activities that affect personal health and well-being, as shown in Table 1. Table 2 shows the timing information for the different surveys; we use the signal-contingent ESM sampling approach due to its low recall bias compared to the other approaches [5]. Apart from Life and Spirituality, all other surveys are answered daily. In Table 2, “M,” “Su,” and “S” represent Monday, Sunday, and Saturday, respectively. In our study, subjects respond to the following surveys:

  • Mood surveys: This looks for positive and negative factors impacting mood, as well as stress and fatigue levels. We consider 10 positive items (e.g., peaceful?, inspired?) and 10 negative items (e.g., angry?, upset?) along with 3 fatigue items (tired?, sleepy?, drowsy?) and 1 stress item (stressed/overwhelmed?) [62] with item-specific ranking as response options. In the following sections, we use M1, M2, and M3 for the Mood surveys in the morning, afternoon, and evening, respectively.

  • Social Interaction survey: This looks for a person’s social engagement and its impact on health and well-being [61]. We use five basic questions about interaction type, duration, involved parties, etc., along with nine 7-point, bipolar rating scales, where 1 and 7 represent the two extreme points, i.e., “Not at all” and “To a great extent,” respectively. Table 3 shows a subset of the entire survey. In the following sections, we use SI for the Social Interaction survey.

  • Sleep survey: This evaluates the sleep quality of the previous night with three questions about duration, quality, and difficulty staying awake during the day. Table 4 shows the entire survey.

  • Life survey and Spirituality survey: This looks for a subject’s overall life status, satisfaction [9], and spiritual belief [55] and to see their impact on health and well-being. Table 5 shows sample questions of these surveys.

Table 1.

Survey question types and question counts

Survey category          Question type and scale                                       Question count
Mood [62]                Rank ordering                                                 24
Sleep [61]               Slider and Likert interval                                    3
Social Interaction [61]  Dichotomous, multiple choice, bipolar semantic differential   14
Life [9]                 Unipolar rating                                               7
Spirituality [55]        Unipolar rating                                               5
Table 2.

Survey schedule: the active period is expressed in hours and indicates how long a survey is “open” (available for responses)

Survey category     Day(s)  Time(s)                      Active period (h)
Mood                M–Su    10 a.m., 2 p.m., 6:30 p.m.   2
Sleep               M–Su    8 a.m.                       3
Social Interaction  M–Su    9 p.m.                       2
Life                S       12 p.m.                      12
Spirituality        Su      6 p.m.                       4
Table 3.

Social Interaction survey sample questions

Survey questions                          Response options
When was your last social interaction?    0–10, 11–45, 45+ minutes ago
Length of that interaction?               < 1, 1–10, 10–20, 20–45, 45+ minutes
Interaction type                          In person, Phone call, Voice or video chat, Text chat, E-mail
How many people were involved?            1, 2, 3, 4 or more
Interacting with whom?                    Significant other, Relative(s), Friend(s), Mentor, Other
You helped someone?                       1 = Not at all, 2, 3, 4, 5, 6, 7 = To a great extent
Someone treated you badly?                1 = Not at all, 2, 3, 4, 5, 6, 7 = To a great extent
Table 4.

Sleep survey

Survey questions                                                             Response options
How many hours did you sleep last night?                                     < 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 12
Rate your overall sleep last night                                           Very bad, Fairly bad, Fairly good, Very good
How often did you have trouble staying awake yesterday (e.g., dozing off)?   None, Once, Twice, Three or more times
Table 5.

Life survey and Spirituality survey sample questions

Survey category  Sample questions
Life             How satisfied or dissatisfied are you with your life?
                 Do you experience meaning and purpose in what you are doing?
Spirituality     How do you feel the presence of God or another spiritual essence?
                 Do you find strength in your spirituality?

The WellSense app was used with the following configurations:

  • Survey Notifications: Up to four notifications are generated on a subject’s phone per survey; these are spaced equally during the active period of a survey, i.e., an active period of 2 h will generate a notification every 30 min (e.g., a survey with an active period from 10 a.m. – 12 p.m. will trigger notifications at 10 a.m., 10:30 a.m., 11 a.m., and 11:30 a.m.). Once a participant replies to a survey, further notifications for this survey will be canceled.

  • Survey Update: The survey update time was set to 12 a.m.–6 a.m. every night, i.e., during that time frame, at a randomly selected time, each device checks in to the server to download new survey questions, changes to existing surveys, updated survey parameters, etc. (a scheduling sketch covering both configurations is given after this list).
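The two configurations above can be sketched as follows. This is a minimal illustration assuming exactly four equally spaced notifications per survey and a uniformly random check-in time within the nightly update window, matching the description but not the app's actual implementation.

```python
import random
from datetime import datetime, timedelta

def notification_times(release: datetime, active_hours: float, count: int = 4):
    """Up to `count` notifications spaced equally over the active period,
    e.g., a 2-h window starting at 10 a.m. gives 10:00, 10:30, 11:00, 11:30."""
    step = timedelta(hours=active_hours) / count
    return [release + i * step for i in range(count)]

def nightly_update_time(day: datetime):
    """Pick a random check-in time within the 12 a.m.-6 a.m. update window."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return start + timedelta(seconds=random.uniform(0, 6 * 3600))

release = datetime(2015, 4, 27, 10, 0)     # a Mood survey released at 10 a.m.
print([t.strftime("%H:%M") for t in notification_times(release, 2)])
# ['10:00', '10:30', '11:00', '11:30'] -- further notifications are cancelled
# as soon as the subject responds.
```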

Wellness Study Data Collection

The Wellness study was conducted over a period of 4 weeks. The time period covered the end of the Spring semester, with the goal of analyzing subject compliance before, during, and after exam time.

  • Subjects: 17 healthy college students were recruited via email announcements and each of them was given a study description. No incentives were provided. A total of 5 graduate students (average age of 29.5 years with SD = 2.5 years) and 12 undergraduate students (average age of 20.5 years with SD = 9 months) participated in the study. Four of the subjects were female and 13 were male. Further, 3 were Caucasian, 5 were Asian, 4 were Asian American, 2 were African American, 2 were Middle Eastern, and 1 was Hispanic. Out of the 17 subjects, 2 used iOS-based devices, all others used Android.

  • Method: The study ran over 4 weeks, where each subject began to use the WellSense app about 1 week before finals week (consisting of 3 regular class days and 4 reading days for exam preparation). The data collection continued during finals week and for another 1–2 weeks after finals (Fig. 3). Over these 4 weeks, each subject was required to respond to different types of surveys (as shown in Table 1). Users were able to open the app and respond to all open surveys at their convenience. Push notifications were used to alert users of newly available surveys and to remind them of available surveys they have not responded to yet. During the Wellness Study, a total of 1765 surveys (more than 30,000 questions) were delivered to 17 subjects and they responded to 517 surveys (more than 8000 questions), with each subject responding to at least ten surveys. In total, we collected 328 person-days of survey data with 21 incomplete submissions.

Fig. 3. Study time line. The pre-exam days consist of 3 regular class days and 4 reading days

Observations and Findings

Our investigations focus on four primary measures:

  • Response ratio or RR is the ratio of the number of surveys that subjects responded to and the number of surveys released to subjects, i.e.,

    RR = \frac{\sum_{i=1}^{k} N_i}{\sum_{i=1}^{k} M_i}    (1)

    where M_i and N_i are the total number of surveys released/delivered to and responded to by the i-th subject during their study participation period, and k is the number of subjects in the study. RR is therefore a measure of survey compliance.
  • Response delay or RD is the time between the release and the start of a survey, i.e.,

    RD = T_{start} - T_{released}    (2)

    where T_{released} and T_{start} are the times when a survey is released/delivered to a subject and when the subject starts the survey.
  • Completion ratio or CR is the ratio of answered questions to the total number of questions in a survey, i.e.,

    CR = \frac{X_i}{Y_i}    (3)

    where X_i and Y_i are the number of questions answered and the total number of questions in survey i taken by a subject. CR is therefore a measure of survey compliance.
  • Completion time or CT is the time between the start and the submission of a survey, i.e.,

    CT = T_{submit} - T_{start}    (4)

    where T_{start} and T_{submit} are the times when a subject starts and submits a survey.
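A small sketch of how these four measures can be computed from per-survey records follows; the record fields and toy values are illustrative assumptions, and Eq. (1) is aggregated over all delivered survey instances.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SurveyRecord:
    """One delivered survey instance for one subject (times in minutes)."""
    released_at: float
    started_at: Optional[float]       # None if the survey was never answered
    submitted_at: Optional[float]
    questions_total: int
    questions_answered: int

def response_ratio(records):
    """RR: responded surveys over delivered surveys, Eq. (1)."""
    responded = sum(1 for r in records if r.started_at is not None)
    return responded / len(records)

def response_delay(r):
    """RD = T_start - T_released, Eq. (2)."""
    return r.started_at - r.released_at

def completion_ratio(r):
    """CR = X_i / Y_i, Eq. (3)."""
    return r.questions_answered / r.questions_total

def completion_time(r):
    """CT = T_submit - T_start, Eq. (4)."""
    return r.submitted_at - r.started_at

# Toy data: two delivered Mood surveys, one answered after 35 min, one ignored.
records = [SurveyRecord(0, 35, 38, 24, 24), SurveyRecord(0, None, None, 24, 0)]
print(response_ratio(records))                                    # 0.5
print(response_delay(records[0]), completion_time(records[0]))    # 35 3
print(completion_ratio(records[0]))                               # 1.0
```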

Besides these four primary benchmarks, we have also investigated a secondary benchmark, the fraction of surveys completed, which can be used to analyze when an individual is more likely to respond to a survey (e.g., during time frames such as “morning” vs. “afternoon” vs. “evening”). The RR for a particular time frame is the ratio of surveys responded to and surveys delivered within that time frame; the fraction of surveys completed in a particular time frame, in contrast, is the ratio between surveys completed in that time frame and the total number of surveys completed during the entire day. For example, the RR of Mood surveys in the morning is the ratio between Mood surveys completed and released in the morning, but the fraction of Mood surveys completed in the morning is the ratio between Mood surveys completed in the morning and the total number of Mood surveys completed during the entire day.

We primarily use the α = 0.05 level as the criterion for strong significance in our statistical tests. We also use α = 0.10 to check for marginal significance when a test result is not strongly significant [30]. All our statistical notations follow APA guidelines [20].

In Fig. 4, we observe that the RRs for weekly surveys are higher than for the daily surveys and among the daily surveys, the Mood surveys, which are released three times a day, yield lower RRs compared to other daily surveys that are released only once a day. We identified several possible reasons for the RR variation across different surveys. The difference could be due to the survey length (i.e., question count or QC), survey category (e.g., Spirituality versus Life), survey release frequency, overlap and conflict with other survey schedules (i.e., not all surveys may be answered when multiple are pending), the length of the active period, the time of day when the survey is released, and the day of the week.

Fig. 4. Response ratios across different types of surveys. The actual number of completed surveys and the total number of surveys released are shown on top of each bar. Also shown are the frequency of survey release (F) and the number of survey questions (question count or QC). Survey type “SI” represents the Social Interaction survey

Effect of Question Count on Survey Response Ratios

In Fig. 4, we also observe that longer surveys, i.e., surveys with higher QC (e.g., Mood survey and Social Interaction survey) have lower response ratios compared to shorter surveys (e.g., Life survey and Spirituality survey), with an exception of the Sleep survey. The decrease in response ratio might be because of survey fatigue caused by longer surveys.

Effect of Survey Category on Survey Response Ratios

Next, we investigate whether the survey category itself could lead to response ratio variation across two different types of surveys with the same response type and similar length, e.g., the Life and Spirituality surveys in our study. In Table 1, we observe that both the Life and Spirituality surveys have the Unipolar Rating response type; therefore, the way the response options are displayed to subjects is the same for both of these surveys. One could assume that such variation is caused by user sensitivity towards a specific category such as spirituality.

To investigate this, we determine the response ratios for the Life and Spirituality surveys. We compute the RRs for each of the 4 consecutive weeks of the study across all subjects (Table 6). The response ratios for the Life and Spirituality surveys in the first week are 0.75 and 0.69, respectively (Table 6). The difference between these two surveys is very small, but it becomes more pronounced as the study progresses. If user sensitivity to the category were the reason for the difference in response ratios, this would have been reflected immediately, i.e., during the first week of the study. However, the content of the surveys might explain the growing gap in RRs between the two surveys after week#1, i.e., after completing the Spirituality survey in the first week, subjects may have decided not to respond in the following weeks. Cognitive effort [26] could be another reason for non-responses, but this is beyond the scope of this work. The following sections investigate other possible reasons for the variations in RRs.

Table 6.

Response ratio variation across the weeks of the study for the Life and Spirituality surveys

              Study period
Survey        Week#1  Week#2  Week#3  Week#4
Life          0.75    0.83    0.73    0.71
Spirituality  0.69    0.36    0.36    0

Effect of Survey Frequency on Survey Response Ratios

In Fig. 4, we can observe that the RR computed from the three Mood surveys across all subjects is 0.25, which is lower than that of the two weekly surveys (Life with 0.76 and Spirituality with 0.5). One question that arises is whether multiple surveys of the same category during the same day could lead to increased response ratios if the subjects are required to respond to only one of these identical surveys. For subjects who responded to only one of the three Mood surveys, we obtain response ratios of 0.2492, 0.1802, and 0.0961, respectively (the three stacked bars of Mood(Any) in Fig. 5). When only one Mood survey per day is required, the response ratio changes to 0.5255, which is clearly higher than the response ratio of 0.25 obtained when the subjects must respond to all three surveys, and is also comparable to the RRs of the weekly surveys. This response ratio of 0.5255 is also higher than those of the other two daily surveys, i.e., the Sleep survey (0.28) and the Social Interaction survey (0.34) (Fig. 5). This indicates that releasing the same survey frequently, i.e., multiple times in a day, can help improve the response ratio as long as the subjects are not required to respond to all of the releases. That is, the subjects are given the flexibility of responding to any of the surveys released throughout the day instead of responding to each of them.
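One plausible way to operationalize the two accounting rules compared above ("all three required" vs. "any one counts") is sketched below with toy per-day data; the paper's exact bookkeeping for the stacked Mood(Any) bars may differ.

```python
# days[d] is a tuple of booleans: did the subject answer the (morning,
# afternoon, evening) Mood survey on day d? Toy data for one subject.
days = [
    (True,  False, False),
    (False, False, False),
    (False, True,  True),
    (False, False, True),
]

# "All required": every released survey instance counts individually.
released = 3 * len(days)
answered = sum(sum(day) for day in days)
rr_all = answered / released                         # 4/12 = 0.33

# "Any one counts": a day is compliant if at least one of the three was answered.
rr_any = sum(any(day) for day in days) / len(days)   # 3/4 = 0.75
print(round(rr_all, 2), rr_any)
```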

Fig. 5. Response ratios of Mood surveys and other surveys. The first bar, with an RR of 0.25 and label Mood(All), represents the case where subjects are required to respond to all 3 Mood surveys in a day. The second bar (i.e., the three bars stacked together) with label Mood(Any) represents the case where subjects are allowed to take any of the three Mood surveys in a day instead of taking all three. The height of each bar in the stacked bar shows the contribution of each of the three Mood surveys to the overall RR of 0.5255. Flexibility of responding to any of the three Mood surveys brings the RR of the Mood surveys into the range of the weekly surveys’ RRs

Effect of Active Period on Survey Response Ratios

In Fig. 6, we observe that the Life survey has a 12-h active period, which is three times as long as the active period of the Spirituality survey. We compute the non-parametric Spearman’s rank-correlation coefficients [51] between survey active durations and response ratios. We obtain positive correlations of r_s(5) = 0.90, p = 0.08 and r_s(3) = 0.81, p = 0.03 when the Mood surveys are counted with an active duration of 6 h (i.e., subjects are allowed to take any of the three Mood surveys, RR = 0.5255) and 2 h (i.e., subjects have to take all three Mood surveys, RR = 0.25), respectively. Therefore, survey active duration and response ratio are positively correlated with marginal significance.
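As an illustration, the rank correlation between active period and response ratio can be computed as follows, using the per-survey values reported in Fig. 4 and Table 2. Note that the coefficients reported above (with 5 and 3 degrees of freedom) are based on the authors' own sample of survey points, which may differ from the five points assumed here.

```python
from scipy.stats import spearmanr

# Per-survey values from Fig. 4 and Table 2 (Mood counted with its 2-h,
# "all three required" RR of 0.25).
active_hours   = [2,    3,    2,    12,   4   ]   # Mood, Sleep, SI, Life, Spirituality
response_ratio = [0.25, 0.28, 0.34, 0.76, 0.50]

rho, p = spearmanr(active_hours, response_ratio)
print(f"r_s = {rho:.2f}, p = {p:.2f}")
```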

Fig. 6. Release times and active periods of all surveys over the course of a week. The non-overlapping active periods of the Life and Spirituality surveys with the other surveys are shown using horizontal red solid lines below the main blue time line

Fig. 7. Probability distribution function (PDF) and cumulative distribution function (CDF) (blue curve) of Life survey response delays. The red solid lines indicate the fraction of surveys completed (0.5641) for a survey response delay of 4 h. The vertical black dashed lines indicate the timing of the survey notifications delivered by the phone

From the cumulative distribution function (CDF) of the Life surveys in Fig. 7, we can see that ≈56% of the responses (a fraction of 0.5641) arrive within the first 4 h of the 12-h active period, which is comparable to the RR of 0.5 for the Spirituality survey with its 4-h active period. The remaining Life survey responses are evenly distributed over the other 8 h of the active period. Therefore, the low response ratio of the Spirituality survey may be due to its shorter active period, indicating that increasing the active period from 4 to 12 h could yield higher response ratios (e.g., 0.76, Fig. 4).

Table 7.

Average response ratios, and non-overlapping active periods (inside parentheses, measured in hours as shown in Fig. 6) of daily surveys during the weekends (Saturdays and Sundays) and weekdays, respectively

                 Survey days
Survey category  Saturday   Sunday     Monday–Friday
M1               0.20 (1)   0.34 (1)   0.25 (1)
M2               0.20 (0)   0.32 (2)   0.28 (2)
M3               0.19 (0)   0.21 (0)   0.24 (2)
Overall (Mood)   0.20 (NA)  0.29 (NA)  0.26 (NA)
SI               0.30 (0)   0.42 (1)   0.33 (2)
Sleep            0.22 (2)   0.26 (2)   0.33 (2)
Overall          0.22 (NA)  0.31 (NA)  0.29 (NA)

The overall response ratio (last row) is computed across all five daily surveys (NA not applicable)

Another observation from the probability distribution function (PDF) in Fig. 7 is that survey responses are not evenly distributed over the entire active period. We can see that for the first three survey notifications/reminders, the fraction of responses is lower before each notification and increases right after it, which indicates that notifications have a clear positive impact on compliance. The effect is weakest for the last notification, which could be explained by the fact that most subjects have already responded to the survey by then, while some of the remaining subjects may prefer to wait until the last moment to submit their responses.

Correlation of Non-overlapping Active Period and RR

Figure 6 shows the active periods and release times of various surveys. The non-overlapping active period of a survey is the period that is dedicated to the survey, i.e., no survey other than that particular survey can be returned by a subject during this period (represented by horizontal red lines in Fig. 6). The Life survey has a higher RR of 0.76 compared to the Spirituality survey (i.e., 0.5). In Fig. 6, we observe that the Life survey has a non-overlapping active duration of 6 h (i.e., 12 p.m.–2 p.m., 4 p.m.–6:30 p.m., 8:30 p.m.–9 p.m., and 11 p.m.–12 a.m.). This 6-h non-overlapping active period is six times larger than the 1-h non-overlapping active period of the Spirituality survey (i.e., 6 p.m.–6:30 p.m. and 8:30 p.m.–9 p.m.). We investigate whether the length of the non-overlapping parts of the active duration has a correlation with the response ratio. We compute the non-parametric Spearman’s rank-correlation coefficient between RRs and non-overlapping active durations of all surveys (Table 7) and obtain a positive correlation, r_s(15) = 0.44, p = 0.077, which is marginally significant. We then compute the Spearman’s rank-correlation coefficient between RRs and the non-overlapping active durations of all surveys (Table 7) except the Life survey, which appears to be an outlier since it has a very high RR compared to the other surveys. We then obtain a very low correlation of r_s(14) = 0.32, p = 0.23, which is not statistically significant. Therefore, non-overlapping active duration and RRs are not strongly correlated.

RR Variation of Overlapping and Non-overlapping Surveys

We perform the non-parametric Wilcoxon rank-sum test (also called the Mann-Whitney U test) [27] to test the null hypothesis “RRs of overlapped and non-overlapped surveys come from the same population.” We consider M2(S), M3(S, Su), and SI(S) as the set of overlapped surveys and M2(Su, M–F), M3(M–F), and SI(M–F) as the set of non-overlapped surveys (Fig. 6, Table 7). Since the sample size of overlapped surveys is less than 10, we use the exact method instead of the approximate normal method. We obtain a rank-sum of T = 12, p = 0.11. Therefore, we fail to reject the null hypothesis, i.e., we find no evidence of a population difference between the RRs of overlapped and non-overlapped surveys. We also perform the χ2-test with Yates continuity correction to test whether there is a difference between the proportions of survey completion across the overlapped and non-overlapped surveys. We obtain χ2(1) = 2.52, p = 0.11, which is not statistically significant. Therefore, we fail to reject the null hypothesis, i.e., we find no evidence of a difference between the proportions of survey completion across overlapping and non-overlapping surveys.
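A sketch of both tests using scipy is shown below. The rank-sum input uses the overlapped and non-overlapped RRs from Table 7, while the 2×2 completion counts for the χ2-test are hypothetical placeholders, since the underlying counts are not listed in the text.

```python
from scipy.stats import mannwhitneyu, chi2_contingency

# RRs from Table 7: overlapped = {M2(S), M3(S), M3(Su), SI(S)},
# non-overlapped = {M2(Su), M2(M-F), M3(M-F), SI(M-F)}.
overlapped     = [0.20, 0.19, 0.21, 0.30]
non_overlapped = [0.32, 0.28, 0.24, 0.33]

# Exact Mann-Whitney U / Wilcoxon rank-sum test (no normal approximation).
# scipy reports the U statistic; the rank-sum is T = U + n1*(n1+1)/2 = 2 + 10 = 12.
u_stat, p_u = mannwhitneyu(overlapped, non_overlapped, method="exact")
print(f"U = {u_stat}, p = {p_u:.2f}")          # exact two-sided p ~= 0.11

# Chi-squared test with Yates continuity correction on completed-vs-missed
# counts. The 2x2 counts below are hypothetical placeholders; the paper
# reports chi^2(1) = 2.52, p = 0.11 for its own (unpublished) counts.
table = [[60, 180],     # overlapped:     completed, not completed (assumed)
         [210, 480]]    # non-overlapped: completed, not completed (assumed)
chi2, p_c, dof, _ = chi2_contingency(table, correction=True)
print(f"chi2({dof}) = {chi2:.2f}, p = {p_c:.2f}")
```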

Correlation Analysis of Overlapping Surveys

We further compute the Pearson correlation [35] between the RRs of Saturday’s Life survey and other surveys that overlap with its active period (Fig. 6). We obtain positive correlations, r(15) = 0.65, p < 0.001, r(15) = 0.69, p < 0.001, and r(15) = 0.74, p < 0.001 for the second Mood survey (M2), the third Mood survey (M3), and the Social Interaction survey (SI), respectively, which are strongly significant. Each sample corresponds to a single subject.

Similarly, we compute the correlations between the RRs of Sunday’s Spirituality survey and other surveys that overlap with its active period (Fig. 6). We obtain positive correlations, r(15) = 0.55, p = 0.002 and r(15) = 0.48, p = 0.011 for M3 and SI, respectively. These correlations are strongly significant. M3 and the Spirituality survey have a higher correlation (0.55) than the correlation between SI and the Spirituality survey (0.48). Note that M3 completely overlaps with the Spirituality survey, while SI only partially overlaps with the Spirituality survey (Fig. 6).

Further, the RRs of the overlapping M1 and Sleep surveys are also highly correlated, with r(15) = 0.93, p < 0.001, r(15) = 0.80, p < 0.001, and r(15) = 0.92, p < 0.001 on Saturdays, Sundays, and weekdays, respectively. The high correlations between the RRs of overlapping surveys indicate that most of the time subjects take the two surveys together; therefore, subjects do not seem to be biased towards any specific survey.

Effect of Part of Week on Survey Response Ratios

Next, we investigate if the day of the week impacts the response ratio and specifically if response ratios of weekends differ from the response ratios of weekdays. For example, while most subjects did not work over the weekend, other activities may have impacted their compliance.

Table 7 shows the response ratios for weekends (Saturdays and Sundays) and weekdays for each type of survey. We perform the non-parametric Kruskal-Wallis one-way ANOVA test on ranks [24] of survey response ratios across different days with the null hypothesis: “response ratios are from the same distribution.” We obtain mean ranks of 4.2, 10.4, and 9.4 for Saturdays, Sundays, and weekdays, respectively, and cannot reject the null hypothesis strongly, but it can be rejected marginally (Table 8).

Table 8.

One-way ANOVA table of RR ranks across weekdays and weekends (Saturdays and Sundays)

Source SS df MS Chi-sq Prob > Chi-sq
Groups 110.8 2 55.4000 5.5599 0.062
Error 168.2 12 14.0167
Total 279 14

“SS,” “df,” and “MS” represent “Sum of Squares,” “degrees of freedom,” and “Mean Square,” respectively

In Table 7, we observe that Mood surveys on Sundays have, on average, 45 and 12% higher response ratios than Mood surveys on Saturdays and on weekdays, respectively. On average, daily surveys on Sundays have 41 and 7% higher response ratios than those on Saturdays and weekdays, respectively. We perform the non-parametric Wilcoxon signed-rank test on paired samples [63], i.e., RRs of the same surveys during weekdays and weekends, using the exact method instead of the normal approximation since the sample size is less than 15. The null hypothesis is: “the population mean ranks of RRs of two different days do not differ.” When comparing Saturday with Sunday and with weekdays, we obtain a signed-rank of T = 0, p = 0.06 in each case, which is marginally significant. We observe two exceptions, the M3 and Sleep surveys, where weekdays have higher response ratios than weekends (Fig. 8). One possible explanation for this is the typical student schedule, i.e., most subjects had early morning classes, as found from the Resource Assessment survey.
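The day-of-week comparisons can be sketched as follows with the per-survey RRs from Table 7; scipy's Kruskal-Wallis and Wilcoxon signed-rank implementations are used here as stand-ins for the authors' exact procedure.

```python
from scipy.stats import kruskal, wilcoxon

# Per-survey RRs from Table 7 (rows M1, M2, M3, SI, Sleep).
saturday = [0.20, 0.20, 0.19, 0.30, 0.22]
sunday   = [0.34, 0.32, 0.21, 0.42, 0.26]
weekday  = [0.25, 0.28, 0.24, 0.33, 0.33]

# Kruskal-Wallis one-way ANOVA on ranks across the three day groups.
h, p_kw = kruskal(saturday, sunday, weekday)
print(f"H = {h:.2f}, p = {p_kw:.3f}")

# Paired Wilcoxon signed-rank test, e.g., Saturday vs. Sunday RRs of the same
# five surveys (scipy uses the exact distribution only for small, tie-free samples).
t_stat, p_w = wilcoxon(saturday, sunday)
print(f"T = {t_stat}, p = {p_w:.2f}")
```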

Fig. 8. Response ratio variation of different surveys during the weekdays and weekends (Saturdays and Sundays)

Effect of Part of Day on Survey Response Ratios

To see the effect of time of day on RR variation, we analyze the RRs of Mood surveys and observe that Mood surveys released in the afternoon (M2) have a slightly higher response ratio of 0.28 (i.e., 88/313) compared to those released in the morning (M1) (80/313 = 0.26) and evening (M3) (76/313 = 0.24) (Table 9).

Table 9.

Response ratios of Mood surveys at different times of the day across different sub-groups of subjects

       Survey period
Group  Morning        Afternoon      Evening
All    0.26 (80/313)  0.28 (88/313)  0.24 (76/313)
SG#1   0.64 (51/80)   0.36 (26/72)   0.39 (28/72)
SG#2   0.06 (5/88)    0.22 (19/88)   0.15 (13/88)

Note that for the analysis in Tables 9 and 10, we consider the 16 subjects who responded to at least 5 Mood surveys and to at least 10 surveys in total, compared to Fig. 4, where we consider the 17 subjects who responded to at least 10 surveys in total. The difference between the two analyses is one subject who responded to only 4 (i.e., 248−244) of the 66 (i.e., 1005−939) Mood surveys released during his participation in the study.

Table 10.

Fraction of Mood surveys completed at different times of the day across different sub-groups of subjects

       Survey period
Group  Morning        Afternoon      Evening
All    0.33 (80/244)  0.36 (88/244)  0.31 (76/244)
SG#1   0.49 (51/105)  0.25 (26/105)  0.26 (28/105)
SG#2   0.14 (5/37)    0.51 (19/37)   0.35 (13/37)

We further investigate the impact of the time of day on the response ratio variation across the different surveys of an individual. Figure 9 shows the response ratios across different surveys of two individuals. For subject #5, the morning surveys (Sleep survey and morning Mood survey) have lower response ratios compared to the surveys in the evening (evening Mood survey and Social Interaction (SI) survey). In contrast, for subject #3, the morning surveys have higher response ratios compared to the evening surveys. That is, subject #5 appears to be more responsive during afternoons, evenings, and at night, i.e., the later parts of the day, while subject #3 appears to have a more balanced response behavior (while also responding more reliably in the mornings compared to later parts of the day). In the next section, we will investigate the subjects’ responsiveness across different parts of the day at the population level in more detail.

Fig. 9. Response ratio variation of different surveys released at different times of the day and days of the week, for two subjects (as representatives of two different types of subjects)

Responsiveness Variation Across Sub-groups

The variation of response ratios across individuals and surveys (Fig. 9) leads us to investigate whether there are specific characteristics of the Mood surveys that may lead to increased response ratios. Towards this end, we form one sub-group (i.e., SG#1), consisting of subjects #1, 3, 6, 11, and 16, where each subject has a higher response ratio for morning Mood surveys compared to the other two Mood surveys and this is also higher than the average response ratio of morning Mood surveys computed across all subjects (i.e., 0.26). Similarly, we form another sub-group (i.e., SG#2), consisting of subjects #7, 8, 9, and 14, where each subject has a lower response ratio for morning Mood surveys compared to the other two Mood surveys and this is also lower than the average response ratio of the morning Mood surveys computed across all subjects.

In Table 9, we observe that for SG#1, the response ratio for the Mood surveys in the morning is 0.64, which is 1.64 times the maximum response ratio of the other two Mood surveys (0.39) and 2.46 times as large as the response ratio of morning Mood surveys across all subjects (0.26). Although this is a small group of 5 (out of 17) subjects, it accounts for 51 of the 80 (about 64%) Mood surveys that were returned in the morning. However, for SG#2, the response ratio for Mood surveys in the morning is only 0.06, compared to 0.22 for the Mood surveys in the afternoon. This response ratio is also well below the morning Mood survey ratio across all subjects (0.26).

In Table 10, we observe that for the entire population, the likelihood of responding to the Mood surveys is spread fairly uniformly across the different times of a day. However, for one sub-group (SG#1), most of the completed Mood surveys (49%) are from the morning, while for the other sub-group (SG#2), most of the completed Mood surveys (51%) are from the afternoon. Note that the RR of Mood surveys in the morning is the ratio between Mood surveys completed and released in the morning, whereas the likelihood, i.e., the fraction of Mood surveys completed in the morning, is the ratio between Mood surveys completed in the morning and the total number of Mood surveys completed during the entire day.

Next, we investigate the characteristics that lead to the difference in subjects’ time-dependent responsiveness. SG#1 consists of three graduate and two undergraduate students and SG#2 consists of four undergraduate students. We found that the graduate students had larger age differences and very different work schedules compared to the undergraduate student population. The average ages of the subjects in SG#1 and SG#2 are 25.3 (SD = 4.5) years and 20.5 (SD = 0.7) years, respectively. From the Pre-study survey, we find that the average wake up times of subjects from the two sub-groups are 9:30 a.m. and 8:45 a.m., respectively.

In Fig. 6, we observe that the release times of the Mood surveys for all subjects in the morning and afternoon are 10 a.m. and 2 p.m., respectively. The Sleep survey is released at 8 a.m. with reminders at 8 a.m., 8:45 a.m., 9:30 a.m., and 10:15 a.m., while the Mood survey released at 10 a.m. has reminders at 10 a.m., 10:30 a.m., 11 a.m., and 11:30 a.m. Therefore, it is very likely that subjects in SG#1 receive their first reminder from the survey app for the 10 a.m. Mood survey (i.e., the morning Mood survey, M1) after they wake up and start their day. They responded to the morning Mood surveys with a relatively high RR compared to the Mood surveys released during later parts of the day. However, for the subjects in SG#2, the first reminder from the survey app is for the Sleep survey (and not the morning Mood survey).

We further investigated subjects’ class schedules (obtained from the Resource Assessment survey) and found that, in general, undergraduate students have more courses and class days than the graduate students, which might affect their availability to respond to surveys and, hence, their survey compliance. For SG#2, we find that each of the four undergraduate students has 5–6 classes during the 5 days of the week, and all these classes are packed around the morning. For SG#1, each of the three graduate students has two classes on 2 days of the week, both around noon. Each of the two undergraduate students in SG#1 has 4–6 classes during 5 days of the week, but these are mostly early morning and afternoon classes. Therefore, subjects’ responsiveness appears to be related to their availability during different parts of the day. Also, SG#1 is in general more responsive than SG#2, since the subjects in SG#1 responded to 105 Mood surveys in total compared to a total of 37 Mood surveys responded to by the subjects of SG#2.

Response Ratio Variation of an Individual

In Fig. 10, we observe an instance where the response ratio for a specific subject falls to zero during the exam days. This is because the student was busy with exam preparation and ignored all voluntary survey requests (her exam schedule was obtained via the Resource Assessment survey). This indicates that, especially for studies with voluntary participation (i.e., no incentives), careful alignment with the busy and idle periods of a subject should be considered.

Fig. 10. Response ratio variation of an individual (Subject #2) over time. Red circles and blue squares correspond to response ratios of different days that are below and above the average ratio (represented by the horizontal dashed line) of this individual over the entire study period, respectively

Fig. 11. Response ratio variation over the different parts of the study period: class days, reading days, exam days, and post-exam days

Response Ratio Variation of Entire Population

In Fig. 11, we investigate whether the timing of surveys during a semester (e.g., survey release during class days, reading days, exam days, or post-exam days) impacts the response ratios. We observe that the RR decreases over the entire study period with corresponding average values of 0.45, 0.32, 0.30, 0.26, and standard error values of 0.05, 0.05, 0.03, 0.02, respectively. The sample counts for the four different time periods are 40, 57, 72, and 159 response ratio values, respectively. Each sample comes from a single day of an individual, i.e., 328 person-days in total.

Table 11.

One-way ANOVA table of RRs over the entire study period

Source SS df MS F Prob > F
Groups 1.1105 3 0.3702 5.0824 0.0019
Error 23.5971 324 0.0728
Total 24.7076 327

“SS,” “df,” “MS,” and “F” represent “Sum of Squares,” “degrees of freedom,” “Mean Square,” and “F-statistic,” respectively

We first perform a one-way ANOVA test (Table 11) with the null hypothesis: “different parts of the entire study period (i.e., the academic semester) have the same average RRs.” We reject the null hypothesis (Table 11); therefore, the average RRs across the different parts of the study period are not the same. To further investigate, we conduct two-sample t-tests. We find that the RR differences between the class and exam periods and between the class and post-exam periods are statistically significant, with t(110) = 2.54, p = 0.012 and t(197) = 4.32, p < 0.001, respectively. The RR differences among the other parts are not statistically significant. The decrease in the response ratio during reading and exam days is most likely due to the increased levels of stress and workload that are typical at the end of a semester. One might expect an increase in the ratio during post-exam days (when the exam stress disappears), but this is also the time period when the students prepare to leave campus, which may also impact the compliance ratios.
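The ANOVA and follow-up t-tests can be sketched as below. Since the per-person-day RRs are not published, the sketch draws synthetic samples whose group sizes and means roughly match the reported values, so its outputs only illustrate the procedure.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# Synthetic per-person-day RRs whose group sizes (40, 57, 72, 159) and means
# (~0.45, 0.32, 0.30, 0.26) mimic the reported values; not the study data.
rng = np.random.default_rng(0)
class_days = np.clip(rng.normal(0.45, 0.32, 40), 0, 1)
reading    = np.clip(rng.normal(0.32, 0.38, 57), 0, 1)
exam_days  = np.clip(rng.normal(0.30, 0.25, 72), 0, 1)
post_exam  = np.clip(rng.normal(0.26, 0.25, 159), 0, 1)

# One-way ANOVA across the four parts of the study period.
f_stat, p_anova = f_oneway(class_days, reading, exam_days, post_exam)
print(f"F = {f_stat:.2f}, p = {p_anova:.4f}")

# Follow-up two-sample t-test, e.g., class days vs. exam days (df = 40 + 72 - 2 = 110).
t_stat, p_t = ttest_ind(class_days, exam_days)
print(f"t = {t_stat:.2f}, p = {p_t:.3f}")
```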

Distribution of Survey Response Delays

From the delay distributions of the Social Interaction (Fig. 12) and the Mood (Fig. 13) surveys, we observe that most of our subjects respond to the surveys shortly after they have been released, and the number of responses quickly decreases over time, roughly following an exponential distribution.

Fig. 12. Delay distribution of the Social Interaction surveys, which are responded to by the subjects once every day with an active duration of 2 h. The intervals are 20 min long

Fig. 13. Delay distribution of all the Mood surveys, which are responded to by the subjects three times a day with an active duration of 2 h. The intervals are 20 min long

From the delay distribution of the Sleep survey (Fig. 14), we observe that most of the Sleep survey responses arrived after 9:20 a.m., and the distribution appears closer to a normal distribution than the delay distributions of the other surveys. In Fig. 14, we also observe a sudden increase in responses around the time when many students typically begin their day (around 8:50 a.m.). This further emphasizes the need for careful selection of survey release times, i.e., times when subjects are active.

Fig. 14. Delay distribution of the Sleep surveys. The intervals are 20 min wide. The vertical dashed black line represents the average wake-up time (8:50 a.m.) across all subjects, obtained from the Pre-study survey

Effect of Active Period on Survey Response Delays

In Figs. 15 and 16, we observe that the response delays of the Mood and the Social Interaction surveys (each with a 2-h active period) are similar to each other and lower than the response delays of the Sleep and the Spirituality surveys (with active periods of 3 h and 4 h, respectively). The Life surveys have a 12-h active period and also the highest response delays. In Table 12, we present the response delays of the daily surveys. The average response delays of the two weekly surveys, i.e., the Spirituality survey and the Life survey, are 96.15 (SD = 57.43) minutes and 259.09 (SD = 224.75) minutes, respectively. We compute the non-parametric Spearman’s rank-correlation coefficient [51] between the survey active periods and the average response delays. We obtain a positive correlation, r_s(5) = 0.91, p = 0.01. We further compute the Pearson correlation [35] between the RDs and active periods of every individual survey. We obtain a positive correlation, r(515) = 0.62, p < 0.001. These correlations are strong and statistically significant. Therefore, response delays are positively correlated with the length of the active periods, i.e., surveys with longer active periods give subjects more flexibility to respond later and hence increase the response delays.
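A sketch of the rank correlation between active period and average response delay is given below, using the seven per-survey averages reported in Table 12 and in the text (consistent with the reported 5 degrees of freedom); the per-response Pearson correlation r(515) requires the individual delays and is not reproduced here.

```python
from scipy.stats import spearmanr

# Average response delays (minutes) and active periods (hours) per survey,
# taken from Table 12 and the text: M1, M2, M3, SI, Sleep, Spirituality, Life.
active_hours = [2, 2, 2, 2, 3, 4, 12]
avg_delay    = [35.75, 46.10, 34.89, 39.15, 95.49, 96.15, 259.09]

rho, p = spearmanr(active_hours, avg_delay)
print(f"r_s(5) = {rho:.2f}, p = {p:.2f}")
# The per-response Pearson correlation r(515) = 0.62 pairs each individual
# response's delay with its survey's active period and needs the raw data.
```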

Fig. 15. Boxplot of response delays of different surveys

Fig. 16. Bar graphs of average survey response delays with error bars

Table 12.

Average response delays along with standard deviations (inside parentheses) of daily surveys during the weekends (Saturdays and Sundays) and weekdays, where the response delays are measured in minutes

                    Survey days
Survey category     Saturday        Sunday          Monday–Friday   All days
M1                  31.33 (26.12)   49.77 (45.23)   33.29 (32.21)   35.75 (33.69)
M2                  42.09 (39.77)   36.10 (35.85)   44.53 (36.22)   46.10 (37.14)
M3                  41.00 (28.78)   31.89 (29.20)   33.67 (31.05)   34.89 (29.88)
Overall (Mood)      37.85 (31.42)   40.47 (38.07)   37.59 (33.66)   –
SI                  38.25 (36.35)   35.63 (32.39)   39.91 (35.66)   39.15 (34.78)
Overall (Mood, SI)  37.98 (32.73)   38.85 (35.99)   38.31 (34.24)   –
Sleep               104.33 (63.67)  104.67 (37.39)  93.86 (48.71)   95.49 (50.53)

Effect of Part of Week on Survey Response Delays

Next, we investigate if the day of the week impacts the response delay and specifically if the response delays of weekends differ from the response delays of weekdays. For example, while most subjects did not work over the weekend, other activities may have impacted their compliance. For this analysis, we can combine only those surveys that have the same active periods, e.g., RDs of the Mood surveys can be combined with RDs of the Social Interaction surveys, but not with RDs of the Sleep surveys.

Table 12 shows the response delays for weekends (Saturdays and Sundays) and weekdays for each type of survey. We perform a one-way ANOVA test on survey response delays across the different days, for each type of survey separately and also for combinations of surveys with the same active durations (e.g., M1, M2, and M3 together). In all cases, we fail to reject the null hypothesis: “response delays are from the same distribution.” Therefore, there is no significant difference among the response delays across the different parts of the week.

In Table 12, we observe that the Mood surveys on Sundays have on average 7% higher response delays than the Mood surveys on Saturdays and on weekdays. The Sleep surveys on weekends, however, have about 10% higher response delays than those on weekdays. One possible explanation is the typical student schedule, i.e., most subjects had early morning classes (as found from the Resource Assessment survey), which may lead them to see and respond to surveys early in the morning on weekdays.

Effect of Part of Day on Survey Response Delays

To investigate the effect of time of day on RD variation, we analyze the RDs of the Mood surveys. In Table 12, we observe that the Mood surveys released in the afternoon (M2) have a higher average response delay (46.1 min with SD = 37.14 min) compared to those released in the morning (M1) (35.75 min with SD = 33.69 min) and evening (M3) (34.89 min with SD = 29.88 min).

We first perform the one-way ANOVA test (Table 13) to test the null hypothesis: “different parts of a day (i.e., morning, afternoon, evening) have the same RDs.” We reject the null hypothesis (Table 13); therefore, the RDs across different parts of a day are not the same. To investigate this further, we conduct the two-sample t-test. We find that the average response delay of the Mood surveys in the afternoon is significantly different from the average response delays of the Mood surveys in the morning and evening, with t(183) = −1.98, p = 0.049 and t(177) = 2.19, p = 0.029, respectively. The RD difference between the morning and evening Mood surveys is not statistically significant.
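
The pairwise comparisons can be reproduced with a pooled two-sample t-test, e.g., with SciPy as in the sketch below; this is an assumed illustration, not the authors’ code, and the delay arrays are hypothetical.

```python
# Minimal sketch of the pooled two-sample t-tests comparing afternoon (M2) RDs
# against morning (M1) and evening (M3) RDs; values are hypothetical.
from scipy.stats import ttest_ind

rd_m1 = [30, 42, 25, 51, 38, 29, 33]   # morning Mood surveys
rd_m2 = [55, 47, 62, 39, 58, 44, 50]   # afternoon Mood surveys
rd_m3 = [28, 36, 31, 45, 27, 33, 30]   # evening Mood surveys

# Pooled variance (equal_var=True), consistent with the reported degrees of freedom.
t12, p12 = ttest_ind(rd_m1, rd_m2, equal_var=True)
t23, p23 = ttest_ind(rd_m2, rd_m3, equal_var=True)
print(f"M1 vs M2: t = {t12:.2f}, p = {p12:.3f}")
print(f"M2 vs M3: t = {t23:.2f}, p = {p23:.3f}")
```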

Table 13. One-way ANOVA table of the Mood survey RDs over different times of a day during the entire study period

Source | SS         | df  | MS         | F      | Prob > F
Groups | 7.1922e+03 | 2   | 3.5961e+03 | 3.1281 | 0.0454
Error  | 3.0349e+05 | 264 | 1.1496e+03 |        |
Total  | 3.1069e+05 | 266 |            |        |

“SS,” “df,” “MS,” and “F” represent “Sum of Squares,” “degrees of freedom,” “Mean Square,” and “F-statistic,” respectively

Effect of Academic Calendar on Survey Response Delays

Next, we investigate whether the different parts of the academic calendar (i.e., regular class days, reading days, exam days, and post-exam days) impact the response delays. For this analysis, we can combine only those surveys that have the same active periods, e.g., RDs of the Mood surveys can be combined with RDs of the Social Interaction surveys, but not with RDs of the Sleep surveys.

Table 14 shows the response delays during different parts of the study for each type of survey. We perform the one-way ANOVA test on survey response delays across the different parts of the study, both for each type of survey separately and for combinations of surveys with the same active durations (e.g., M1, M2, and M3 combined). We fail to reject the null hypothesis “response delays across different study parts are from the same distribution” in all cases except for the Social Interaction surveys (Table 15). Therefore, there is no significant difference among the response delays across the different parts of the study, with the exception of the Social Interaction surveys. Using the two-sample t-test, we find that the average RDs of the Social Interaction surveys during the post-exam days are significantly different from the average RDs during the earlier three parts of the study, with t(69) = −2.48, p = 0.015, t(69) = −2.61, p = 0.011, and t(74) = −2.74, p = 0.007, respectively. This may be because of the subjects’ increased engagement with social events during the post-exam 9 p.m.–11 p.m. time slot when these surveys were released. Interestingly, we observe that the RDs during the post-exam days are relatively higher than during the other parts of the study for the other surveys as well (Table 14), which indicates increased engagement of the subjects with activities other than their regular on-campus academic life.
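
Table 14-style summaries (mean and SD of RD per survey type and calendar part) can be derived from a flat response log, for example with pandas as in the sketch below; the column names and values are hypothetical and this is not the authors’ pipeline.

```python
# Minimal sketch: mean/SD of response delays per survey type and calendar part.
# Column names and values are hypothetical, for illustration only.
import pandas as pd

responses = pd.DataFrame({
    "survey":     ["M1", "M1", "SI", "SI", "Sleep", "Sleep"],
    "study_part": ["class", "exam", "class", "post-exam", "reading", "exam"],
    "rd_min":     [28.0, 27.5, 28.3, 52.1, 112.0, 89.5],
})

summary = (responses
           .groupby(["survey", "study_part"])["rd_min"]
           .agg(["mean", "std", "count"])   # per-cell mean, SD, and sample size
           .round(2))
print(summary)
```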

Table 14. Average response delays (in minutes) along with standard deviations (in parentheses) of daily surveys during the different parts of the entire study period

Survey category    | Class         | Reading        | Exam          | Post-exam
M1                 | 28.50 (31.68) | 32.93 (31.55)  | 26.86 (24.75) | 42.29 (38.65)
M2                 | 38.35 (37.33) | 34.06 (34.04)  | 48.13 (38.07) | 48.61 (36.27)
M3                 | 38.47 (37.51) | 32.63 (34.44)  | 29.33 (20.79) | 37.30 (29.97)
Overall (Mood)     | 37.02 (36.03) | 33.24 (32.74)  | 33.37 (28.56) | 43.43 (35.85)
SI                 | 28.32 (25.90) | 27.05 (25.65)  | 27.96 (28.01) | 52.54 (39.49)
Overall (Mood, SI) | 34.35 (33.29) | 31.51 (30.86)  | 31.77 (28.33) | 46.57 (37.27)
Sleep              | 83.20 (45.99) | 112.75 (42.19) | 89.81 (53.01) | 97.09 (50.72)

Table 15. One-way ANOVA table of the Social Interaction survey RDs over different parts of the entire study period

Source | SS         | df  | MS         | F      | Prob > F
Groups | 1.7338e+04 | 3   | 5.7791e+03 | 5.2326 | 0.0021
Error  | 1.2149e+05 | 110 | 1.1044e+03 |        |
Total  | 1.3883e+05 | 113 |            |        |

“SS,” “df,” “MS,” and “F” represent “Sum of Squares,” “degrees of freedom,” “Mean Square,” and “F-statistic,” respectively

Effect of Survey Length, Trigger Frequency, and Active Duration on CR

Figure 17 presents the probability density function (PDF) and cumulative distribution function (CDF) of the survey completion ratios. We observe that our subjects responded to all questions of a survey in 96% of all cases (i.e., (517 − 21)/517 × 100). Furthermore, we find that nine subjects responded to all questions of every survey they returned (i.e., 100% CR).
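
A minimal sketch of how per-survey completion ratios and the empirical CDF shown in Fig. 17 can be computed is given below; this is an assumed illustration, not the authors’ code, and the answered/asked counts are hypothetical.

```python
# Minimal sketch: per-survey completion ratios (CR) and their empirical CDF.
# The answered/asked counts are hypothetical, not the study data.
import numpy as np

answered = np.array([24, 24, 20, 24, 14, 5])    # questions answered per returned survey
asked    = np.array([24, 24, 24, 24, 14, 14])   # questions asked in that survey

cr = answered / asked                           # completion ratio in [0, 1]
print(f"fully completed: {np.mean(cr == 1.0):.0%} of returned surveys")

# Empirical CDF of CR over a grid (what the red line in Fig. 17 visualizes).
grid = np.linspace(0.0, 1.0, 101)
ecdf = np.array([(cr <= x).mean() for x in grid])
```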

Fig. 17. Probability density function (PDF) and cumulative distribution function (CDF) (red line) of survey completion ratios

We further investigate the CR at the survey and subject levels. We observe that the two weekly surveys (i.e., the Life survey and the Spirituality survey) and the daily Sleep survey have a 100% CR. This could be because of their low release frequencies and shorter lengths compared to longer and more frequent surveys, e.g., the Mood and the Social Interaction surveys. The CRs are 0.94, 0.97, 0.96, and 0.92 for the Mood surveys in the morning, afternoon, and evening, and the Social Interaction surveys, respectively.

Another factor could be the length of a survey’s active period, since the Mood and the Social Interaction surveys have relatively shorter active periods than the surveys with a 100% CR. For instance, the Life, the Spirituality, and the Sleep surveys have active periods of 12 h, 4 h, and 3 h, respectively, compared to the 2-h active periods of all the Mood and the Social Interaction surveys. However, our further investigation of active duration reveals that 12 out of 21 (i.e., 57%) incomplete surveys were returned within 5 min of the first or second notification (i.e., within 35 min of survey release), and 19 out of 21 (i.e., 90%) of the incomplete surveys were returned within 1 h of release. Thus, incomplete submissions do not appear to be caused by insufficient active periods.

Subjects’ Nature/Pattern of Skipping Questions in a Survey

We find that 14 out of 21 (i.e., 67%) incomplete surveys are skipped as a “burst,” i.e., a continuous chunk of questions is skipped from the questionnaire. The remaining seven incomplete surveys do not display a continuous skip pattern, but they still follow a trend. For example, five of those surveys came from subject #16, who always skipped question #10 and question #14 of the Social Interaction surveys, which are the first and last questions in a set of five bipolar rating questions with a slider response type. The other two incomplete surveys are from subject #11, who also followed a pattern when submitting his two incomplete morning Mood surveys; we could not identify a specific reason for this pattern.

Figures 18 and 19 show the question skip patterns of the 11 incomplete Mood surveys. The two non-burst incomplete submissions from the same subject are represented by hatched bars (with the question numbers on top) (Fig. 18). In Fig. 19, the node number (inside the circles) represents the question number, the node weight (maroon colored) represents the number of incomplete surveys in which that particular question was skipped (including non-burst skips), and the edge weight (blue colored) represents the number of incomplete surveys in which the adjacent nodes/questions were skipped together. As Figs. 18 and 19 show, questions #21–#24 are mostly skipped (as a burst) in the Mood surveys, which have 24 questions in total. The skipped questions are the last four questions of a Mood survey, where subjects were asked to describe their fatigue and stress levels in addition to answering the first 20 positive and negative mood questions. This mixing of stress/fatigue questions with mood questions is not well reflected by the survey name, since the survey is simply called the Mood survey. Therefore, subjects may have become confused when responding to all 24 questions, i.e., they might have thought that responding to only the first 20 questions (related to mood) would be sufficient. This indicates that we should have either named the survey more carefully or avoided mixing questions from different categories in a single survey.
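
The “burst” classification used here can be checked programmatically; below is a minimal sketch (assumed, not the authors’ code) that flags an incomplete submission as a burst skip when the skipped question numbers form one contiguous run.

```python
# Minimal sketch: classify an incomplete submission as a "burst" skip when the
# skipped question numbers form a single contiguous run.
def is_burst_skip(skipped_questions):
    """Return True if all skipped question numbers are consecutive."""
    if not skipped_questions:
        return False
    s = sorted(set(skipped_questions))
    return s[-1] - s[0] + 1 == len(s)

print(is_burst_skip([21, 22, 23, 24]))  # True  -- e.g., the trailing stress/fatigue block
print(is_burst_skip([10, 14]))          # False -- e.g., subject #16's non-burst pattern
```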

Fig. 18. Pattern of skipping questions of the Mood surveys. Out of 11 incomplete Mood surveys, 9 are skipped as a burst, i.e., a set of successive questions is skipped (represented by non-hatched bars). The hatched bars (with the question number on top) represent the two non-burst question skip patterns from the same subject

Fig. 19. Graphical representation of the question skip pattern of the Mood surveys. The node number (inside the circles) represents the question number; the node weight (maroon colored) represents the number of incomplete surveys where that particular question was skipped, including non-burst skips; the edge weight (blue colored) represents the number of incomplete surveys where the adjacent nodes/questions were skipped together

Effect of Part of Week on Survey Completion Ratios

We find that 9 incomplete surveys occurred during the weekends (i.e., 6 on Saturdays and 3 on Sundays) and the other 12 incomplete surveys during the weekdays. Therefore, we obtain 4.5 (i.e., 9/2) incomplete surveys per day during weekends compared to 2.4 (i.e., 12/5) incomplete surveys per day during weekdays. This could be due to the presence of additional surveys on the weekends, namely the Life survey on Saturdays and the Spirituality survey on Sundays.

Effect of Study Fatigue/Learning on Survey Completion Ratios

We find that 13 out of 21 (i.e., 62%) incomplete surveys are from the first 9 days of the study, 5 from the next 9 days, and only 3 from the last 9 days. However, these three parts have different response counts (i.e., numbers of surveys returned), so we need to account for the number of surveys returned when analyzing the incomplete submissions during the three parts of the study. We find that 245, 180, and 92 surveys were returned during the three parts, respectively. Therefore, 5.3% (13/245), 2.8% (5/180), and 3.3% (3/92) of the returned surveys were incomplete in the three parts, respectively. This shows an initially higher percentage of incomplete submissions compared to the later parts, which could be because subjects got used to the surveys and the survey app over time. Study fatigue, however, does not seem to lead to incomplete survey submissions, since the number of incomplete submissions decreases over time.

Effect of Academic Calendar on Survey Completion Ratios

Out of the 21 incomplete surveys, we find that 12 (i.e., 57%) are from reading and exam days, compared to 9 from regular class and post-exam days. However, the reading and exam periods are shorter than the rest of the study period, so we again need to account for the number of surveys returned during each part of the study. We find that 213 surveys were returned during reading and exam days (i.e., 102 during reading days and 111 during exam days), compared to 304 surveys returned during regular class and post-exam days (i.e., 84 during regular class days and 220 during post-exam days). This corresponds to 5.6% (12/213) incomplete submissions during reading and exam days versus 3.0% (9/304) during regular class and post-exam days. Therefore, subjects are more likely to submit incomplete surveys during reading and exam days than during regular class and post-exam days.

Effect of Survey Length on Survey Completion Times

Out of the 517 submitted surveys we find that 496 are complete, i.e., all questions were answered. We consider these 496 surveys for our analysis of survey completion time or CT (measured in seconds).

Figure 20 shows the boxplot of the completion times of the different daily surveys. From the graph, we observe that our subjects finish surveys quickly, as most of the daily surveys take less than 2 min to complete. Because of their shorter lengths, the Social Interaction and the Sleep surveys have lower CTs than the three Mood surveys. We discard the 5% of CT values identified as outliers in Fig. 20 from the following analyses.
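
The paper discards roughly 5% of CT values as boxplot outliers; since the exact rule is not restated here, the sketch below simply illustrates the common 1.5 × IQR whisker rule as one way such outliers could be flagged, using hypothetical values.

```python
# Minimal sketch: flag completion-time outliers with the standard 1.5*IQR
# boxplot whisker rule; values are hypothetical, not the study data.
import numpy as np

ct_seconds = np.array([48, 52, 36, 41, 55, 44, 310, 39, 47, 50])
q1, q3 = np.percentile(ct_seconds, [25, 75])
iqr = q3 - q1
keep = (ct_seconds >= q1 - 1.5 * iqr) & (ct_seconds <= q3 + 1.5 * iqr)
ct_clean = ct_seconds[keep]
print(f"discarded {np.count_nonzero(~keep)} of {ct_seconds.size} completion times")
```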

Fig. 20. Boxplot of completion times of different surveys

Figure 21 and Table 16 (top row) show the average completion times. We compute the non-parametric Spearman’s rank-correlation coefficient between survey lengths (i.e., question counts) and average completion times and obtain a positive correlation, r_s(5) = 0.93, p = 0.01. We further compute the Pearson correlation between the CT and the length of every individual survey and obtain a positive correlation, r(494) = 0.79, p < 0.001. These correlations are strong and statistically significant, i.e., longer surveys take more time to complete.

Fig. 21. Bar graphs of average survey completion times with error bars

Table 16. Average CTs (in seconds) and RDs (in minutes) of the different surveys

Measure  | M1   | M2   | M3   | SI   | Sleep | Life  | Spirituality
CT (s)   | 48.7 | 51.6 | 51.5 | 36.9 | 8.1   | 19.7  | 20.8
RD (min) | 35.8 | 46.1 | 34.9 | 39.2 | 95.5  | 259.1 | 96.2

Effect of Part of Day on Survey Completion Times

We first perform the one-way ANOVA test on the CTs of the Mood surveys (Table 16), which are released three times a day. We fail to reject the null hypothesis: “different parts of a day (i.e., morning, afternoon, evening) have the same average CTs.” Therefore, there is no statistically significant difference among the CTs of the Mood surveys released during different parts of the day.

Effect of Day of Week on Survey Completion Times

Table 17 shows the completion times of daily surveys during weekdays and weekends. We perform the one-way ANOVA test on the CTs of each daily survey. For each type of survey, we fail to reject the null hypothesis: “weekdays (M–F) and weekends (Saturdays and Sundays) have the same average CTs.” Therefore, there is no significant difference among the CTs of the different surveys across the parts of the week.

Table 17. Average CTs (in seconds) along with the p-values from the one-way ANOVA test (last column) of different daily surveys across the weekdays and weekends (Saturdays and Sundays)

Survey category | Saturday | Sunday | Monday–Friday | p-value
M1              | 51.29    | 48.50  | 48.30         | 0.7344
M2              | 53.88    | 44.00  | 52.47         | 0.2909
M3              | 42.63    | 45.89  | 53.92         | 0.1395
SI              | 34.85    | 36.15  | 37.39         | 0.8181
Sleep           | 7.00     | 8.10   | 8.19          | 0.4622

Effect of Academic Calendar on Survey Completion Times

Table 18 shows the completion times of daily surveys during different parts of the study period. Using the one-way ANOVA test, we do not find statistically significant differences among the average completion times across the different parts of the academic calendar for any survey. However, an interesting observation from Table 18 is that the average CTs of the different surveys are relatively higher during regular class days than during the other parts of the study, which may be because subjects only became familiar with the survey questionnaires after the initial part of the study. This finding is also consistent with the incomplete survey submission analysis for the three parts of the study (Section 4.21).

Table 18. Average CTs (in seconds) along with the p-values from the one-way ANOVA test (last column) of different daily surveys across the different parts of the entire study period

Survey category | Class | Reading | Exam  | Post-exam | p-value
M1              | 53.20 | 53.31   | 45.75 | 48.11     | 0.1987
M2              | 54.78 | 52.57   | 52.13 | 49.09     | 0.6845
M3              | 53.07 | 48.50   | 50.33 | 53.45     | 0.8300
SI              | 38.39 | 41.29   | 37.50 | 34.76     | 0.4442
Sleep           | 9.21  | 7.12    | 7.76  | 8.23      | 0.2434

Discussion

Based on the observations and findings described in Section 4, we summarize a number of recommendations and guidelines for future longitudinal smartphone-based health and wellness tracking studies and applications designed to yield high response ratios.

  • Impact of survey frequency: there appears to be a strong indication that less frequent survey requests are more likely to be returned completely than high-frequency survey requests (Sections 4.3 and 4.18). This was particularly visible for our Mood surveys, which were requested three times a day and typically yielded lower response ratios and completion ratios.

  • Impact of active period duration: surveys should have a reasonably long active period so that participants have enough flexibility to respond at their convenience (Section 4.4). In our study, the Life survey and the Spirituality survey saw the largest response ratios; both surveys also offered the subjects the largest amount of time to respond. However, longer active periods also increase response delays (Section 4.14). Therefore, surveys targeted at capturing momentary assessments should not have excessively long active periods.

  • Impact of survey length: the length of a particular type of survey should be reasonably limited, especially if the survey is targeted at capturing momentary assessments (Sections 4.1, 4.18, and 4.23).

  • Impact of survey notification: surveys should come with reasonably spaced notifications to remind the participants to respond if they have not yet responded (Section 4.4). However, too many notifications may become disruptive and could lead to lower response ratios.

  • Impact of time of day: surveys should be issued during the part of the day when subjects are most likely to respond. This depends, of course, on the nature of the survey and can also vary from person to person, e.g., some subjects prefer responding to surveys in the morning, while others prefer evenings. Ideally, personalized surveys could be issued to subjects at their individually preferred times (Section 4.9). Likewise, group-level surveys could be issued to a sub-group of subjects during their preferred time intervals (Section 4.10). Similarly, surveys targeted at capturing momentary assessments should be released during the part of the day when subjects are active (Section 4.16).

  • The high correlation of response ratios among overlapping surveys indicates that whenever subjects respond to one of the surveys, they are likely to respond to other available surveys; subjects are not biased to any specific survey among the overlapping ones (Section 4.7). One direction to explore is the merging of such overlapping surveys into a single (larger) survey, e.g., a Mood survey triggered in the morning could be merged with the Sleep survey (also typically issued in the morning hours) to create a combined Start-of-Day survey.

  • When designing smartphone-based surveys for a specific population (e.g., a student population as in our case), close attention should be paid to schedules and workloads (e.g., students are less likely to respond completely during busy exam periods), or known schedules/workloads should be taken into consideration in the subsequent analysis of the collected survey data. This is especially important when collecting momentary assessments (Sections 4.8, 4.11, 4.12, 4.13, 4.15, 4.17, 4.22, and 4.26).

  • Survey naming: some of our surveys mixed different types of survey questions (e.g., stress/fatigue and mood questions), while the name of the questionnaire did not reflect this mix appropriately. Questionnaire names should accurately reflect the types of questions asked, or mixing questions from different categories in a single survey should be avoided (Section 4.19).

  • Survey volume in a day: Subjects should not be overwhelmed with too many surveys of different types in a single day (Section 4.20).

  • Initial learning time: Studies should keep the learning curves of subjects in mind, i.e., subjects may need several days to get used to frequent surveys and the use of a survey app before actual data for analysis is collected (Section 4.21).

Limitations

The focus of this paper was to evaluate subjects’ compliance (specifically before, during, and after their exam times) and not the actual content of their responses. Our analysis also does not consider subjects’ cognitive load and effort, which might be additional contributing factors to changes in survey compliance. These considerations are beyond the scope of this paper.

All our analyses are based on a 1-month study of 17 graduate and undergraduate students from the same academic institution. That is, the study group size was limited and the subjects reflected little diversity, which restricted us to relying on less powerful non-parametric tests. Our findings focus on a student population and may not be directly applicable to other populations with different work schedules. However, we do expect that certain findings, e.g., the impact of changes in work schedules or stress levels (exam time), will translate to other populations with similarly varying levels of workload or work stress.

Further, this work is based on an exploratory/observational study rather than an a priori hypothesis-driven controlled clinical study with a separate control group. Also, many factors, such as the active period and the number of questions, were not varied within an individual survey type. The lack of controlled variables prevented us from using regression modeling techniques to analyze interaction effects and “partial correlation” effects that account for confounding factors, as well as from analyzing “causality.” However, our data-driven hypothesis tests give an indication of possible associations between compliance measures and the various factors discussed in this work.

Our comparison of response ratios for the cases where subjects need to respond to “all 3 daily Mood surveys” versus “any one of the three Mood surveys” is meant to show how flexibility (i.e., more opportunities) to respond to surveys can improve the response ratio. However, in this case the total 6 h of active duration is not continuous, i.e., it actually consists of three 2-h active periods. Therefore, it does not reflect the response ratio subjects would achieve with a continuous 6-h active period. Nevertheless, our findings show a positive association between flexibility and survey compliance, which we expect to generalize.

Other limitations of this study are the subjects’ voluntary participation, their flexibility to skip questions and surveys, and their flexibility to terminate their participation. We perform our incomplete survey submission analysis based on a limited number of incomplete submissions. However, this is common for such an unconstrained real-life study, where we have no control over subjects’ survey submissions, which makes the dataset and the resulting findings more natural.

To generalize the findings beyond these limitations, a large-scale study with a separate control group and controlled variables, conducted on a more diverse group of subjects over a longer period, would be required. Such a large-scale study would also help to better understand subjects’ response reliability, the resulting quality of responses, and other effects that may impact compliance.

Conclusions and Future Work

Past work has focused on various aspects of user interfaces and human factors relating to phone call-based surveys, but very little attention has been given to the human-centered design issues of smartphone-based user surveys with the goal of maximizing survey compliance. Our work is a first step towards a more in-depth analysis of the various human factors driving effective smartphone-based health and wellness tracking study and application design. In this paper, we have investigated factors such as the timing of survey release, the active duration of a survey, the utility of alerts and reminders, and the variation in response behaviors of different subjects, and have concluded that these factors have a considerable impact on response ratios, response delays, completion ratios, and completion times. The recommendations provided in Section 5 can inform future study and application design choices so that data collections yield high compliance rates and data quality.

In our future work, we envision the design and evaluation of larger-scale health and wellness studies to further investigate the design characteristics of smartphone-based surveys and to determine how to customize surveys at both the personal and group levels, taking into consideration the human factors examined and discussed in this paper.

Contributor Information

Sudip Vhaduri, Phone: 574-631-9131, Email: svhaduri@nd.edu.

Christian Poellabauer, Email: cpoellab@nd.edu.

References

  • 1.7 ways Apple’s new software could change medical research for the better (2015) http://www.huffingtonpost.com/2015/03/10/apple-researchkit-health_n_6840484.html?ir=India
  • 2.Researchkit for developers (2015). https://developer.apple.com/researchkit/
  • 3.Alpers GW. Ambulatory assessment in panic disorder and specific phobia. Psychol Assess. 2009;21(4):476. doi: 10.1037/a0017489. [DOI] [PubMed] [Google Scholar]
  • 4.Böhmer M, Lander C, Gehring S, Brumby D P, Krüger A (2014) Interrupted by a phone call: exploring designs for lowering the impact of call notifications for smartphone users. In: Proceedings of the SIGCHI conference on human factors in computing systems (CHI). ACM, pp 3045–3054
  • 5.Burmeister L, Engel U, Schmidt B O (2015) Survey measurements: techniques, data quality and sources of error, 1st edn., chap. In: Well-being, survey attitudes, and readiness to report on everyday life events in an experience sampling study. Campus Verlag, pp 146–159
  • 6.Buysse DJ, Reynolds CF, Monk TH, Berman SR, Kupfer DJ. The Pittsburgh sleep quality index: a new instrument for psychiatric practice and research. Psych Res. 1989;28(2):193–213. doi: 10.1016/0165-1781(89)90047-4. [DOI] [PubMed] [Google Scholar]
  • 7.Cohen S, Kamarck T, Mermelstein R (1983) A global measure of perceived stress. J Health Soc Behav :385–396 [PubMed]
  • 8.Diener E, Emmons RA, Larsen RJ, Griffin S. The satisfaction with life scale. J Person Assess. 1985;49(1):71–75. doi: 10.1207/s15327752jpa4901_13. [DOI] [PubMed] [Google Scholar]
  • 9.Diener E, Oishi S, Lucas RE. Personality, culture, and subjective well-being: emotional and cognitive evaluations of life. Annu Rev Psychol. 2003;54(1):403–425. doi: 10.1146/annurev.psych.54.101601.145056. [DOI] [PubMed] [Google Scholar]
  • 10.Diener E, Wirtz D, Tov W, Kim-Prieto C, Choi Dw, Oishi S, Biswas-Diener R. New well-being measures: short scales to assess flourishing and positive and negative feelings. Soc Indicat Res. 2010;97(2):143–156. doi: 10.1007/s11205-009-9493-y. [DOI] [Google Scholar]
  • 11.Fischer J E, Greenhalgh C, Benford S (2011) Investigating episodes of mobile phone activity as indicators of opportune moments to deliver notifications. In Proceedings of the 13th international conference on human computer interaction with mobile devices and services (MobileHCI). ACM, pp 181–190
  • 12.Fowler F J (1995) Improving survey questions: design and evaluation, vol 38. Sage Publications
  • 13.Gould S, Brumby D, Cox A, González V, Salvucci D, Taatgen N (2012) Multitasking and interruptions: a SIG on bridging the gap between research on the micro and macro worlds. In: CHI’12 extended abstracts on human factors in computing systems (CHI EA). ACM, pp 1189–1192
  • 14.Hallowell R. The relationships of customer satisfaction, customer loyalty, and profitability: an empirical study. Int J Serv Ind Manag. 1996;7(4):27–42. doi: 10.1108/09564239610129931. [DOI] [Google Scholar]
  • 15.Hektner J M, Schmidt J A, Csikszentmihalyi M (2007) Experience sampling method: measuring the quality of everyday life. Sage
  • 16.Hofmann W, Patel PV. SurveySignal: a convenient solution for experience sampling research using participants’ own smartphones. Soc Sci Comput Rev. 2015;33(2):235–253. doi: 10.1177/0894439314525117. [DOI] [Google Scholar]
  • 17.Iqbal S T, Bailey B P (2007) Understanding and developing models for detecting and differentiating breakpoints during interactive tasks. In: Proceedings of the SIGCHI conference on human factors in computing systems (CHI). ACM, pp 697–706
  • 18.Iqbal ST, Bailey BP. Oasis: a framework for linking notification delivery to the perceptual structure of goal-directed tasks. ACM Trans Comput-Human Inter (TOCHI) 2010;17(4):15–41. [Google Scholar]
  • 19.Ji YG, Park JH, Lee C, Yun MH. A usability checklist for the usability evaluation of mobile phone user interface. Int J Human-Comput Inter. 2006;20(3):207–231. doi: 10.1207/s15327590ijhc2003_3. [DOI] [Google Scholar]
  • 20.Kahn J (2017) Reporting statistics in apa style. https://goo.gl/6nhNOI
  • 21.Kroenke K, Spitzer RL, Williams JB. The phq-9. J Gen Internal Med. 2001;16(9):606–613. doi: 10.1046/j.1525-1497.2001.016009606.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Krosnick JA. Maximizing questionnaire quality. Meas Polit Attitud. 1999;2(1):37–58. [Google Scholar]
  • 23.Krosnick JA. Survey research. Annu Rev Psychol. 1999;50(1):537–567. doi: 10.1146/annurev.psych.50.1.537. [DOI] [PubMed] [Google Scholar]
  • 24.Kruskal WH, Wallis WA. Use of ranks in one-criterion variance analysis. J Amer Stat Assoc. 1952;47(260):583–621. doi: 10.1080/01621459.1952.10483441. [DOI] [Google Scholar]
  • 25.Lane ND, Lin M, Mohammod M, Yang X, Lu H, Cardone G, Ali S, Doryab A, Berke E, Campbell AT, Choudhury T. Bewell: sensing sleep, physical activities and social interactions to promote wellbeing. Mobile Netw Appl. 2014;19(3):345–359. doi: 10.1007/s11036-013-0484-5. [DOI] [Google Scholar]
  • 26.Lynn P, Buck N, Burton J, Jäckle A, Laurie H (2005) A review of methodological research pertinent to longitudinal survey design and data collection. Institute for Social and Economic Research, University of Essex
  • 27.Mann H B, Whitney D R (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat :50–60
  • 28.Markopoulos P, Batalas N, Timmermans A (2015) On the use of personalization to enhance compliance in experience sampling. In: Proceedings of the European conference on cognitive ergonomics (ECCE). ACM, pp 15–18
  • 29.McCarty C, House M, Harman J, Richards S (2006) Effort in phone survey response rates: the effects of vendor and client-controlled factors. Field Methods 18 (2):172–188
  • 30.McDonough P, Amick BC. The social context of health selection: a longitudinal study of health and employment. Soc Sci Med. 2001;53(1):135–145. doi: 10.1016/S0277-9536(00)00318-X. [DOI] [PubMed] [Google Scholar]
  • 31.Mehrotra A, Musolesi M, Hendley R, Pejovic V (2015) Designing content-driven intelligent notification mechanisms for mobile applications. In: Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing (UbiComp). ACM, pp 813–824
  • 32.Minami H, McCarthy DE, Jorenby DE, Baker TB. An ecological momentary assessment analysis of relations among coping, affect and smoking during a quit attempt. Addiction. 2011;106(3):641–650. doi: 10.1111/j.1360-0443.2010.03243.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Okoshi T, Ramos J, Nozaki H, Nakazawa J, Dey A K, Tokuda H (2015) Reducing users’ perceived mental effort due to interruptive notifications in multi-device mobile environments. In: Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing (UbiComp). ACM, pp 475–486
  • 34.Osang JE, Udoimuk AB, Etta EB, Ushie FO, Offiong NE. Methods of gathering data for research purpose and applications using ijser acceptance rate of monthly paper publication (march 2012 edition-may 2013 edition) IOSR J Comput Eng (IOSR-JCE) e-ISSN: 2278-0661, p-ISSN: 2278-8727. 2013;15(2):59–65. [Google Scholar]
  • 35.Pearson K (1895) Note on regression and inheritance in the case of two parents. Proc R Soc Lond :240–242
  • 36.Pejovic V, Lathia N, Mascolo C, Musolesi M (2015) Mobile-based experience sampling for behaviour research. arXiv preprint arXiv:1508.03725
  • 37.Reed P G (1987) Spirituality and well-being in terminally ill hospitalized adults. Res Nurs Health 10(5):335–344 [DOI] [PubMed]
  • 38.Ritter S. Apple’s research kit development framework for iphone apps enables innovative approaches to medical research data collection. J Clin Trials. 2015;5(2):120–121. [Google Scholar]
  • 39.Aan Het Rot M, Hogenelst K, Schoevers R A (2012) Mood disorders in everyday life: a systematic review of experience sampling and ecological momentary assessment studies. Clin Psychol Rev 32(6):510–523 [DOI] [PubMed]
  • 40.Russell DW. UCLA loneliness scale (version 3): reliability, validity, and factor structure. J Person Assess. 1996;66(1):20–40. doi: 10.1207/s15327752jpa6601_2. [DOI] [PubMed] [Google Scholar]
  • 41.Ryff CD. Happiness is everything, or is it? Explorations on the meaning of psychological well-being. J Person Soc Psychol. 1989;57(6):1069. doi: 10.1037/0022-3514.57.6.1069. [DOI] [Google Scholar]
  • 42.Ryff CD, Keyes CLM. The structure of psychological well-being revisited. J Person Soc Psychol. 1995;69(4):719. doi: 10.1037/0022-3514.69.4.719. [DOI] [PubMed] [Google Scholar]
  • 43.Saris W E, Revilla M, Krosnick J A, Shaeffer EM (2010) Comparing questions with agree/disagree response options to questions with item-specific response options. 61–79
  • 44.Schaeffer N C, Presser S (2003) The science of asking questions. Annu Rev Sociol 65–88
  • 45.Schiller JS, Adams PF, Nelson ZC. Summary health statistics for the us population: national health interview survey, 2003. Vital and health statistics. Series 10. Data Nat Health Surv. 2005;224:1–104. [PubMed] [Google Scholar]
  • 46.Scholtz J (2004) Usability evaluation. National institute of standards and technology. Encyclopedia of human-computer interaction. http://goo.gl/hFCSCe
  • 47.Schuman H, Presser S. Questions and answers in attitude surveys: experiments on question form, wording, and context. San Diego: Sage Publications; 1981. [Google Scholar]
  • 48.Serre F, Fatseas M, Debrabant R, Alexandre JM, Auriacombe M, Swendsen J. Ecological momentary assessment in alcohol, tobacco, cannabis and opiate dependence: a comparison of feasibility and validity. Drug Alcohol Depend. 2012;126(1):118–123. doi: 10.1016/j.drugalcdep.2012.04.025. [DOI] [PubMed] [Google Scholar]
  • 49.Shiffman S. Ecological momentary assessment (EMA) in studies of substance use. Psychol Assess. 2009;21(4):486. doi: 10.1037/a0017074. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Sincero SM (2016) Types of survey. https://explorable.com/types-of-survey
  • 51.Spearman C. The proof and measurement of association between two things. Amer J Psychol. 1904;15(1):72–101. doi: 10.2307/1412159. [DOI] [PubMed] [Google Scholar]
  • 52.Stone AA, Broderick JE. Real-time data collection for pain: appraisal and current status. Pain Med. 2007;8(s3):S85–S93. doi: 10.1111/j.1526-4637.2007.00372.x. [DOI] [PubMed] [Google Scholar]
  • 53.Stone AA, Shiffman S, Schwartz JE, Broderick JE, Hufford MR. Patient compliance with paper and electronic diaries. Control Clin Trials. 2003;24(2):182–199. doi: 10.1016/S0197-2456(02)00320-3. [DOI] [PubMed] [Google Scholar]
  • 54.Ugur M, Shastri D, Tsiamyrtzis P, Dcosta M, Kalpakci A, Sharp C, Pavlidis I (2015) Evaluating smartphone-based user interface designs for a 2d psychological questionnaire. In: Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing (UbiComp). ACM, pp 275–282
  • 55.Underwood LG, Teresi JA. The daily spiritual experience scale: development, theoretical description, reliability, exploratory factor analysis, and preliminary construct validity using health-related data. Ann Behav Med. 2002;24(1):22–33. doi: 10.1207/S15324796ABM2401_04. [DOI] [PubMed] [Google Scholar]
  • 56.Vhaduri S, Ali A, Sharmin M, Hovsepian K, Kumar S (2014) Estimating drivers’ stress from gps traces. In: Proceedings of the 6th international conference on automotive user interfaces and interactive vehicular applications. ACM, pp 1–8 [DOI] [PMC free article] [PubMed]
  • 57.Vhaduri S, Munch A, Poellabauer C (2016) Assessing health trends of college students using smartphones. In: Proceedings of the IEEE-NIH special topics conference on healthcare innovations and point-of-care technologies (HI-POCT). IEEE
  • 58.Vhaduri S, Poellabauer C (2015) Design and implementation of a remotely configurable and manageable well-being study. In: Proceedings of the EAI international conference on smart wearable devices and IoT for health and wellbeing applications (SWIT-Health). EAI, pp 909–920
  • 59.Vhaduri S, Poellabauer C (2016) Cooperative discovery of personal places from location traces. In: Proceedings of the international conference on conference on computer communication and networks (ICCCN). IEEE, pp 1–9
  • 60.Vhaduri S, Poellabauer C (2016) Human factors in the design of longitudinal smartphone-based wellness surveys. In: Proceedings of the IEEE international conference on healthcare informatics (ICHI). IEEE
  • 61.Wang R, Chen F, Chen Z, Li T, Harari G, Tignor S, Zhou X, Ben-Zeev D, Campbell A T (2014) Studentlife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In: Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing (UbiComp). ACM, pp 3–14
  • 62.Watson D, Clark LA (1999) The PANAS-X: manual for the positive and negative affect schedule-expanded form. Iowa Res Online 54(6). http://goo.gl/JV5w6x
  • 63.Wilcoxon F. Individual comparisons by ranking methods. Biomet Bull. 1945;1(6):80–83. doi: 10.2307/3001968. [DOI] [Google Scholar]
