Journal of the American Medical Informatics Association (JAMIA). 2022 Sep 29;29(12):2032–2040. doi: 10.1093/jamia/ocac166

An interactive fitness-for-use data completeness tool to assess activity tracker data

Sylvia Cho 1, Ipek Ensari 2,3, Noémie Elhadad 4,5, Chunhua Weng 6,7, Jennifer M Radin 8, Brinnae Bent 9, Pooja Desai 10, Karthik Natarajan 11,12
PMCID: PMC9667174  PMID: 36173371

Abstract

Objective

To design and evaluate an interactive data quality (DQ) characterization tool focused on fitness-for-use completeness measures to support researchers’ assessment of a dataset.

Materials and Methods

Design requirements were identified through a conceptual framework on DQ, literature review, and interviews. The prototype of the tool was developed based on the requirements gathered and was further refined by domain experts. The Fitness-for-Use Tool was evaluated through a within-subjects controlled experiment comparing it with a baseline tool that provides information on missing data based on intrinsic DQ measures. The tools were evaluated on task performance and perceived usability.

Results

The Fitness-for-Use Tool allows users to define data completeness by customizing the measures and their thresholds to fit their research task, and it provides a data summary based on the customized definition. Using the Fitness-for-Use Tool, study participants were able to accurately complete fitness-for-use assessment in less time than when using the Intrinsic DQ Tool. The study participants also perceived the Fitness-for-Use Tool to be more useful in determining the fitness-for-use of a dataset than the Intrinsic DQ Tool.

Discussion

Incorporating fitness-for-use measures in a DQ characterization tool could provide a data summary that meets researchers' needs. The design features identified in this study have the potential to be applied to other biomedical data types.

Conclusion

A tool that summarizes a dataset in terms of fitness-for-use dimensions and measures specific to a research question supports dataset assessment better than a tool that only presents information on intrinsic DQ measures.

Keywords: data quality, patient-generated health data, fitness trackers, user-centered design, usability testing

INTRODUCTION

The widespread use of personal wearable devices such as smartwatches or fitness trackers has enabled the use of real-world wearable device data collected by individuals for research studies.1 Previous studies have demonstrated how routinely collected wearable device data can help answer research questions about longitudinal physiological and behavioral changes in health.2–5 For example, Quer et al2 enrolled more than 30 000 volunteers and investigated whether combining sensor data with self-reported symptoms could improve the identification of Coronavirus Disease 2019 (COVID-19) cases compared with relying on self-reported symptoms alone. In addition, Radin et al3 obtained Fitbit data from over 47 000 volunteers and examined whether objectively collected wearable device data, such as resting heart rate and sleep data, could be used to identify trends of influenza-like illness. These studies were able to use data from tens of thousands of volunteers; collecting data from a comparable number of participants through traditional prospective clinical studies would have been costly and time-consuming. Thus, wearable device data have the potential to become one of the main data sources for generating real-world evidence. However, as with any dataset, wearable device data present data quality (DQ) challenges, such as incompleteness, incorrectness, and heterogeneity.6 Assessing DQ is therefore an essential step in promoting the reuse of wearable device data for research purposes, but it can be a time-consuming and complex task that burdens researchers.7,8

One way to support researchers in understanding the complexity and quality of data is to provide a tool that assists in DQ assessment. DQ characterization tools typically provide high-level insights on DQ by examining the structure and content of a dataset. For instance, they may report summary statistics, such as count, sum, and mean, or data formats, such as data types and lengths of values. Some data characterization tools also explore patterns in specific measures and relationships between variables to provide a deeper understanding of the dataset. For example, the Observational Health Data Sciences and Informatics (OHDSI) community provides a data characterization tool called "Achilles," which visualizes various aspects of a dataset, such as the distribution of demographics, the prevalence of conditions, and data density over time.9 While existing tools are useful for understanding and assessing data, they focus on DQ measures that apply globally to any research task rather than measures specific to a given research question (fitness-for-use). The DQ research community has widely used the concept of fitness-for-use to define "quality," meaning that DQ is determined by whether the dataset can fulfill the purpose of data use.10 Despite the task-dependent nature of DQ, few DQ characterization tools offer the flexibility to meet researchers' needs in assessing the quality of data. This limits the ability of researchers to determine whether the data would be useful for their research. For example, a dataset with 50% missingness for variable X may seem to have a lot of missing data, but when researchers restrict to the cohort that meets their inclusion criteria, the missingness might drop to 10%, which could be an acceptable level for their research question. Therefore, a tool that characterizes data in terms of fitness-for-use measures corresponding to relevant DQ dimensions would be useful.
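As a concrete illustration of the cohort-filtering example above, the following is a minimal R sketch using synthetic data; the column names, thresholds, and missingness mechanism are hypothetical, not drawn from the study dataset. It shows how missingness that looks high overall can be acceptable within the cohort of interest.

```r
# Synthetic illustration: percent missingness before vs after cohort filtering.
# All column names, thresholds, and the missingness mechanism are hypothetical.
set.seed(42)
n  <- 1000
df <- data.frame(age   = sample(18:80, n, replace = TRUE),
                 steps = rnorm(n, mean = 8000, sd = 2000))
# Suppose younger participants wore the tracker far less consistently
df$steps[runif(n) < ifelse(df$age < 40, 0.9, 0.1)] <- NA

pct_missing <- function(x) round(100 * mean(is.na(x)), 1)

pct_missing(df$steps)             # whole dataset: missingness looks high
cohort <- subset(df, age >= 40)   # hypothetical inclusion criterion
pct_missing(cohort$steps)         # within the cohort: roughly 10% missing
```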

The objective of this study is to design and evaluate an interactive DQ characterization tool for real-world activity tracker data to support researchers in determining the fitness-for-use of a dataset for their intended research. The tool focuses on data completeness because it is a contextual DQ dimension that depends on the research task and is the dimension most frequently assessed by researchers.8,11 We hypothesize that a DQ characterization tool focused on fitness-for-use completeness measures of activity tracker data would aid researchers in determining whether a dataset has (1) a sufficient amount of data (density completeness) and (2) all necessary variables (breadth completeness) for their intended research better than a general-purpose DQ tool.

MATERIALS AND METHODS

A prototype tool was developed following an iterative user-centered design approach in 3 phases: (1) identifying design requirements to inform the initial design, (2) designing and evaluating a prototype, and (3) evaluating the final design.12,13

Phase 1: Initial design by identifying user informational needs

The design requirements of the tool were identified by gathering informational needs through 3 methods: (1) a DQ framework, (2) a literature review, and (3) semi-structured interviews. First, a previously developed DQ framework for wearable device data was adopted to identify important DQ dimensions for wearable device data.14 In addition, a literature review was conducted to identify fitness-for-use data completeness measures used in research studies involving wearable devices. Lastly, semi-structured interviews were conducted to identify additional informational needs for data completeness assessment. The interviews were conducted online for ∼30 min each by a graduate student trained in qualitative research, with researchers who have extensive experience with wearable device data. Questions covered current practices of data completeness assessment and the challenges of determining the fitness-for-use of a dataset. The interview guide can be found in the Supplementary Appendix.

Phase 2: Participatory design and tool development

Based on the design requirements identified in phase 1, a paper prototype of the tool was developed in Microsoft PowerPoint. Participants were eligible for the design sessions if they conducted research with fitness tracker data and had experience with the DQ assessment process. Various methods were used to find eligible participants, such as the study team's professional network, social media (eg, LinkedIn), and academic communities (eg, Digital Medicine Society). All sessions were conducted online, and participants provided feedback on the prototype verbally during each session. The participants were first presented with the paper prototype and asked to provide feedback on its core functionalities, interface elements, structure, and layout. After the first session, an operational prototype was built as an R Shiny web app based on a publicly available fitness tracker dataset.15 In subsequent sessions, participants provided feedback on the prototype's style, color, and layout in addition to its core features and elements. Iterative design sessions were conducted until the tool design was finalized.

Phase 3: Usability evaluation on the final design

Study design

A controlled experimental evaluation was conducted with a within-subjects design to compare 2 tools: (1) a baseline "Intrinsic DQ Tool," a DQ characterization tool that provides data completeness information on intrinsic DQ measures, and (2) a "Fitness-for-Use Tool," a DQ characterization tool incorporating fitness-for-use data completeness measures in addition to the features of the baseline Intrinsic DQ Tool. Because the Fitness-for-Use Tool presented more information than the Intrinsic DQ Tool, directly comparing the 2 tools was not feasible. To provide the same level of information through both tools, study participants were given the option to analyze the raw data to carry out the tasks. The order in which the 2 tools were presented was randomly assigned and counterbalanced. The tools were named "Tool A" and "Tool B" in the evaluation sessions to avoid bias that could arise from the terms "Fitness-for-Use Tool" and "Intrinsic DQ Tool."

Baseline tool: an intrinsic DQ tool

A snapshot of the Intrinsic DQ Tool is presented below (Figure 1).

Figure 1. Snapshot of the Intrinsic DQ Tool.

The Intrinsic DQ Tool was created to simulate a typical DQ characterization tool that presents descriptive statistics on measures independent of research tasks. Its core feature was presenting the distribution of percent missingness (both NA and zero values) in the data. The tool also enabled filtering of the data based on demographics, clinical variables, and metadata related to the data collection period and device types. The dataset embedded in the tool was a sampled adaptation of the original dataset,15 which avoided the bias that could occur if participants were given the same dataset in both tools being compared during the usability testing. The Intrinsic DQ Tool used in the evaluation study can be found at: https://sylviacho.shinyapps.io/Tool_B/.
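For readers unfamiliar with how such a dashboard is assembled, the sketch below is a deliberately minimal R Shiny analogue of an intrinsic DQ view: a histogram of per-participant percent missingness with one demographic filter. The data frame, column names, and single filter are invented for illustration; this is not the authors' implementation (the actual tool is at the URL above).

```r
# Minimal R Shiny sketch of an intrinsic DQ view (hypothetical data).
library(shiny)

set.seed(1)
per_person <- data.frame(
  age         = sample(18:80, 200, replace = TRUE),
  pct_missing = pmin(100, rexp(200, rate = 0.05))  # % NA/zero per participant
)

ui <- fluidPage(
  titlePanel("Distribution of percent missingness"),
  sidebarLayout(
    sidebarPanel(sliderInput("age", "Age range", min = 18, max = 80,
                             value = c(18, 80))),
    mainPanel(plotOutput("miss_hist"))
  )
)

server <- function(input, output, session) {
  output$miss_hist <- renderPlot({
    # Subset to the demographic filter, then plot the missingness distribution
    d <- subset(per_person, age >= input$age[1] & age <= input$age[2])
    hist(d$pct_missing, breaks = 20,
         xlab = "% missing (NA or zero) per participant", main = NULL)
  })
}

shinyApp(ui, server)
```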

Participant recruitment

Researchers interested in using fitness tracker data for their research were eligible regardless of their previous experience and domain knowledge in fitness tracker data analysis. This was to represent the actual target users of the tool, which include both experts and non-experts in DQ assessment for fitness tracker data. Similar to phase 2, participants were recruited through the professional network of the research team, public advertisements on relevant forums (eg, Quantified Self, Reddit/QuantifiedSelf), and online searches for those with backgrounds in data science or fitness tracker data (eg, biostatistics, kinesiology). Participants were contacted through email and were compensated $50 for participation.

Study procedure

The procedure of the study is depicted in Figure 2.

Figure 2. Procedure of the evaluation study. Each session consisted of carrying out tasks using each tool, a follow-up survey related to the tools, and a final interview.

Participants first watched a 4-min video that explained the study and procedures involved, followed by a 6-min tutorial video on the 2 tools being evaluated. The participants were then given time to interact with the tools and ask questions. The participants were provided the raw datasets embedded in the tools and were given time to explore them.

Before starting the evaluation, participants were asked a few questions about their familiarity with fitness tracker data, DQ assessment, R Shiny dashboards, and programming. The participants then used the 2 tools to carry out assigned tasks, with the option to analyze the dataset embedded in the tool if the information needed was not provided by the tool itself. The tasks consisted of questions assessing the fitness-for-use of the dataset in different research scenarios. There were 3 questions in the task set, each corresponding to a different data completeness dimension. Task 1 (T1) and Task 2 (T2) were designed to evaluate the ability of the tools to assess "breadth completeness" and "density completeness (intrinsic to the data itself)," respectively. Task 3 (T3) was designed to assess the breadth and density completeness of data using fitness-for-use measures that depend on the research task. T3 could be carried out using the Fitness-for-Use Tool but required Intrinsic DQ Tool users to analyze the raw data on their own. Therefore, T3 was the main task that distinguished the ability of the 2 tools to support fitness-for-use assessment. The tasks used in the sessions and the rationale behind them are presented in Supplementary Appendix Table S1.

After completing the tasks, the participants responded to a survey on their experience with the tool. The survey consisted of a subset of items from the Health Information Technology Usability Evaluation Scale (Health-ITUES) and the entire System Usability Scale (SUS).16,17 Health-ITUES is a survey intended to measure perceived health IT usability and is unique in that it is customizable to the system and task.16 In this study, only the "perceived usefulness" and "perceived ease of use" subscales were administered. In addition, the SUS, a 10-item scale commonly used to collect users' subjective assessments of the usability of any technology, was administered.17 After completing the 2 sets of tasks and surveys, a short interview was conducted to collect qualitative feedback on the tool. Participants were asked questions probing (1) what they liked or did not like about the Fitness-for-Use Tool and (2) whether the Fitness-for-Use Tool supported DQ assessment. Additional interview questions can be found in the Supplementary Appendix.

Evaluation and usability data analysis

Both the Intrinsic DQ Tool and the Fitness-for-Use Tool were evaluated on 3 criteria. First, we evaluated how well each tool performed in what it was designed to do (task performance), operationalized as task completion time, task completion rate, and answer accuracy, compared between the tools. Second, users' perception of the tool (subjective measures) was evaluated through the Health-ITUES and SUS.17 The surveys used in the study are presented in Supplementary Appendix Tables S2 and S3. Third, participants' in-depth thoughts on the tool were captured through semi-structured interviews to collect information that could not be covered by quantitative measures. Task completion and accuracy were analyzed using the exact McNemar test.18 Task completion time and survey results were compared between the 2 tools using paired t-tests. In addition, an inductive thematic analysis was conducted on the qualitative data.
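The paired analyses above can be expressed compactly in base R. The exact McNemar test reduces to an exact binomial (sign) test on the discordant pairs; the authors cite Fay's exact method18 (implemented in the exact2x2 R package), so the base-R calls below, with made-up counts and times, are an approximation for illustration rather than a reproduction of the reported analysis.

```r
# Exact McNemar test via an exact binomial test on discordant pairs.
# b = pairs succeeding only with tool A; d = pairs succeeding only with tool B.
# Counts here are made up for illustration.
b <- 6
d <- 0
binom.test(b, b + d, p = 0.5)   # two-sided exact test on discordant pairs

# Paired t-test on time on task (minutes), with a 15-min penalty for quitting.
# Per-participant times below are hypothetical, not the study data.
t_ffu <- c(1.5, 2.0, 1.8, 2.1, 1.6, 1.9, 2.2, 1.7, 1.9)
t_idq <- c(15, 15, 12.9, 15, 15, 13.5, 15, 15, 15)
t.test(t_ffu, t_idq, paired = TRUE)
```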

RESULTS

Phase 1: Identifying user informational needs on DQ assessment

Conceptual framework on DQ dimensions

Breadth completeness and density completeness were identified as fitness-for-use dimensions for wearable device data and were selected as the 2 DQ dimensions that should be characterized in the tool. Details of the 2 dimensions are described in Cho et al.14

Literature review

Eight studies were included in the final categorization of fitness-for-use measures, and the results were confirmed against 2 other published studies.19,20 The high-level categories of fitness-for-use measures for data completeness are presented in Figure 3. Further details are described in Supplementary Table S4.

Figure 3. Fitness-for-use measures for data completeness.

Semi-structured interview

The interviews were conducted with 3 eligible researchers between July and September 2020. The researchers had backgrounds in biostatistics, kinesiology and data science, and biomedical engineering, respectively. There were 2 main findings from the interviews. First, best practices and standards were considered important because fitness-for-use is not well defined for wearable device data and is highly dependent on the use case. As there is no systematic list of measures or rules to follow, it is difficult for researchers unfamiliar with wearable device data to determine the fitness-for-use of a dataset. A DQ characterization tool should support researchers by providing a structure they can follow to determine whether a dataset is complete. Second, particular data and metadata were considered important for understanding the data and its quality, such as the types of devices used to collect the data (eg, brand, model, and version), the data collection period, and the demographics of the subjects in the data, such as age.

Design requirements identified

In summary, the DQ characterization tool should be able to support determining fitness-for-use based on best practices. The DQ framework and results from the literature review on data completeness were applied to the tool design. Furthermore, the tool should provide basic characteristics of the data in terms of demographics, devices used, and data collection period. Design requirement details are in the Supplementary Appendix Table S5.

Phase 2: Participatory design and tool development

Five domain experts were recruited and reviewed the paper prototype in the first design session. The participants agreed with the overall structure of the tool and provided feedback on features related to determining fitness-for-use in terms of data completeness and on the types of information that should be presented in the tool. After the first session, 2 domain experts withdrew from the study. Two more sessions were conducted with the 3 remaining experts, who provided feedback on the R Shiny version of the prototype. Further information is in the Supplementary Appendix.

Final design of fitness-for-use DQ tool

The final design of the prototype includes a sidebar and a main panel. The sidebar allows users to subset the data to define the cohort of interest based on demographics, a few clinical variables, and metadata such as device types and date. The sidebar could be customized based on the dataset used in future versions.

The main panel consists of 5 tabs: (1) Overview, (2) Explore individual data, (3) Missing data analysis, (4) Define data completeness, and (5) Summary of cohort with complete data. Here, we describe the main tabs of the fitness-for-use DQ tool, "Define data completeness" and "Summary of cohort with complete data." The remaining tabs, which are also present in the Intrinsic DQ Tool, are described in Supplementary Appendix Figures S1–S3.

Define “data completeness”

This tab allows users to define what data completeness means for their research. DQ dimensions and corresponding fitness-for-use measures were investigated and incorporated into the tool. The tool first allows users to define a valid day—a day with sufficient data to be included in the analysis. Users can then select how many valid days are needed for their research and whether the valid days need to be consecutive. There are also advanced features for research comparing weekdays versus weekends or analyzing trends over months, and additional features can be added in future versions of the tool. A snapshot of the "Define data completeness" tab is presented in Figure 4; a sketch of the underlying logic follows the figure.

Figure 4. "Define data completeness" tab.
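A minimal base-R sketch of the completeness logic this tab exposes, under stated assumptions (the tool's internals are not published in this article, and the column names and thresholds are hypothetical): a valid day is a day whose wear time exceeds a user-chosen threshold, and completeness requires a user-chosen number of valid days, optionally consecutive.

```r
# Hypothetical "valid day" completeness check; assumes one row per calendar day.
is_complete <- function(daily, min_wear_min = 600,
                        n_valid_days = 4, consecutive = FALSE) {
  daily <- daily[order(daily$date), ]
  valid <- daily$wear_minutes >= min_wear_min   # user-defined "valid day"
  if (!consecutive) return(sum(valid) >= n_valid_days)
  runs <- rle(valid)                            # runs of consecutive valid days
  any(runs$values & runs$lengths >= n_valid_days)
}

daily <- data.frame(date = as.Date("2020-01-01") + 0:6,
                    wear_minutes = c(700, 650, 200, 720, 680, 640, 100))
is_complete(daily, n_valid_days = 4)                      # TRUE  (5 valid days)
is_complete(daily, n_valid_days = 4, consecutive = TRUE)  # FALSE (longest run = 3)
```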

Summary of cohort with complete data

Once users have set the definition of completeness, the tool calculates the number of participants in the defined cohort who meet the definition. This number demonstrates the density completeness of the dataset using fitness-for-use measures. The tool also shows the number of participants with complete data for each variable, such as step count and heart rate, which indicates the breadth and density completeness of the data. Lastly, the tool presents a data summary of those with complete data among the cohort of interest. A snapshot of the tab is presented in Figure 5.

Figure 5. "Summary of cohort with complete data" tab.
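Continuing the sketch above (and reusing its is_complete function), the summary this tab presents reduces to counting participants who satisfy the user's definition. The daily_all data frame and its columns are again hypothetical stand-ins for the embedded dataset.

```r
# Count participants meeting the user-defined completeness definition.
# daily_all: hypothetical data frame with one row per participant-day.
daily_all <- data.frame(
  participant_id = rep(c("p1", "p2"), each = 7),
  date           = rep(as.Date("2020-01-01") + 0:6, times = 2),
  wear_minutes   = c(700, 650, 200, 720, 680, 640, 100,
                     50, 60, 700, 80, 90, 100, 40)
)

by_person <- split(daily_all, daily_all$participant_id)
complete  <- vapply(by_person, is_complete, logical(1),
                    min_wear_min = 600, n_valid_days = 4)
sum(complete)          # density completeness: participants with enough data
mean(complete) * 100   # as a percentage of the defined cohort
```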

The final operational prototype of the tool can be found at: https://sylviacho.shinyapps.io/Tool_A/.

Phase 3: Usability evaluation on the final design

Background of study participants

Ten researchers were recruited for the evaluation study. The majority of participants were at least moderately familiar with DQ assessment (N = 8) and wearable device/fitness tracker data (N = 8). Most of the participants felt confident about statistical programming (N = 9), and more than half were at least moderately familiar with navigating R Shiny dashboards (N = 7). The participants' characteristics are presented in Table 1.

Table 1.

Characteristics and background of study participants

Characteristics and background N Percentage
Current job title
 Doctoral Student (Biomedical Informatics, Statistics) 4 40
 Data Scientist/Statistician 4 40
 Research Assistant/Scientist (Epidemiology, Medical Science) 2 20
Prior familiarity with assessing DQ
 Extremely familiar 0 0
 Very familiar 5 50
 Moderately familiar 3 30
 Slightly familiar 2 20
 Not familiar at all 0 0
Familiarity with wearable device data/fitness tracker data
 Extremely familiar 2 20
 Very familiar 4 40
 Moderately familiar 2 20
 Slightly familiar 1 10
 Not familiar at all 1 10
Comfort level with statistical programming
 Extremely comfortable 6 60
 Somewhat comfortable 3 30
 Neither comfortable nor uncomfortable 1 10
 Somewhat uncomfortable 0 0
 Extremely uncomfortable 0 0
Prior familiarity with R Shiny Dashboards
 Extremely familiar 3 30
 Very familiar 0 0
 Moderately familiar 4 40
 Slightly familiar 2 20
 Not familiar at all 1 10

Task performance results

Of the 10 participants, one was excluded from the results analysis because they misunderstood the study tasks and exclusively used the raw dataset, rather than the tools, to answer all task questions. Task 1 (T1) and Task 2 (T2) were control questions that could be answered using the sidebar, Data Summary, and Missing Data Analysis tabs present in both the Fitness-for-Use Tool and the Intrinsic DQ Tool, and this was reflected in the results: all 9 participants accurately completed T1 and T2 using both tools. Participants completed T1 and T2 in 4.3 min on average using the Intrinsic DQ Tool versus 5.18 min on average using the Fitness-for-Use Tool, a difference that was not statistically significant (P = .78).

Task completion rate for T3 was significantly higher when using the Fitness-for-Use Tool (100%) than the Intrinsic DQ Tool (33%) (P = .016). Accuracy in completing T3 was also significantly higher with the Fitness-for-Use Tool (78%) than the Intrinsic DQ Tool (22%) (P = .031). Using the Intrinsic DQ Tool, 3 participants were able to complete T3, of whom 2 submitted accurate answers; accuracy for the Intrinsic DQ Tool was therefore calculated as 2 out of 9 participants accurately completing the task. The results are presented in Table 2.

Table 2.

Task completeness and accuracy on Task 3 using Fitness-for-Use versus Intrinsic DQ Tool (N = 9 participants)

Task completeness (columns: Fitness-for-Use Tool)

                                       Task completed   Task incomplete
Intrinsic DQ Tool  Task completed             3                0
                   Task incomplete            6                0

Task accuracy (columns: Fitness-for-Use Tool)

                                              Completed accurately   Completed inaccurately
Intrinsic DQ Tool  Completed accurately               2                       0
                   Completed inaccurately             5                       2

Time on task for T3 was analyzed by assigning a 15-min penalty to those who quit the task. Completing the task took significantly less time with the Fitness-for-Use Tool than with the Intrinsic DQ Tool. The summary of time on task is presented in Table 3.

Table 3.

Average time on task (minutes) using Fitness-for-Use versus Intrinsic DQ Tool

Tasks Fitness-for-Use Tool Intrinsic DQ Tool P-value
T1 and T2 5.18 4.3 .59
T3 1.85 14.47 <.005

Survey results—perceived usability and usefulness

Participants perceived the Fitness-for-Use Tool to be more usable than the Intrinsic DQ Tool for fitness-for-use assessment. The Fitness-for-Use Tool had a higher overall Health-ITUES score than the Intrinsic DQ Tool, and participants perceived it to be more useful in determining the fitness-for-use of a dataset. In addition, the Fitness-for-Use Tool was perceived to be as easy to use as the Intrinsic DQ Tool, despite the addition of a more complex feature requiring users to define their own completeness criteria. The SUS results confirmed that adding complex fitness-for-use features did not reduce the usability of the tool (84.2 vs 81.2, P = .15). Both tools scored >80, indicating that both are acceptable, good tools according to an empirical evaluation of the SUS.21 The survey responses are reported in Table 4.

Table 4.

Health-ITUES and System Usability Scale (SUS) scores for Fitness-for-Use Tool and Intrinsic DQ Tool

Scale Fitness-for-Use Tool Intrinsic DQ Tool P-value
Health-ITUES overall 4.35 3.51 .002
 Perceived usefulness 4.31 3.06 .001
 Perceived ease of use 4.42 4.32 .32
System Usability Scale 84.25 81.25 .15
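For reference, the SUS values in Table 4 are on Brooke's standard 0–100 scale,17 computed per respondent from ten 1–5 Likert items. A minimal R scoring function (the example responses are invented):

```r
# Standard SUS scoring: odd items contribute (response - 1), even items
# contribute (5 - response); the sum is scaled by 2.5 to a 0-100 range.
sus_score <- function(items) {
  stopifnot(length(items) == 10, all(items %in% 1:5))
  odd  <- items[c(1, 3, 5, 7, 9)] - 1    # positively worded items
  even <- 5 - items[c(2, 4, 6, 8, 10)]   # negatively worded items
  sum(odd, even) * 2.5
}
sus_score(c(5, 1, 5, 2, 4, 1, 5, 1, 4, 2))  # invented respondent: 90
```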

Interview results—confirming quantitative measures through qualitative data

Participants mentioned that the Fitness-for-Use Tool enabled fitness-for-use assessment even for those without domain knowledge or a technical background in DQ assessment for fitness tracker data, allowing non-experts to conduct DQ assessment in less time.

Fitness-for-Use Tool has more information, but it’s not overwhelming and you can conduct data quality assessment in less time… It suggests what measures are currently used in the field, so even if you’re not an expert, you can do the data quality assessment in less time.

Also, the fact that researchers without advanced programming skills can assess fitness-for-use of a dataset was mentioned as a positive aspect of the tool.

I think it would be useful… I don’t have to write my own code. We actually have an R package that we can run to do similar things, but your tool is more interactive for people. It’s easier for people who are not familiar with R. Instead of looking at a bunch of R codes, you can change the parameters using drop down and slide bar.

Most participants said that the tool was easy to use for assessing fitness-for-use through a few simple clicks rather than having to analyze the data from scratch. However, one participant mentioned that while it is easy to use, it may require an initial learning period.

DISCUSSION

The goal of this study was to design and evaluate a DQ characterization tool that incorporates fitness-for-use dimensions and measures specifically for fitness tracker data to support researchers in the DQ assessment process. Breadth and density completeness, which are fitness-for-use dimensions, were incorporated as the main aspects of DQ to assess, and corresponding measures, such as the number of valid days and the duration of data collection, were included in the tool. We found that a tool providing a data summary on fitness-for-use measures with customizable thresholds could support completing DQ assessment tasks for a specific research question in less time and with higher accuracy than a tool providing only general information on intrinsic DQ measures. Although the Fitness-for-Use Tool incorporates more complex features requiring critical thinking, participants perceived its ease of use to be similar to that of the typical DQ tool and perceived the Fitness-for-Use Tool to be more useful in determining the fitness-for-use of a dataset.

To the best of our knowledge, this is the first study to design a tool that incorporates fitness-for-use measures specifically for fitness tracker data. Currently, the Fitness-for-Use Tool incorporates only basic fitness-for-use measures, but it can be expanded to more complex measures or other DQ dimensions. In addition, although the tool was designed and evaluated for fitness tracker data, its concept could be applied to other biomedical data types. Through this study, we identified design features that could be incorporated in a fitness-for-use data completeness tool and found that features such as (1) summarizing data characteristics, (2) allowing users to subset data and define their cohort of interest, (3) providing features to customize fitness-for-use completeness measures, and (4) analyzing breadth and density completeness of the dataset based on measures customized by users would support data completeness assessment. This aligns with the opinion of real-world data experts that determining the fitness-for-use of real-world data depends on factors such as the (1) "representativeness of population," which relates to the Fitness-for-Use Tool's cohort definition feature, (2) "availability of complete exposure window," which relates to the feature for subsetting the data based on timestamps, (3) "availability of key data elements," which relates to breadth completeness, and (4) "sufficient number of subjects," which relates to density completeness.22

Our study also has the potential to contribute to the secondary use of data, which is a key component of real-world evidence.23 To promote the reuse of data, the ultimate goal of the FAIR (Findable, Accessible, Interoperable, Reusable) data principles, it is important that researchers are aware of the quality of the data, a burden that DQ characterization tools could alleviate.24,25 For example, the All of Us Research Program is an initiative by the National Institutes of Health to collect health data from more than 1 million participants and make the data publicly available to researchers for generating real-world evidence.26 The All of Us data are open to traditional and non-traditional researchers who may not have previous experience in healthcare research, meaning that those without domain knowledge are also target users of the data.27 One of the challenges for these researchers is understanding the data, assessing DQ, and determining whether the data are fit for their research question. Currently, the All of Us Data Browser provides a data summary for each data type, such as conditions and drug exposures from the electronic health record or heart rate and activity data from Fitbit devices, but there is limited information and few features to support fitness-for-use assessment.28 This applies to other data sources as well: few provide data visualization tools, and the visualizations that are provided are often static and not versatile enough to meet researcher needs.29 As a result, researchers have to manually generate custom visualizations of the data summary, which can be time-consuming and costly.29 The design features we identified for the Fitness-for-Use Tool could be applied to tools for other real-world data by incorporating completeness measures that are commonly used in research studies and by providing researchers the opportunity to interact with the data based on their needs.

There are a few limitations in this study. First, the small number of target end users who participated in the tool design may not be representative of the target end users in the domain of wearable device data and DQ. Therefore, additional features might be required by other potential users or domain experts in future versions of the tool. Despite this limitation, the tool was designed by triangulating information from a conceptual framework, a literature review, and semi-structured interviews, followed by iterative design sessions with the target end users. Second, the fitness-for-use measures incorporated into the tool are not applicable to all potential use cases of fitness tracker data. For example, the current version considers only the number of valid days within a certain time frame; if the data values are spaced unevenly and concentrated on one side of the data collection period, the data could be invalid for certain use cases.30 However, the current version of the tool incorporates the most commonly used and generalizable fitness-for-use measures identified by the authors. Despite these limitations, our study lays the groundwork for researchers who aim to expand the current tool.

CONCLUSION

The widespread use of consumer wearable devices has generated interest in using wearable device data for research purposes. However, fitness-for-use assessment may be unfamiliar to researchers within or outside the wearable device domain. To address this challenge, an interactive DQ characterization tool was developed that incorporates fitness-for-use measures and allows users to customize those measures and their thresholds. In the evaluation study, the tool enabled researchers to accurately complete fitness-for-use assessment of a dataset in less time than a typical DQ tool that presents only basic descriptive statistics. These findings demonstrate that DQ characterization tools focusing on fitness-for-use measures and providing customized data summaries specific to research tasks can be useful.

FUNDING

This work was supported by the National Center for Advancing Translational Sciences (1U01TR002062-01) and the National Institutes of Health's All of Us Research Program (1U2COD023196-01).

AUTHOR CONTRIBUTIONS

Phases 1 and 2 of the study were designed by SC and KN; the phase 3 study was designed by SC, KN, NE, CW, and PD. Major contributions to the tool design were made by IE, JMR, and BB. The studies were conducted by SC, and the initial manuscript was drafted by SC. All authors contributed to editing and providing feedback on the manuscript and approved it for submission.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.


ACKNOWLEDGMENTS

The authors would like to thank all study participants for providing their knowledge and insight.

CONFLICT OF INTEREST STATEMENT

None declared.

Contributor Information

Sylvia Cho, Department of Biomedical Informatics, Columbia University, New York, New York, USA.

Ipek Ensari, Department of Artificial Intelligence and Human Health, Icahn School of Medicine, New York, New York, USA; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

Noémie Elhadad, Department of Biomedical Informatics, Columbia University, New York, New York, USA; Data Science Institute, Columbia University, New York, New York, USA.

Chunhua Weng, Department of Biomedical Informatics, Columbia University, New York, New York, USA; Data Science Institute, Columbia University, New York, New York, USA.

Jennifer M Radin, Scripps Research Translational Institute, La Jolla, California, USA.

Brinnae Bent, Department of Biomedical Engineering, Duke University, Durham, North Carolina, USA.

Pooja Desai, Department of Biomedical Informatics, Columbia University, New York, New York, USA.

Karthik Natarajan, Department of Biomedical Informatics, Columbia University, New York, New York, USA; Data Science Institute, Columbia University, New York, New York, USA.

Data Availability

The data underlying this article will be shared on reasonable request to the corresponding author.

REFERENCES

1. Vogels EA. About one-in-five Americans use a smart watch or fitness tracker. Pew Research Center. 2020. https://www.pewresearch.org/fact-tank/2020/01/09/about-one-in-five-americans-use-a-smart-watch-or-fitness-tracker/. Accessed September 15, 2022.
2. Quer G, Radin JM, Gadaleta M, et al. Wearable sensor data and self-reported symptoms for COVID-19 detection. Nat Med 2021; 27 (1): 73–7.
3. Radin JM, Wineinger NE, Topol EJ, et al. Harnessing wearable device data to improve state-level real-time surveillance of influenza-like illness in the USA: a population-based study. Lancet Digit Health 2020; 2 (2): e85–93.
4. Menai M, Brouard B, Vegreville M, et al. Cross-sectional and longitudinal associations of objectively-measured physical activity on blood pressure: evaluation in 37 countries. Health Promot Perspect 2017; 7 (4): 190–6.
5. Kim K-I, Nikzad N, Quer G, et al. Real world home blood pressure variability in over 56,000 individuals with nearly 17 million measurements. Am J Hypertens 2018; 31 (5): 566–73.
6. Cho S, Ensari I, Weng C, et al. Factors affecting the quality of person-generated wearable device data and associated challenges: rapid systematic review. JMIR Mhealth Uhealth 2021; 9 (3): e20738.
7. Lee K, Weiskopf N, Pathak J. A framework for data quality assessment in clinical research datasets. AMIA Annu Symp Proc 2017; 2017: 1080–9.
8. Callahan T, Barnard J, Helmkamp L, et al. Reporting data quality assessment results: identifying individual and organizational barriers and solutions. EGEMS (Washington, DC) 2017; 5 (1): 16.
9. Hripcsak G, Duke JD, Shah NH, et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform 2015; 216: 574–8.
10. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013; 20 (1): 144–51.
11. Strong DM, Lee YW, Wang RY. Data quality in context. Commun ACM 1997; 40 (5): 103–10.
12. Kies JK, Williges RC, Rosson MB. Coordinating computer-supported cooperative work: a review of research issues and strategies. J Am Soc Inf Sci 1998; 49: 776–91.
13. Hartson HR, Andre TS, Williges RC. Criteria for evaluating usability evaluation methods. Int J Hum Comput Interact 2001; 13 (4): 373–410.
14. Cho S, Weng C, Kahn MG, et al. Identifying data quality dimensions for person-generated wearable device data: a multi-method study. JMIR Mhealth Uhealth 2021; 9: e31618.
15. Lim WK, Davila S, Teo JX, et al. Beyond fitness tracking: the use of consumer-grade wearable data from normal volunteers in cardiovascular and lipidomics research. PLoS Biol 2018; 16 (2): e2004285.
16. Yen P-Y, Wantland D, Bakken S. Development of a customizable health IT usability evaluation scale. AMIA Annu Symp Proc 2010; 2010: 917–21.
17. Brooke J. SUS: a quick and dirty usability scale. In: Jordan PW, Thomas B, McClelland IL, Weerdmeester B, eds. Usability Evaluation in Industry. London: CRC Press; 1996.
18. Fay MP, Lumbard K. Confidence intervals for difference in proportions for matched pairs compatible with exact McNemar's or sign tests. Stat Med 2021; 40 (5): 1147–59.
19. Tang LM, Meyer J, Epstein DA, et al. Defining adherence: making sense of physical activity tracker data. Proc ACM Interact Mob Wearable Ubiquitous Technol 2018; 2 (1): 1–22.
20. Toftager M, Kristensen PL, Oliver M, et al. Accelerometer data reduction in adolescents: effects on sample retention and bias. Int J Behav Nutr Phys Act 2013; 10: 140.
21. Bangor A, Kortum PT, Miller JT. An empirical evaluation of the system usability scale. Int J Hum Comput Interact 2008; 24 (6): 574–94.
22. Shore C, Gee AW, Kahn B, et al. When is a real-world data element fit for assessment of eligibility, treatment exposure, or outcomes? In: Examining the Impact of Real-World Evidence on Medical Product Development: Proceedings of a Workshop Series. Washington (DC): National Academies Press (US); 2019. https://www.ncbi.nlm.nih.gov/books/NBK540105/. Accessed July 29, 2022.
23. Reich C, Arnoe M. Secondary first: a better approach to RWE. 2020. https://www.iqvia.com/locations/united-states/blogs/2020/09/secondary-first-a-better-approach-to-rwe. Accessed July 26, 2022.
24. Wilkinson MD, Dumontier M, Aalbersberg I, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016; 3: 160018.
25. Stephens KA, Lee ES, Estiri H, et al. Examining researcher needs and barriers for using electronic health data for translational research. AMIA Jt Summits Transl Sci Proc 2015; 2015: 168–72.
26. All of Us Research Program Investigators. The "All of Us" research program. N Engl J Med 2019; 381: 668–76.
27. National Institutes of Health. All of Us Research Program. 2021. https://allofus.nih.gov/sites/default/files/AOU_Core_Protocol_Redacted_Dec_2021.pdf. Accessed July 29, 2022.
28. National Institutes of Health. All of Us Data Browser. 2022. https://databrowser.researchallofus.org/. Accessed July 29, 2022.
29. Dixit R, Rogith D, Narayana V, et al. User needs analysis and usability assessment of DataMed—a biomedical data discovery index. J Am Med Inform Assoc 2018; 25 (3): 337–44.
30. Sperrin M, Thew S, Weatherall J, et al. Quantifying the longitudinal value of healthcare record collections for pharmacoepidemiology. AMIA Annu Symp Proc 2011; 2011: 1318–25.
