Author manuscript; available in PMC: 2024 Jan 1.
Published in final edited form as: Comput Inform Nurs. 2023 Jan 1;41(1):1–5. doi: 10.1097/CIN.0000000000000970

Using Novel Data Visualization Methods to Understand Mobile Health Usability: Exemplar from a Technology-Enabled Sleep Self-Monitoring Intervention

Jenna L Marquard 1, Jordan Howard 2, Raeann LeBlanc 3
PMCID: PMC9851666  NIHMSID: NIHMS1827329  PMID: 36634231

BACKGROUND AND SIGNIFICANCE

Mobile health (mHealth) technologies are increasingly seen as a key element in monitoring and improving aspects of an individual’s health. The mHealth technology market as a whole is large ($8.1 billion in 2018) and increasing, and regulations are changing so that clinicians can start to bill for their time spent reviewing these data.1

Several epidemiological studies have shown that sleep disturbance is common, with approximately 30% of the adult population having insomnia symptoms and approximately 5–10% diagnosed with insomnia syndrome.2–4 Multiple studies have shown that the prevalence of sleep disturbances increases with age and with the co-occurrence of other medical conditions.2,5–8 Not surprisingly, personal sleep monitoring devices, which aim to improve individuals’ sleep by providing actionable information via tracking and pattern recognition or by suggesting interventions, are also becoming more prevalent.9,10 An analysis of patent applications between 2010 and 2018 for wearable medical devices showed that the number of patents for sleep monitoring devices was third highest, behind only heart rate and pulse measurement devices.9

While mHealth technologies, including those aimed at measuring sleep, continue to expand, many barriers stand in the way of their widespread adoption. Much of the research on improving mHealth technology design, for example, focuses on improving the validity of device measurements.11,12 However, if mHealth technologies are to improve health, having valid data is not sufficient; users must find the devices useful and usable. Otherwise, individuals will likely not adopt them or will discontinue their use.

There are numerous validated questionnaires that can be used to measure mHealth technology usability. Among the most common are the System Usability Scale (SUS), Health-ITUES, the Technology Acceptance Model (TAM), and the Unified Theory of Acceptance and Use of Technology (UTAUT).13–17 Each questionnaire measures one to four constructs as predictors of system use. For example, SUS measures one construct, perceived usability, via ten questions.13 Health-ITUES assesses perceived usefulness, perceived ease-of-use, and user control.14,17 TAM measures perceived usefulness and perceived ease-of-use.15 UTAUT consists of four main theoretical constructs: performance expectancy, effort expectancy, social influence, and facilitating conditions.16 These questionnaires, particularly SUS, are frequently used to assess individual mHealth technologies or to compare two or more mHealth technologies.18–22

However, mHealth usability studies typically collect data at one time point (often during the initial design process or during training) and often present aggregate data for all participants.18,20,22 The objective of this research is to determine, in the context of an mHealth sleep self-management intervention, the effectiveness of measuring and visualizing usability at multiple time points (as opposed to once) and of analyzing and visualizing usability data at the participant and question level (as opposed to showing summary numerical information in tables).

MATERIALS AND METHODS

Study Participants and Data

The data presented in this manuscript came from a study exploring the use of an mHealth sleep monitoring device to self-manage sleep in older individuals. The study was approved by the University’s Institutional Review Board. Individuals 65 and older (n=20) with self-reported sleep disturbances were recruited from a semi-rural community setting. The research team sought to evaluate whether the mHealth intervention could be useful to and usable by this population. Participants used the device for four weeks. The research team captured numerical usability data via the validated System Usability Scale (SUS) after participants had used the system for one week, and again after four weeks. Of the twenty participants, fourteen had usability data for both weeks one and four.

The SUS includes 10 questions, each measured on a 1–5 Likert scale from strongly disagree to strongly agree.14 The instrument also provides a mechanism for aggregating the 10 question-level scores into a single SUS score on a 0–100 scale, with 100 being the best possible score.14 Half of the SUS questions are negatively worded (e.g., “I found the system unnecessarily complex”), and this negative wording is accounted for in the aggregate score.14 Because the aggregate 0–100 SUS scores can be misinterpreted (e.g., a 65 might seem acceptable to one person and not another), they are often bucketed into more interpretable categories: Excellent (80.3–100), Good (68–80.3), Poor (51–68), and Awful (0–51).14
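To make this scoring concrete, the minimal Python sketch below (hypothetical responses, not the authors’ code) applies the standard SUS reversal of negatively worded items, rescales the 0–40 sum to 0–100, and maps the result to the categories above; how scores landing exactly on a category cutoff are handled is an assumption.

```python
# Minimal sketch of standard SUS scoring, assuming responses are a list of
# ten 1-5 Likert values in question order (hypothetical data below).

def sus_score(responses):
    """Convert ten 1-5 Likert responses into a 0-100 SUS score."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for i, r in enumerate(responses):
        if i % 2 == 0:       # odd-numbered questions (1, 3, 5, ...) are positively worded
            total += r - 1   # each contributes 0-4
        else:                # even-numbered questions (2, 4, 6, ...) are negatively worded
            total += 5 - r   # reverse-scored, each contributes 0-4
    return total * 2.5       # scale the 0-40 sum to 0-100


def sus_category(score):
    """Map a 0-100 SUS score to the categories used in this study.
    Treatment of scores exactly at a cutoff is an assumption."""
    if score > 80.3:
        return "Excellent"
    if score >= 68:
        return "Good"
    if score >= 51:
        return "Poor"
    return "Awful"


# Hypothetical responses to questions 1-10.
responses = [4, 2, 5, 1, 4, 3, 5, 1, 5, 2]
score = sus_score(responses)
print(score, sus_category(score))  # 85.0 Excellent
```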

Data Analysis and Visualization Approaches

Our data analysis approach involved three broad methods to analyze the SUS scores across the two time points: 1) descriptive statistics of SUS scores for participants as a whole, to show how SUS data are typically analyzed and conveyed; 2) question-level analysis of SUS scores (on a scale of 1–5), to help us better understand whether some aspects of the system were perceived differently than others (e.g., complexity versus learnability); and 3) slope chart visualizations depicting each individual’s overall and question-level SUS scores. The individual-level data shown in the slope charts were intended to help us understand the composition of the numerical data in approaches 1 and 2. These slope charts allow for understanding not only changes between time points but also the baseline from which each change happened. For example, a question-level SUS score change from 1 to 3 would look the same as a change from 3 to 5 in a purely numerical change analysis, yet the two have very different meanings.
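As an illustration of approaches 1 and 2, the minimal pandas sketch below computes overall and question-level means and standard deviations at each time point. The long-format DataFrame and its values are placeholders chosen for illustration, not the study data or the authors’ code.

```python
import pandas as pd

# Placeholder long-format data: one row per participant, week, and question.
df = pd.DataFrame({
    "participant": [1, 1, 1, 1, 2, 2, 2, 2],
    "week":        [1, 1, 4, 4, 1, 1, 4, 4],
    "question":    [1, 2, 1, 2, 1, 2, 1, 2],
    "response":    [4, 2, 3, 2, 5, 1, 4, 1],                            # 1-5 Likert responses
    "sus_score":   [75.0, 75.0, 70.0, 70.0, 82.5, 82.5, 80.0, 80.0],    # overall 0-100 score, repeated per row
})

# 1) Descriptive statistics of overall SUS scores at each time point.
overall = (
    df.drop_duplicates(["participant", "week"])
      .groupby("week")["sus_score"]
      .agg(["mean", "std"])
)

# 2) Question-level means and standard deviations at each time point.
by_question = (
    df.groupby(["week", "question"])["response"]
      .agg(["mean", "std"])
      .unstack("week")   # rows: question, columns: (statistic, week)
)

print(overall.round(1))
print(by_question.round(1))
```

A sketch of the slope chart visualizations (approach 3) appears after Figures 2 and 3 in the Results.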

RESULTS

Table 1 is a typical representation of SUS scores, showing participants’ overall and question-level scores, with the possible overall score range being 0–100 and the question-level range being 1–5. The mean overall SUS score changed from 76.8 at week one to 74.1 at week four, with both weeks falling in the “Good” range of 68–80.3. The overall score was fairly stable between weeks one and four, with the week four score falling within one standard deviation of the week one score.

Table 1:

Aggregate overall and question-level SUS scores

Week 1 Avg (SD) Week 4 Avg (SD)
Overall: 0–100 scale, mapped to categories Excellent (80.3–100), Good (68–80.3), Poor (51–68), and Awful (0–51) 76.8 (10.2) 74.1 (10.3)
Individual Questions: 1–5 scale, from strongly disagree (1) to strongly agree (5)
1. I think that I would like to use this system frequently. 3.6 (1.4) 3.3 (1.9)
2. I found the system unnecessarily complex. 1.5 (0.7) 1.6 (1.0)
3. I thought the system was easy to use. 4.4 (0.7) 4.4 (1.2)
4. I think that I would need the support of a technical person to be able to use this system. 1.7 (0.7) 1.7 (1.2)
5. I found the various functions in this system were well integrated. 3.6 (1.0) 3.4 (1.3)
6. I thought there was too much inconsistency in this system. 3.1 (1.6) 3.4 (1.6)
7. I would imagine that most people would learn to use this system very quickly. 4.4 (1.2) 4.1 (1.2)
8. I found the system very cumbersome to use. 1.6 (1.3) 1.3 (0.8)
9. I felt very confident using the system. 4.4 (1.1) 4.6 (0.6)
10. I needed to learn a lot of things before I could get going with this system. 1.7 (0.9) 2.2 (1.6)

In Table 1, the even-numbered questions are negatively worded, with a lower score on the 1–5 scale representing better usability. For example, the best possible usability score for a negatively worded question is 1, which is equivalent to a 5 on a positively worded question. The average scores for all questions were quite stable, with each week four score falling within one standard deviation of the corresponding week one score.

For the odd-numbered questions, where higher scores represent better usability, ease of use, learnability, and confidence in the system received average ratings between 4 and 5. Participants’ predicted use of the system and views on system integration received average ratings between 3 and 4, and no questions received average ratings below 3. For the even-numbered questions, where lower scores represent better usability, four questions received favorable average ratings between 1 and 2: perceptions that the system was too complex, cumbersome, or difficult to learn, and perceived need for technical support. System inconsistency was rated more poorly, with average scores over 3 at both time points.

Figure 1 presents an alternative way of representing overall SUS scores, displaying the distributions of overall scores at weeks one and four. This figure shows that the shape of the distribution changed between weeks one and four. The distribution of scores at week one appears bimodal, with peaks in the “Poor” and “Excellent” categories. The scores at week four are spread more uniformly across the “Poor”, “Good”, and “Excellent” SUS categories.

Figure 1: Distributions of overall SUS scores at weeks one and four
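A minimal sketch of how such a categorized distribution plot could be produced is shown below; the week-1 and week-4 score lists are placeholders chosen for illustration, not the study data, and the plotting choices are assumptions rather than the authors’ implementation.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder overall SUS scores at each time point (not the study data).
week1 = [55, 62.5, 65, 67.5, 70, 72.5, 75, 75, 80, 82.5, 85, 90, 92.5, 95]
week4 = [60, 65, 67.5, 70, 70, 72.5, 75, 75, 77.5, 80, 82.5, 85, 92.5, 95]

categories = ["Awful", "Poor", "Good", "Excellent"]
bins = [0, 51, 68, 80.3, 100]

# Bin each week's scores into the four SUS categories and count participants.
counts = pd.DataFrame({
    "Week 1": pd.cut(pd.Series(week1), bins=bins, labels=categories, include_lowest=True).value_counts(),
    "Week 4": pd.cut(pd.Series(week4), bins=bins, labels=categories, include_lowest=True).value_counts(),
}).reindex(categories)

counts.plot.bar(rot=0, xlabel="SUS category", ylabel="Number of participants")
plt.tight_layout()
plt.show()
```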

Figures 2 and 3 are slope charts showing the changes in individual participants’ SUS scores between weeks one and four. Figure 2 shows the overall score changes and Figure 3 shows the question-level changes. The lines are colored based on whether the individual’s score stayed the same (gray), improved (blue), or got worse (red); for negatively worded questions, lines are red if the score increased from week one to week four. Line thickness represents the number of participants represented by the line. The shaded areas denote the descriptive categories associated with the SUS score ranges. No scores fell in the “Awful” range (0–51), so we omitted that range from the charts to allow individuals’ changes to be compared more clearly.

Figure 2: Overall changes in SUS scores

Figure 3: Question-level changes in SUS scores
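The sketch below shows one way such a slope chart could be built with matplotlib, in the spirit of Figure 2: line color encodes the direction of change, line thickness encodes how many participants share a given pair of scores, and shaded bands mark the SUS categories. The score pairs, colors, and width scaling are placeholder assumptions, not the authors’ exact implementation.

```python
from collections import Counter
import matplotlib.pyplot as plt

# Placeholder (week 1, week 4) overall SUS score pairs, one per participant
# (illustrative values only, not the study data).
pairs = [(55, 72.5), (72.5, 72.5), (85, 70), (90, 92.5), (65, 65),
         (70, 82.5), (82.5, 60), (60, 75), (75, 75), (75, 75),
         (95, 95), (62.5, 80), (80, 67.5), (67.5, 85)]

fig, ax = plt.subplots(figsize=(4, 6))

# Shaded bands for the SUS categories; the 'Awful' range is omitted, as in the paper.
for label, lo, hi, shade in [("Poor", 51, 68, "0.92"),
                             ("Good", 68, 80.3, "0.85"),
                             ("Excellent", 80.3, 100, "0.92")]:
    ax.axhspan(lo, hi, color=shade, zorder=0)
    ax.text(1.02, (lo + hi) / 2, label, transform=ax.get_yaxis_transform(), va="center")

# One line per unique (week 1, week 4) pair; width encodes the number of participants.
for (w1, w4), n in Counter(pairs).items():
    if w4 > w1:
        color = "tab:blue"   # score improved
    elif w4 < w1:
        color = "tab:red"    # score got worse
    else:
        color = "gray"       # score stayed the same
    ax.plot([0, 1], [w1, w4], color=color, linewidth=1.5 * n, marker="o")

ax.set_xticks([0, 1])
ax.set_xticklabels(["Week 1", "Week 4"])
ax.set_xlim(-0.2, 1.2)
ax.set_ylim(51, 100)
ax.set_ylabel("Overall SUS score")
plt.tight_layout()
plt.show()
```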

While the numerical scores in Table 1 were stable over time, these slope charts show that many individuals had quite dramatic changes (steep slopes) in their SUS scores from week one to week four. These changes happened in both directions, making the aggregate SUS scores appear stable between weeks one and four.

DISCUSSION

By using the histogram distributions and slope chart visualizations, we were able to see that the lack of change in the mean and standard deviation of SUS scores was misleading. If we looked purely at the overall numerical scores in Table 1, our assessment would be that the system usability was “Good” at both time points. If we looked at the question-level mean and standard deviation values in Table 1, we could also conclude that the SUS scores were stable between time points. Yet there was substantial variability in participants’ usability assessments across questions and between time points. The slope chart visualizations were useful for understanding the range of participants’ baseline scores and the magnitudes of changes between time points. In addition to cross-participant variability, there was also substantial variability across questions.

Slope charts, though not named as such, were proposed by Tufte in his landmark book “The Visual Display of Quantitative Information” as a valuable type of visualization.23 This minimalist, data-rich type of chart aligns with Tufte’s idea of creating visualizations with high data-to-ink ratios.23 Tufte proposed that this type of visualization has many benefits, which we have re-worded below to match our data set:

  • Displays the hierarchy of SUS scores from highest to lowest at each time point

  • Allows a user to identify specific SUS scores associated with each line

  • Shows how one individual’s SUS score changes over time (line slope)

  • Shows how one individual’s rate of change in their SUS score compares to other individuals’ rates of change (the line slopes compared with one another)

  • Allows the viewer to identify any notable deviations in general trends (aberrant line slopes)

Our slope charts also take advantage of the Gestalt principle of similarity, meaning that things sharing visual characteristics (in this case the colors red, blue, and gray) are seen as belonging together.24 Finally, we chose the colors blue and red to represent improvements and declines in SUS scores, and we avoided pairing red with green, which can be indistinguishable to those with red-green color-blindness. Several studies have assessed individuals’ cognition as it relates to the perception of the colors red and blue.25 These studies have shown that (at least in North American cultures) individuals completing cognitive tasks, a category that arguably includes interpreting data visualizations, tend to interpret red as relating to dangers and mistakes and blue as relating to openness and peace.26

This study included data for fourteen participants, so variation in SUS scores may decrease with a larger sample of participants. Our future work will include a qualitative analysis of participants’ exit interview data to understand why there was such a large variation in individuals’ SUS scores across questions and time points. This qualitative analysis will help us better understand what changes we should make to the intervention, and what changes must be tailored to subgroups of participants.

CONCLUSION

This work provides guidance to researchers conducting usability assessments of mHealth interventions. Our findings suggest that SUS scores should be assessed at multiple time points, as participants may have substantial changes in their assessments of the system. Our findings also suggest that reporting mean SUS scores for these time points, even with standard deviations, is insufficient. Slope chart visualizations can provide insight into the participant- and question-level data underlying these numerical values.

KEY POINTS.

  • It is important to measure mobile health (mHealth) usability, and the System Usability Scale (SUS) is a common method.

  • Aggregate mean and standard deviation SUS scores provide an incomplete picture of mHealth usability.

  • SUS scores measured at multiple time points may appear to be stable if only mean and standard deviation values are reported, but slope chart visualizations can illuminate dramatic individual differences and variations in scores across participants and over time.

Conflicts of Interest and Source of Funding:

Research reported in this publication was supported by the National Institute of Nursing Research of the National Institutes of Health under Award Number P20NR016599. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

CONFLICTS OF INTEREST

The authors have no conflicts of interest to report.

REFERENCES

  • 1. Rock Health. n.d. 2018 year end funding report: Is digital health in a bubble? [online] Available at: <https://rockhealth.com/insights/2018-year-end-funding-report-is-digital-health-in-a-bubble/> [Accessed 22 March 2022].
  • 2. Roth T. Insomnia: definition, prevalence, etiology, and consequences. J Clin Sleep Med. 2007;3(5 Suppl):S7–S10.
  • 3. Morin CM, LeBlanc M, Daley M, Gregoire JP, Mérette C. Epidemiology of insomnia: prevalence, self-help treatments, consultations, and determinants of help-seeking behaviors. Sleep Med. 2006;7(2):123–130. doi: 10.1016/j.sleep.2005.08.008
  • 4. Ohayon MM. Epidemiology of insomnia: what we know and what we still need to learn. Sleep Med Rev. 2002;6(2):97–111. doi: 10.1053/smrv.2002.0186
  • 5. Bhaskar S, Hemavathy D, Prasad S. Prevalence of chronic insomnia in adult patients and its correlation with medical comorbidities. J Family Med Prim Care. 2016;5(4):780–784. doi: 10.4103/2249-4863.201153
  • 6. Ancoli-Israel S. Sleep and aging: prevalence of disturbed sleep and treatment considerations in older adults. J Clin Psychiatry. 2005;66 Suppl 9:24–43.
  • 7. Ohayon MM, Carskadon MA, Guilleminault C, Vitiello MV. Meta-analysis of quantitative sleep parameters from childhood to old age in healthy individuals: developing normative sleep values across the human lifespan. Sleep. 2004;27(7):1255–1273. doi: 10.1093/sleep/27.7.1255
  • 8. Taylor DJ, Mallory LJ, Lichstein KL, Durrence HH, Riedel BW, Bush AJ. Comorbidity of chronic insomnia with medical problems [published correction appears in Sleep. 2007 Jul 1;30(7):table of contents]. Sleep. 2007;30(2):213–218. doi: 10.1093/sleep/30.2.213
  • 9. Mück JE, Ünal B, Butt H, Yetisen AK. Market and Patent Analyses of Wearables in Medicine. Trends Biotechnol. 2019;37(6):563–566. doi: 10.1016/j.tibtech.2019.02.001
  • 10. Peake JM, Kerr G, Sullivan JP. A Critical Review of Consumer Wearables, Mobile Applications, and Equipment for Providing Biofeedback, Monitoring Stress, and Sleep in Physically Active Populations. Front Physiol. 2018;9:743. Published 2018 Jun 28. doi: 10.3389/fphys.2018.00743
  • 11. Liang Z, Martell MA. Validity of consumer activity wristbands and wearable EEG for measuring overall sleep parameters and sleep structure in free-living conditions. Journal of Healthcare Informatics Research. 2018;2(1–2):152–178. doi: 10.1007/s41666-018-0013-1
  • 12. Mantua J, Gravel N, Spencer RM. Reliability of Sleep Measures from Four Personal Health Monitoring Devices Compared to Research-Based Actigraphy and Polysomnography. Sensors (Basel). 2016;16(5):646. Published 2016 May 5. doi: 10.3390/s16050646
  • 13. Brooke J. SUS: a quick and dirty usability scale. Usability Evaluation in Industry. 1996:189–194.
  • 14. Yen PY, Wantland D, Bakken S. Development of a Customizable Health IT Usability Evaluation Scale. AMIA Annu Symp Proc. 2010;2010:917–921. Published 2010 Nov 13.
  • 15. Davis FD. A technology acceptance model for empirically testing new end-user information systems: theory and results [doctoral dissertation]. Cambridge, MA: Massachusetts Institute of Technology; 1986.
  • 16. Venkatesh V, Morris MG, Davis GB, Davis FD. User acceptance of information technology: toward a unified view. MIS Quarterly. 2003;27(3):425–478.
  • 17. Schnall R, Cho H, Liu J. Health Information Technology Usability Evaluation Scale (Health-ITUES) for Usability Assessment of Mobile Health Technology: Validation Study. JMIR Mhealth Uhealth. 2018;6(1):e4. Published 2018 Jan 5. doi: 10.2196/mhealth.8851
  • 18. Cornet VP, Daley CN, Srinivas P, Holden RJ. User-Centered Evaluations with Older Adults: Testing the Usability of a Mobile Health System for Heart Failure Self-Management. Proc Hum Factors Ergon Soc Annu Meet. 2017;61(1):6–10. doi: 10.1177/1541931213601497
  • 19. Isaković M, Sedlar U, Volk M, Bešter J. Usability Pitfalls of Diabetes mHealth Apps for the Elderly. J Diabetes Res. 2016;2016:1604609. doi: 10.1155/2016/1604609
  • 20. Holden RJ, Campbell NL, Abebe E, et al. Usability and feasibility of consumer-facing technology to reduce unsafe medication use by older adults. Res Social Adm Pharm. 2020;16(1):54–61. doi: 10.1016/j.sapharm.2019.02.011
  • 21. Alwashmi MF, Hawboldt J, Davis E, Fetters MD. The Iterative Convergent Design for Mobile Health Usability Testing: Mixed Methods Approach. JMIR Mhealth Uhealth. 2019;7(4):e11656. Published 2019 Apr 26. doi: 10.2196/11656
  • 22. Morey SA, Barg-Walkow LH, Rogers WA. Managing Heart Failure On the Go: Usability Issues with mHealth Apps for Older Adults. Proc Hum Factors Ergon Soc Annu Meet. 2017;61(1):1–5. doi: 10.1177/1541931213601496
  • 23. Tufte ER. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press; 2001.
  • 24. Wagemans J, Elder JH, Kubovy M, et al. A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization. Psychol Bull. 2012;138(6):1172–1217. doi: 10.1037/a0029333
  • 25. Mehta R, Zhu RJ. Blue or red? Exploring the effect of color on cognitive task performances. Science. 2009;323(5918):1226–1229. doi: 10.1126/science.1169144
