Author manuscript; available in PMC: 2022 Mar 8.
Published in final edited form as: HT ACM Conf Hypertext Soc Media. 2021 Aug;2021:231–236. doi: 10.1145/3465336.3475116

Towards Enhancing Blind Users’ Interaction Experience with Online Videos via Motion Gestures

Hae-Na Lee 1, Vikas Ashok 2
PMCID: PMC8903014  NIHMSID: NIHMS1777421  PMID: 35265946

Abstract

Blind users interact with smartphone applications using a screen reader, an assistive technology that enables them to navigate and listen to application content using touch gestures. Since blind users rely on screen reader audio, interacting with online videos can be challenging due to the screen reader audio interfering with the video sounds. Existing solutions to address this interference problem are predominantly designed for desktop scenarios, where special keyboard or mouse actions are supported to facilitate ‘silent’ and direct access to various video controls such as play, pause, and the progress bar. As these solutions are not transferable to smartphones, suitable alternatives are needed. In this regard, we explore the potential of smartphone motion gestures as an effective and convenient method for blind screen reader users to interact with online videos. Specifically, we designed and developed YouTilt, an Android application that enables screen reader users to exploit an assortment of motion gestures to access and manipulate various video controls. We then conducted a user study with 10 blind participants to investigate whether blind users can leverage YouTilt to properly execute motion gestures for video-interaction tasks while simultaneously listening to video sounds. Analysis of the study data showed a significant improvement in usability, by as much as 43.3% on average, with YouTilt compared to the default screen reader, and an overall positive attitude toward and acceptance of motion gesture-based video interaction.

Keywords: Mobile Interaction, Motion Gesture, Smartphone, Video, Accessibility, Screen Reader, Blind, Visually Impaired

CCS CONCEPTS: • Human-centered computing → Empirical studies in accessibility, Accessibility technologies, Empirical studies in interaction design

1. INTRODUCTION

The consumption of online videos on smartphones for purposes of entertainment, education, information, etc., has grown substantially over the past few years [8, 25]. To interact with videos on smartphones, blind users predominantly rely on smartphone assistive technology, namely a screen reader such as VoiceOver [1] or TalkBack [14], which enables them to use either predefined touch gestures or free-form touch exploration to access and manipulate various video controls (e.g., play, pause, progress bar, playback speed) by listening to the control labels. However, this interaction with videos is well known to be both disruptive and inconvenient, since the screen reader speech obscures and interferes with the video sounds [20, 21].

Prior solutions to address this sound interference issue are predominantly tailored for video interaction on desktops, where special keyboard shortcuts or mouse actions are provided to enable instant access to the various video controls [20, 26]. This way, blind users can ‘quietly’ and directly manipulate these controls without having to rely on screen reader shortcuts for serially navigating to the controls while listening to page content along the way – the primary source of the sound interference usability problem. However, these effective solutions are not suitable for smartphones, due to the obvious absence of keyboard or mouse input modalities; blind users instead rely on touch-based screen reader gestures (e.g., swipes) to interact with applications including videos. Therefore, suitable smartphone-specific alternatives to these extant solutions are desired; motion gestures can potentially fill this void by providing an alternative ‘sound-free’ input modality for blind screen reader users to directly access and manipulate various video controls in the smartphone environment. In fact, motion gestures have been extensively researched as an input modality for sighted users to perform various application tasks on handheld devices or smartphones [2, 3, 11, 23, 27, 28]. Therefore, in our work, we investigate whether blind users too can benefit from motion gestures in the context of video interaction, and if so, what their requirements and preferences are in this regard.

As a vehicle for our investigation, we designed an Android application, YouTilt, containing the YouTube player, which lets blind screen reader users access various video controls ‘silently’ using a variety of motion gestures (see Table 1), thereby enabling the users to focus solely on the video sound without any interference from screen reader speech. YouTilt also provides short vibration feedback to notify blind users about successful executions of gestures. Furthermore, YouTilt only augments the existing set of default touch gestures supported by a screen reader with motion gestures, and does not replace them; the users can still use the default screen reader touch gestures based on their preferences. We evaluated YouTilt in a pilot user study with 10 fully blind smartphone users. The study yielded many insights, most notably that except for a few types of motion gestures, the participants were able to execute all other motion gestures with ease and reasonably high accuracy. All participants also explicitly stated that they preferred motion gestures over screen reader touch gestures to interact with online videos. We summarize our contributions as follows: (i) a preliminary prototype, namely YouTilt, that enables blind screen reader users to instantly and silently manipulate video controls using motion gestures, while simultaneously listening to the video content; and (ii) the findings of a pilot study with 10 blind participants aimed at assessing the feasibility of motion gestures for non-visual video interaction, as well as eliciting user needs and requirements in this regard.

Table 1:

Motion gestures and the corresponding video controls in YouTilt. A finger should be placed on the screen as a gesture delimiter before executing the motion gestures.

Player control/option | Motion gesture
Play/Pause | Shake
Rewind | Tilt left
Fast-forward | Tilt right
Volume up | Tilt upward
Volume down | Tilt downward
Previous video | Move to left and come back
Next video | Move to right and come back
Quality | Tilt top left
Playback speed | Tilt top right
Mute/Unmute | Tilt bottom left
Current timestamp | Tilt bottom right

2. RELATED WORK

While there exist a few research works addressing the usability of video players for people with visual impairments [10, 12, 20–22, 26], most of them have primarily focused on desktop interaction, especially the web platform. For instance, Miyashita et al. [20] proposed a custom interface called aiBrowser, which enables blind users to leverage special keyboard hotkeys to interact with video controls. Similarly, Villena et al. [26] presented a custom-designed accessible video player for people with vision impairments that provides accessible basic player controls, as well as other novel functionalities such as toolbar configuration, light, and timeline annotation. Moreno et al. [22] also suggested an accessible HTML5 media player, which satisfies accessibility requirements specified in W3C standards. All these works rely on adapting keyboard hotkeys and therefore are not applicable to smartphone video interaction.

Smartphone accessibility for people who are blind is a well-researched topic [4, 6, 7, 13, 17–19]. These works have proposed novel interaction methods that are convenient for blind users. For example, Slide Rule [17] presented a novel non-visual interaction method that enables blind users to rely on multi-touch gestures such as one-finger scan, second-finger tap, flick, and L-select gestures to access smartphone applications. Similarly, No-Look Notes [4] utilizes multi-touch input to enable eyes-free text entry on mobile phones. Kane et al. [18] also conducted user studies to understand blind users’ gesture preferences and performance on touchscreens, and proposed guidelines for designing accessible touchscreen gestures. A few approaches leveraging motion gestures have also been explored to assist blind users’ interaction with mobile devices [9, 10, 13]. For example, Dim et al. [10] proposed a motion gesture interface for accessing phone books and making phone calls. Their results showed that the efficiency of doing tasks was much higher with motion gestures compared to that with traditional button-based interfaces. Tilt gestures were also used to design a mobile museum guide PDA for blind users [13]. A common aspect underlying all these works is that they were designed mostly for simple discrete (i.e., Yes/No) selection tasks. On the contrary, our work is focused on video interaction, a relatively more complex scenario wherein blind users have to manipulate both discrete (e.g., play/pause) and continuous controls (e.g., progress bar), while simultaneously concentrating on the video sounds.

3. THE DESIGN OF YOUTILT

As an investigation prototype, YouTilt was implemented as a custom Android application containing the YouTube video player. We exploited the YouTube IFrame player API [15] for this purpose as it supports all video controls present in the YouTube media player. The motion gestures for accessing and manipulating these controls are listed in Table 1. The choice of gestures was influenced by prior works [2, 9, 10, 24], which demonstrated their effectiveness in general-purpose interaction scenarios like menu selection. YouTilt requires a gesture delimiter (i.e., place-and-hold a finger on the screen) to effectively separate intended gestures from unintended hand movements. We also added short vibratory feedback to each gesture (using Android’s Vibrator API) to notify a blind user of the successful execution of the gesture. More design details are provided next.
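The delimiter described above acts as a simple gate: motion samples count toward gesture recognition only while a finger is held on the screen. A minimal sketch of this gating, with illustrative class and method names (not YouTilt's actual code):

```java
// Gesture-delimiter gate: motion sensor samples are considered for
// gesture recognition only while a finger is held on the screen.
// All names here are illustrative assumptions.
public class GestureGate {
    private boolean fingerDown = false;

    public void onTouchDown() { fingerDown = true; }
    public void onTouchUp()   { fingerDown = false; }

    /** Returns true if a motion sample should be fed to the recognizers. */
    public boolean accepts() { return fingerDown; }
}
```

In an Android app, `onTouchDown`/`onTouchUp` would be driven by the view's touch events, and each recognizer would first check `accepts()` before processing a sensor sample.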

Play/Pause.

The video content can be played or paused at any time via a simple shake gesture. For robust detection, we set the shake-count threshold to 2, i.e., a user has to quickly shake the smartphone at least twice for YouTilt to register the gesture. As mentioned earlier, YouTilt provides short vibratory feedback after successfully detecting this gesture.
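The shake-count rule can be sketched as a windowed spike counter over accelerometer magnitudes. Only the count threshold of 2 comes from the description above; the acceleration threshold, window length, and all names are illustrative assumptions:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Shake detection with a shake-count threshold of 2, as in YouTilt's
// Play/Pause gesture. Acceleration threshold and window are assumed.
public class ShakeDetector {
    private static final double ACCEL_THRESHOLD = 12.0; // m/s^2, assumed
    private static final long WINDOW_MS = 1000;         // assumed window
    private static final int SHAKE_COUNT = 2;           // from the paper

    private final Deque<Long> spikes = new ArrayDeque<>();

    /** Feed one accelerometer magnitude sample; returns true when a shake fires. */
    public boolean onSample(long timestampMs, double magnitude) {
        // Drop spikes that fell out of the detection window.
        while (!spikes.isEmpty() && timestampMs - spikes.peekFirst() > WINDOW_MS) {
            spikes.pollFirst();
        }
        if (magnitude > ACCEL_THRESHOLD) {
            spikes.addLast(timestampMs);
        }
        if (spikes.size() >= SHAKE_COUNT) {
            spikes.clear(); // reset so one physical shake toggles once
            return true;    // caller would toggle play/pause and vibrate
        }
        return false;
    }
}
```

On Android, `onSample` would be fed from a `SensorEventListener` registered for the accelerometer, with the magnitude computed from the three-axis values.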

Progress bar slider control.

To move the progress bar slider either to rewind or fast-forward a video, the user can place-and-hold a finger on the screen and then perform tilt left/right gestures, respectively. In the current implementation, each tilt shifts the slider backward or forward by 10 seconds. The user can also tilt-and-hold to keep shifting the slider by 10 seconds every second for as long as the tilt position is maintained. Furthermore, YouTilt supports three different shifting speeds (10/20/30-second jumps) based on the angle/extent (12°/24°/36°) of tilt, which enables the user to employ different speeds for slider displacements, e.g., the greater the right tilt, the larger the fast-forward jump in the progress bar. Short vibratory feedback is also provided after every change in the shifting speed of the slider during the tilting action.
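The three-level mapping from tilt angle to jump size can be illustrated with a small helper. The 12°/24°/36° angles and 10/20/30-second jumps come from the description above; the method name and sign convention (positive roll = right tilt = fast-forward) are assumptions:

```java
// Maps a tilt (roll) angle to a progress-bar jump, per YouTilt's three
// speed levels: 12°/24°/36° select 10/20/30-second jumps (paper values).
// Name and sign convention are illustrative assumptions.
public class TiltSeek {
    /**
     * Returns the seek offset in seconds for a roll angle in degrees:
     * positive angles (right tilt) fast-forward, negative angles rewind.
     * Angles below the first threshold produce no jump.
     */
    public static int seekOffsetSeconds(double rollDegrees) {
        double a = Math.abs(rollDegrees);
        int jump;
        if (a >= 36)      jump = 30;
        else if (a >= 24) jump = 20;
        else if (a >= 12) jump = 10;
        else              return 0;
        return rollDegrees > 0 ? jump : -jump;
    }
}
```

With tilt-and-hold, this mapping would be applied once per second; a transition between levels would also trigger the vibratory feedback mentioned above.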

Volume.

When using TalkBack in the YouTube app, pressing side volume buttons on Android smartphones brings up two volume controls for selection, one for the screen reader audio and the other for the video sound. Therefore, using these buttons while listening to a video is disruptive as the user needs to then manually select the desired volume control before adjusting it. Hence, YouTilt supports tilt up/down gestures to directly increase/decrease only the video volume without affecting the screen reader volume. In the current implementation, each tilt increases/decreases volume by 5 units. The user can also optionally tilt-and-hold wherein the volume increases/decreases by 5 units every second as long as the user maintains the tilt position. The user can also mute/unmute using the bottom-left tilt gesture.
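The tilt-based volume adjustment reduces to a clamped step counter: each tilt (or each tick while tilt-and-holding) changes only the video volume by 5 units. The step size is from the description above; the class name, starting volume, and 0-100 range are illustrative assumptions:

```java
// Tilt up/down volume control: each tick changes the video volume by
// 5 units (paper value), clamped to an assumed [0, 100] range, without
// touching the screen reader volume.
public class VolumeControl {
    private static final int STEP = 5; // units per tilt/tick (paper)
    private int volume = 50;           // assumed starting volume

    public int tiltUp()   { volume = Math.min(100, volume + STEP); return volume; }
    public int tiltDown() { volume = Math.max(0, volume - STEP);   return volume; }
    public int get()      { return volume; }
}
```

In YouTilt this value would be pushed to the player (the YouTube IFrame player exposes a volume setter), leaving the TalkBack speech volume untouched.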

Previous/Next video.

For these controls, the user can rely on move gestures. The internal thresholds for time and displacement to trigger these gestures were set to 1 second and 5 cm, respectively, i.e., the user has to move the phone left/right and back within 1 second, and also the amount of displacement should be at least 5 cm to successfully execute these gestures.
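The qualification test for a move gesture is a pair of threshold checks. The 5 cm displacement and 1-second duration come from the description above; estimating the displacement itself (e.g., from accelerometer data) is assumed to happen elsewhere, and the names are illustrative:

```java
// Previous/next-video move gesture check: the phone must move laterally
// at least 5 cm and return within 1 second (thresholds from the paper).
public class MoveGesture {
    private static final double MIN_DISPLACEMENT_CM = 5.0; // from the paper
    private static final long MAX_DURATION_MS = 1000;      // from the paper

    /** Returns true if an out-and-back lateral motion qualifies as a move gesture. */
    public static boolean qualifies(double peakDisplacementCm, long durationMs) {
        return peakDisplacementCm >= MIN_DISPLACEMENT_CM
                && durationMs <= MAX_DURATION_MS;
    }
}
```

The study results below suggest these thresholds were slightly too strict in practice: most failed executions happened because participants were not quick or far-moving enough.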

Settings.

YouTilt also supports gestures for instantly configuring the video quality and playback speed settings. To activate these controls, the user has to first perform the top diagonal tilt gestures (see Table 1), which bring up the corresponding lists containing different options or values (e.g., 480p, 720p, 1080p for video quality) for these controls. The user can then use the tilt up/down gestures to select the next/previous option in these lists. Note that short vibratory feedback is also provided after successful execution of each gesture.
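The list navigation activated by the top diagonal tilts can be sketched as a cursor over an option list, with tilt up/down mapped to next/previous. The 480p/720p/1080p values are the paper's example; the wrap-around behavior and names are illustrative assumptions:

```java
// Settings-list navigation: a top-diagonal tilt brings up an option list,
// and subsequent tilt up/down gestures step through it. Wrap-around is
// an illustrative assumption.
public class OptionList {
    private final String[] options;
    private int index = 0;

    public OptionList(String... options) { this.options = options; }

    public String next()     { index = (index + 1) % options.length; return options[index]; }
    public String previous() { index = (index - 1 + options.length) % options.length; return options[index]; }
    public String current()  { return options[index]; }
}
```

For example, `new OptionList("480p", "720p", "1080p")` would model the video-quality list, with each successful step confirmed by vibratory feedback.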

Note that all gesture parameters were decided after discussions and trial runs with our accessibility consultant who is completely blind (since birth) and a power user of smartphone screen readers including TalkBack.

4. PILOT USER STUDY

4.1. Participants and Apparatus

We recruited 10 participants via word of mouth and mailing lists. Gender representation was equal (5 female, 5 male), and the average age of participants was 39.1 (Median = 39.5, Min = 25, Max = 53). Since YouTilt was developed for the Android platform, our inclusion criteria required proficiency with Android smartphones and the default TalkBack screen reader available on this platform. Therefore, all participants owned Android phones and were proficient in using TalkBack. All participants were either blind from birth or lost their vision at a very young age (younger than 10). Also, all participants stated that they frequently interacted with online videos, specifically on YouTube, for the purposes of entertainment, education, and information. The participants used their own Android smartphones for the study. All participants already had the YouTube app installed, and the experimenter installed the YouTilt application on their phones. None of the participants used headphones or earphones.

4.2. Design and Procedure

In a within-subject experimental setup, the participants were asked to perform the following experimental tasks: (T1) play and pause a video, (T2) navigate (by moving a progress bar slider) to the start of a specified target step in a simple well-known tutorial video after watching the whole video, (T3) adjust the volume of a video, (T4) adjust the video quality, (T5) adjust the playback speed, (T6) go to the next video, (T7) mute the video, and (T8) read the current progress bar slider location. The participants performed these tasks under the following two conditions: (1) TalkBack - the participants used only the TalkBack default touch gestures to interact with the YouTube videos; and (2) YouTilt - the participants used motion gestures to interact with the YouTube videos.

We selected 4 tutorial videos, and the participants performed all the tasks for each video, with 2 videos per condition. The assignment of videos to conditions and the ordering of tasks and conditions were counterbalanced for each participant. The videos were selected to be similar and of nearly equal duration. Also, for task T2, the target steps for the videos were selected in such a way that their corresponding progress bar locations in the videos were approximately similar, i.e., close to the middle of the videos. As the YouTilt app used the YouTube IFrame player API, the video players in both study conditions were the same, i.e., having the same controls in the exact same order.

The experimenter started the study by giving the participants enough practice time (~15 minutes) to get comfortable with using the different motion gestures while interacting with videos. The participants were also given time to refresh and practice TalkBack touch gestures for interacting with the YouTube videos. The participants were then asked to complete the tasks in a predetermined randomized order. Post-study interviews were conducted to elicit subjective feedback and feature requests, as well as perceived usability via the SUS questionnaire [5] and perceived task workload via the NASA-TLX questionnaire [16]. The sessions were also recorded with the participants’ permission, and subsequently analyzed to identify any peculiar user behavior in the study conditions.

4.3. Results

Task performance.

Table 2 presents the performance statistics. The success rate metric captures the percentage of successful completions by the participants for each task, whereas the number of actions metric indicates the average number of user input actions (touch/motion gestures) performed by the participants in the successful task completions. As seen in Table 2, the success rates with YouTilt were higher than with TalkBack for all tasks except T5 and T7. In tasks T4, T5, T7, and T8 with YouTilt, the participants struggled to perform the diagonal tilt gestures, and correspondingly we observed low success rates (more details in the next section). Also, none of the participants could successfully complete task T8 with TalkBack, and therefore the success rate for that task was 0.

Table 2:

Performance statistics for both conditions in the user study. Success rate corresponds to successful completion percentage of a task. Number of actions corresponds to the number of user input actions (touch or motion gestures) made by a participant to complete the task.

Task | Description | TalkBack: Success rate | TalkBack: Number of actions | YouTilt: Success rate | YouTilt: Number of actions
T1 | Play/Pause | 0.9 | μ = 1.35, σ = 0.47 | 0.9 | μ = 1.25, σ = 0.43
T2 | Rewind/Fast-forward | 0.55 | μ = 27.75, σ = 11.44 | 0.95 | μ = 7.60, σ = 2.03
T3 | Volume | 0.55 | μ = 19.45, σ = 6.04 | 1.0 | μ = 3.65, σ = 1.38
T4 | Video quality | 0.3 | μ = 25.30, σ = 11.44 | 0.55 | μ = 11.25, σ = 4.80
T5 | Playback speed | 0.5 | μ = 24.75, σ = 5.07 | 0.3 | μ = 18.50, σ = 6.22
T6 | Next video | 0.7 | μ = 8.85, σ = 3.29 | 1.0 | μ = 2.80, σ = 1.40
T7 | Mute | 0.9 | μ = 12.10, σ = 4.82 | 0.4 | μ = 11.10, σ = 5.32
T8 | Progress bar location | 0.0 | – | 0.25 | μ = 14.05, σ = 5.17

Among the successfully completed tasks, overall, the participants needed significantly fewer input actions with YouTilt than with TalkBack to complete each of the tasks. This was because with TalkBack, the participants performed repeated swipe touch gestures and/or two-dimensional touch exploration over the content to navigate and locate the target video controls. Such content navigation to access controls was not needed with YouTilt as there was a dedicated motion gesture for each control; however, there were still a few instances where the participants performed incorrect gestures and therefore had to redo the actions.

Execution of motion gestures.

All participants used one hand (non-dominant hand for 8 participants, dominant hand for 2) to hold their phones and perform the YouTilt motion gestures. However, for the gesture delimiter (i.e., placing a finger on the screen before executing a motion gesture), 6 participants used a finger of the other hand, whereas the remaining 4 used the thumb of the hand holding the phone. In task T1, all participants except P3 completed the task using the shake gesture without any issue. The participants also did not face any issues while executing the tilt up/down gestures for adjusting the volume in task T3. They reported no discomfort or confusion while doing these tasks, and additionally stated that their concentration on the multimedia content was not disrupted while performing these gestures.

At the beginning of task T2, all participants except P3 executed exaggerated right-tilt gestures that resulted in large jumps of the slider along the progress bar. However, they quickly switched to smaller jumps, and subsequently performed more measured and subtle tilt-and-hold gestures by relying on the short vibratory feedback. Regarding the levels, four participants (P1, P2, P4, and P6) mostly used level 2 (i.e., fast-forward by 20 s), whereas the remaining participants mostly settled for the default level 1 (i.e., 10 s). Only one participant (P5) overshot the target progress bar location and had to rewind to it; the other 9 participants used the gestures to move the slider as close to the target location as possible, and then simply listened to the rest of the video until reaching the target, at which point they paused using the shake gesture. All participants indicated that the left/right tilt gestures affected their concentration only slightly, as they had to divert a little attention to avoid overdoing these gestures.

In task T6, all participants faced slight difficulties in properly executing the movement gesture. Specifically, most failed executions were due to participants either not being quick enough or not moving the phone far enough to meet the preset thresholds for gesture detection. A majority (80%) of participants faced issues while executing the diagonal tilt gestures for tasks T4, T5, T7, and T8. While trying to execute these gestures, they unintentionally ended up performing either the left/right tilts for progress bar slider movement or the up/down tilts for adjusting the volume. These participants expressed frustration, and most of them gave up after a few attempts, which explains the low success rates for these tasks shown in Table 2. These participants also stated that they could not concentrate on the video content while repeatedly trying to execute these diagonal motion gestures.

Perceived usability and workload.

The average System Usability Scale (SUS) score for TalkBack was 54.25 (Median = 53.75, Min = 42.5, Max = 67.5), much lower than the average score of 77.75 for YouTilt (Median = 77.5, Min = 70, Max = 90). Analysis of the responses to the individual Likert items of the SUS questionnaire revealed that the participants found YouTilt notably simpler, easier to use, and better integrated than TalkBack alone. However, the responses regarding the consistency and learnability of YouTilt were lukewarm, indicating the need for further research on these aspects. As for perceived task workload, the average NASA-TLX score for TalkBack was 73.63 (Median = 76.33, Min = 62.66, Max = 84), much higher than the average score of 47.53 for YouTilt (Median = 45.16, Min = 38.66, Max = 58). The factors contributing most to the significant difference in TLX scores between the study conditions were mental demand, effort, and frustration.
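The SUS scores above follow the standard scoring rule [5]: for each of the 10 Likert items (answered 1-5), odd-numbered items contribute (response - 1) and even-numbered items contribute (5 - response), and the sum is scaled by 2.5 to give a 0-100 score. A minimal sketch:

```java
// Standard SUS scoring [5]: 10 Likert items rated 1-5; odd items
// contribute (response - 1), even items (5 - response); sum * 2.5.
public class Sus {
    public static double score(int[] responses) {
        if (responses.length != 10) {
            throw new IllegalArgumentException("SUS has 10 items");
        }
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            // Questionnaire items are 1-indexed: responses[i] is item i+1.
            sum += (i % 2 == 0) ? responses[i] - 1 : 5 - responses[i];
        }
        return sum * 2.5;
    }
}
```

For instance, a maximally positive response pattern (5 on every odd item, 1 on every even item) scores 100, and uniform neutral responses (all 3s) score 50.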

Qualitative feedback – suggestions and feature requests.

All participants stated that most of the motion gestures in YouTilt were quick and easy to execute, and that they could concentrate better on the video content even while executing the gestures. In contrast, with TalkBack, the participants mentioned that they had to split their attention between two sounds, and that it was especially harder to concentrate on the videos due to the TalkBack audio partially masking the video sounds. They also mentioned that accessing controls was slower with TalkBack due to the absence of direct access to these controls, thereby requiring them to manually navigate to the controls using screen reader actions. All participants except P5 stated that they would like customization support in YouTilt to be able to reassign the motion gestures to the different video controls, based on their personal preferences. Four participants (P1, P2, P4, and P6) also wanted to customize the tilt gestures for progress bar slider control, by manually defining the number of levels and the unit value (e.g., fast-forward amount) for each level.

Also, a majority (7) of participants explicitly stated that they did not want the diagonal tilt gestures to be included in YouTilt. When probed, they explained that they were mostly used to only the four basic directional movements, and that it was harder for them to perceive or sense the diagonal directions. Three of these seven participants emphasized that this was due to the way they were used to holding their phones. Specifically, these participants mentioned that they typically used both hands to interact with smartphone applications using a screen reader – the non-dominant hand to hold the phone in a tight grip, and the dominant hand to perform the gestures. They claimed that it was harder for them to sense the diagonal directions with their non-dominant hands than with their dominant hands. All participants agreed that the vibratory feedback was very important in helping them properly execute the motion gestures. Specifically, they explained that the vibratory feedback helped them realize and recover from ‘accidental’ gestures while playing videos; without this feedback, it would have been difficult for them to determine whether a problem occurred due to a playback issue (e.g., buffering) or an unintentional execution of a gesture. Three participants (P2, P3, and P6) also mentioned that the vibratory feedback provided some psychological assurance that helped them concentrate more on the video rather than on executing gestures.

5. DISCUSSION

While the study revealed promising results overall regarding the use of motion gestures for video interaction, it also illuminated the present limitations and unique requirements of blind screen reader users as discussed next.

Personalization of motion gestures.

The study findings indicate that blind screen reader users have unique individual needs and preferences regarding motion gestures, and therefore an inflexible ‘one-size-fits-all’ approach is less likely to be adopted by these users. Specifically, the participants wanted fine-grained control over both the assignment of gestures to video controls and the adjustment of gesture parameters (e.g., fast-forward amount). Supporting customization in YouTilt poses the additional challenge of designing customization interfaces that are both accessible and easy to use, and is hence left to our future research.

Limitations.

Apart from the lack of interface personalization, the support provided by the current YouTilt prototype is limited to video controls; it does not cover other important YouTube application features such as searching for videos, navigating a list of videos, and accessing video-related information (e.g., comments, video description, and statistics). We plan to include support for these additional features in the next version of YouTilt. Another limitation of our work is the small sample size of the pilot study. While some of the important findings (e.g., difficulty in performing diagonal tilts, desire for personalization) are likely to generalize to larger population samples, other observations in favor of tilt gestures will have to be further vetted in larger user studies.

6. CONCLUSION

In this paper, we investigated how blind screen reader users can benefit from using motion gestures while interacting with online videos. In this regard, we designed the YouTilt application that enables blind users to access and manipulate YouTube video controls ‘silently’ and directly without using the disruptive screen reader gestures. In a pilot study with 10 fully blind participants, YouTilt produced a better interaction experience overall, compared to the default TalkBack screen reader on Android smartphones. However, the study also highlighted several aspects of YouTilt that need to be improved for further enhancing the usability of motion gestures for blind users to interact with online videos.

ACKNOWLEDGMENTS

This work was supported by NSF awards 1805076, 1936027, 2113485, and NIH awards R01EY030085, R01HD097188.



Footnotes

ACM Reference Format:

Hae-Na Lee and Vikas Ashok. 2021. Towards Enhancing Blind Users’ Interaction Experience with Online Videos via Motion Gestures. In Proceedings of the 32nd ACM Conference on Hypertext and Social Media (HT ’21), August 30–September 2, 2021, Virtual Event, Ireland. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3465336.3475116

Contributor Information

Hae-Na Lee, Stony Brook University, Stony Brook, NY, USA.

Vikas Ashok, Old Dominion University, Norfolk, VA, USA.
