Author manuscript; available in PMC: 2022 Mar 8.
Published in final edited form as: Proc Symp Appl Comput. 2021 Apr 22;2021:1941–1949. doi: 10.1145/3412841.3442066

Semantic Table-of-Contents for Efficient Web Screen Reading

Javedul Ferdous 1, Sami Uddin 2, Vikas Ashok 3
PMCID: PMC8903019  NIHMSID: NIHMS1777414  PMID: 35265951

Abstract

Navigating back-and-forth between segments in webpages is well-known to be an arduous endeavor for blind screen-reader users, due to the serial nature of content navigation coupled with the inconsistent usage of accessibility-enhancing features such as WAI-ARIA landmarks and skip navigation links by web developers. Without these supporting features, navigating modern webpages, which typically contain thousands of HTML elements in their DOMs, is both tedious and cumbersome for blind screen-reader users. Existing approaches to improve non-visual navigation efficiency typically propose ‘one-size-fits-all’ solutions that do not accommodate the personal needs and preferences of screen-reader users. To fill this void, in this paper, we present sTag, a browser extension embodying a semi-automatic method that enables users to easily create their own Table of Contents (TOC) for any webpage by simply ‘tagging’ their preferred ‘semantically-meaningful’ segments (e.g., search results, filter options, forms, menus, etc.) while navigating the webpage. This way, all subsequent accesses to these segments can be made via the generated TOC, which is made instantly accessible via a special shortcut or a repurposed mouse/touchpad action. As tags in sTag are attached to abstract semantic segments instead of actual DOM nodes in the webpage, sTag can automatically generate equivalent TOCs for other similar webpages, without requiring users to duplicate their tagging efforts from scratch on these webpages. An evaluation with 15 blind screen-reader users revealed that sTag significantly reduced content-navigation time and effort compared to a state-of-the-art solution.

Keywords: Web accessibility, screen reader, personalization

1. INTRODUCTION

To interact with webpages, blind users rely on a screen reader (e.g., JAWS [38], VoiceOver [26], NVDA [2]), a special-purpose assistive technology that enables the users to listen to the webpage content and also linearly navigate the webpage DOM (Document Object Model) to focus on different elements on the webpage. However, many studies have shown that this serial DOM-based navigation of content is both tedious and cumbersome for blind screen-reader users [8, 14, 27], mostly due to the content density (i.e., thousands of HTML elements in the DOM), as well as inconsistent use of accessibility support features by web developers (e.g., WAI-ARIA1 landmarks, skip navigation links, table of contents, etc.). As a consequence, simple web tasks that sighted users can accomplish in a matter of seconds usually take blind screen-reader users a few minutes to complete [8, 14]. For example, while interacting with a typical travel website, navigating back-and-forth between different semantically-meaningful segments (e.g., list of flights, filter options, sort, search form, selection menu, etc.) requires a plethora of key presses by blind screen-reader users [8, 12], whereas sighted users can almost instantly switch focus to any segment on the webpage using a ‘point-and-click’ mouse or touchpad.

Extant methods for improving non-visual web-navigation efficiency such as web automation [11, 28], assistants [8, 12, 24], auto annotation [7, 9], etc., are either limited to supporting a few ‘high-level’ repetitive browsing tasks (e.g., ordering a pizza, buying a product, etc.) or are generic ‘one-size-fits-all’ models that strive to support efficient screen-reader navigation. In either case, there is no support for personalized and quick ad-hoc access to different segments on a webpage in arbitrary web browsing scenarios. On the other hand, sighted users can easily and quickly focus back-and-forth between desired segments in any arbitrary webpage. To mitigate this usability divide between sighted and blind users, in this paper, we present sTag, a browser extension that enables blind users to efficiently switch focus between different segments on any webpage at any instant based on their personal needs.

With sTag, blind screen-reader users can create their own custom ‘Table of Contents’ (TOC) for any arbitrary webpage, without relying on any external assistance (see Figure 1). Specifically, the users can leverage sTag-provided keyboard shortcuts or reprogrammed mouse actions to explicitly ‘tag’ desired segments (e.g., search results, filter options, forms, menus, etc., in Figure 1), and sTag will automatically generate and add the corresponding skip-links2 for these segments to the custom TOC for that webpage. This way, any subsequent accesses to these tagged segments can be made quickly and directly via the TOC, instead of relying on tedious manual screen-reader navigation.

Figure 1:

An illustration of sTag. The custom-created table of contents provides instantaneous access to desired webpage segments such as menus, search results, forms, filter groups, etc.
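As a rough illustration of the data structure behind Figure 1, the custom TOC can be thought of as an ordered list of skip-link entries that grows as segments are tagged. This is a minimal sketch under our own assumptions: the entry fields, the function names (`makeToc`, `addSkipLink`), and the duplicate check are illustrative, not the authors' implementation.

```javascript
// Sketch of the custom TOC as an ordered list of skip-link entries.
// Field and function names are our own assumptions, not sTag's API.

function makeToc() {
  return []; // ordered list of { type, label, targetId } entries
}

// Tagging a segment appends its skip-link to the end of the TOC;
// re-tagging the same segment is treated as a no-op here.
function addSkipLink(toc, type, label, targetId) {
  if (toc.some((e) => e.targetId === targetId)) return toc;
  toc.push({ type, label, targetId });
  return toc;
}
```

Selecting an entry from such a TOC would then simply move the screen-reader focus to the DOM node identified by `targetId`.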

Furthermore, as tagging in sTag is done at the level of semantically-meaningful segments instead of HTML DOM elements, sTag is able to reuse a TOC generated for one webpage to bootstrap the TOCs for other similar webpages, thereby obviating the need for users to duplicate their manual tagging efforts. In a user study with 15 blind participants, sTag significantly reduced the average time to access desired segments while doing representative tasks on both familiar and unfamiliar websites, when compared to a state-of-the-art solution, namely SaIL [9]. The participants also rated sTag to be significantly more usable and less stressful than SaIL.

We summarize our contributions as follows:

  • The design and development of a browser extension, namely sTag, that embodies a semi-automatic do-it-yourself approach for creating custom Table of Contents for webpages in order to facilitate faster navigation between the desired webpage segments for blind screen-reader users.

  • The findings of a user study with 15 blind participants, which evaluated the performance of sTag against those of the conventional screen readers as well as a state-of-the-art solution.

2. RELATED WORK

2.1. Extracting Web Semantic Segments

Many prior research works have explored techniques to identify and extract important segments in webpages [4, 5, 20, 21, 30, 31, 35, 37]. Each of these existing approaches has focused on extracting a specific type of segment, such as data records, menus, search forms, main content, or news articles. For instance, Alvarez et al. [5] present an approach to extract data records (e.g., search results and query results on shopping, travel, and other e-commerce websites) by leveraging distinctive repetitive patterns in webpage DOMs, such as shared XPaths, visual formatting, and sub-tree structure. Zhu et al. [44], on the other hand, rely on hierarchical conditional random fields to detect both data records and their properties (e.g., price, duration, etc.) at the same time. Their approach also relies on the well-known VIPS segmentation algorithm [18] as a pre-processing step.

Melnyk et al. [31] trained machine-learning models to extract widgets such as calendars, auto-suggestion lists, and chat windows. Specifically, they capture and classify newly inserted DOM subtrees into one of the widget classes, using manually engineered custom features to train the classifier. Similarly, [4, 8, 35] present extraction techniques to identify menus, news-article content, and forms, respectively. For sTag, we leverage these existing extraction algorithms to identify segments and generate skip-links.

2.2. Annotating Webpages for Usability

To improve the usability and understandability of webpage content, several works have also focused on adding extra annotations to the source of webpages to convey visual and semantic details explicitly to screen-reader users [6, 9, 10, 17, 40, 42]. For instance, one of the seminal works in this regard [6] considers adding auxiliary HTML nodes containing content-layout information to the webpage DOM, thereby enabling screen-reader users to access this information directly from the webpage. However, the practicality of this approach was found to be unclear due to the high cost of human training involved [10]. Therefore, instead of annotating the main HTML source code of a webpage, Bechhofer et al. [10] investigated the idea of annotating the CSS style sheets, so as to promote reusability, generalizability, and customizability.

As an alternative to annotating webpages with HTML tags, researchers have also explored annotating webpages using ARIA and JavaScript [9, 17, 42]. For example, Brown et al. [17] present a technique that injects JavaScript for monitoring changes to the webpage DOM via a custom ARIA live region. A more recent approach by Aydin et al. [9], on the other hand, adopts a machine-learning-based annotation approach: pre-trained visual-saliency deep networks determine the key segments in a webpage, and ARIA role attributes are then automatically added to the root nodes of the corresponding segment subtrees in the DOM, so that screen-reader users can quickly access these segments using the special screen-reader keyboard shortcut.

A common aspect of all the aforementioned annotation techniques is that they propose ‘one-size-fits-all’ solutions that rely on either pre-specified ontologies or pre-trained heuristic models, and therefore these techniques cannot accommodate the fine-grained, website-specific personal interaction needs of screen-reader users. sTag fills this gap by enabling users to easily create and maintain their own personalized TOCs for faster access to desired segments.

2.3. Automation Techniques and Assistants

Automating repetitive web tasks has been previously explored to reduce blind users’ manual interaction effort during web browsing [11, 13, 28, 29, 33, 36]. For instance, CoScripter [28] lets users create scripts or macros that are shared between users via a central repository. To automate a task, a user simply needs to execute or play back the corresponding script, which contains the task-specific sequence of actions. Instead of manually handcrafting scripts as is done by some automation approaches [13, 33], a few research works have also explored automatic generation of scripts from user demonstrations of the tasks [11, 29, 36]. Although web automation indeed reduces the manual effort of screen-reader users, these techniques are currently limited to supporting a few frequently performed repetitive tasks; they do not provide any support in ad-hoc web-browsing scenarios involving arbitrary websites.

Researchers have also proposed accessibility assistants that help blind users efficiently interact with webpages using either natural language or special input gestures [8, 12, 24, 34, 39]. For instance, Billah et al. [12] present Speed-Dial, an interactive system that enables blind users to hierarchically navigate webpage segments using a special Dial input device that supports a small, easy-to-perform set of rotate and press gestures. The speech assistant by Gadde et al. [24], on the other hand, lets blind users issue queries for obtaining a quick overview of the current webpage, as well as give navigational commands to shift the screen-reader focus to a specific segment of interest. Ashok et al. [8] further expanded the range of supported spoken commands and even facilitated dialog interaction with webpage content to complete simple tasks. While these natural-language assistants [8, 24] are highly usable from the interface perspective, they suffer from several limitations, notably speech-recognition accuracy (especially in noisy environments) [1] and blind users’ social concerns (e.g., drawing undesired attention from others) and privacy concerns [3].

3. APPROACH

The architecture of the sTag browser extension is shown in Figure 2. sTag lets blind screen-reader users create their own ‘Table of Contents’ (TOC) for any arbitrary webpage. Specifically, the users can ‘tag’ desired segments (e.g., search results, result items, filter options, forms, menus, etc.) via different methods (Table 1), and sTag will automatically generate the corresponding skip-navigation links (or skip-links for convenience) for these segments in the custom TOC for that webpage. To identify the semantically-meaningful segments on a webpage, sTag leverages existing data-extraction algorithms. The users can instantly access the generated custom TOC from anywhere in the webpage using the different modalities shown in Table 1, and then select a desired skip-link to directly shift focus to the corresponding segment on the webpage.

Figure 2:

sTag architectural workflow.

Table 1:

Interacting with the sTag interface via multiple input methods. Note that for the keyboard interface, the user has to first press the ‘pass-through’ hotkey, e.g., Insert + F3 in JAWS, to avoid mix-ups with screen-reader shortcuts. Also note that Left Click and One Finger Tap have different functions depending on whether the user is browsing the page or interacting with the TOC.

Accessing sTag Interface
Intent | Keyboard | Mouse | Touchpad
Add new skip-link for segment | S | Left Click | One Finger Tap
Open/Close TOC | T | Middle Click | Two Finger Tap
Navigate links in TOC | Arrow Keys | Scroll Left/Right | One Finger Swipes
Select a link in TOC | Enter | Left Click | One Finger Tap
Delete a link | X | Right Click | Two Finger Left-to-Right Swipe
Move a link up | Shift + Up | Hold Left Button + Scroll Up | One Finger Up Swipe
Move a link down | Shift + Down | Hold Left Button + Scroll Down | One Finger Down Swipe

As also shown in Figure 2, sTag reuses the TOCs previously generated by a user to pre-populate or bootstrap custom TOCs for similar but previously unvisited websites, thereby reducing the user’s manual effort. For example, the custom TOC created for an ‘Expedia’ webpage is reused for bootstrapping TOCs for similar webpages in other travel websites such as ‘Travelocity’, ‘Priceline’, ‘Hotels’, etc. However, sTag validates each skip-link of a previously generated TOC to check whether it is applicable to the current webpage, i.e., whether an equivalent segment for the skip-link also exists in the current webpage, and adds the skip-link to the current page’s TOC only if this validation succeeds. If more than one TOC is available (e.g., Expedia and Travelocity) for bootstrapping, sTag simply adds their union to the TOC of the current similar webpage (e.g., Priceline), after validating each skip-navigation link in the union.
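The union-then-validate step just described can be sketched as follows. This is our simplification: validation here is reduced to checking that the new page contains a segment of the same type, whereas sTag matches concrete segments (see Algorithm 1); the names `bootstrapToc` and `typesOnNewPage` are illustrative.

```javascript
// Hedged sketch of TOC bootstrapping: take the union of previously
// created TOCs and keep only skip-links that validate against the new
// page. Validation is simplified here to a segment-type membership test.

function bootstrapToc(previousTocs, typesOnNewPage) {
  const seen = new Set();
  const toc = [];
  for (const prev of previousTocs) {              // union over all prior TOCs
    for (const link of prev) {
      if (seen.has(link.type)) continue;          // deduplicate the union
      if (!typesOnNewPage.has(link.type)) continue; // validation step
      seen.add(link.type);
      toc.push({ ...link });
    }
  }
  return toc;
}
```

For example, bootstrapping from an Expedia TOC and a Travelocity TOC would carry over only the skip-links whose segment types the Priceline page actually exhibits.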

3.1. Detecting Semantically Meaningful Segments in Webpages

As explained earlier in Section 2, plenty of techniques currently exist to identify different types of segments in webpages [5, 8, 19, 21–23, 31, 37, 43, 44]. As there exist a myriad of segment types, including custom-defined widgets, on the web, it is impractical to identify every single type of segment on any arbitrary webpage. Therefore, sTag only focuses on identifying a fixed set of the most commonly found segments (e.g., data records, search form, login form, etc.) that are generic across websites and also tend to be the key segments on webpages. Specifically, sTag applies the aforementioned state-of-the-art identification techniques to extract the following key segments – data records such as lists of products, emails in an inbox, and search results [22]; article content [35]; webpage menus [4]; filter options for data records [37]; search and login forms [8]; account settings [37]; sort options for data records [8]; multi-page links such as those for articles, search results, etc. [21]; discussion forums such as the comments section of articles [37]; and sidebar panels [37].

3.2. sTag Interface

Figure 1 depicts the TOC interface of sTag. As shown in the figure, the tagged segments, i.e., the skip-links, are arranged in the form of a navigable linear list. When a user tags a segment on the webpage, sTag by default appends the corresponding skip-link to the end (i.e., bottom) of the TOC list. A user can interact with sTag’s TOC interface via multiple input modalities, as indicated in Table 1. While navigating the webpage content, a user can use any one of the actions listed in Table 1 to add a skip-link for the currently focused segment to sTag’s TOC. Note, however, that to tag a segment, the user’s screen-reader focus has to be on one of the elements in the DOM sub-tree belonging to that segment. The custom TOC can be instantly accessed from anywhere on the webpage using the input methods specified in Table 1. As shown in Table 1, sTag also enables users to customize the entries or skip-links in the TOC. Specifically, the users can reorder the skip-links in the TOC based on their preferences, and also delete skip-links from the TOC.
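The customization actions in Table 1 (delete, move up, move down) amount to simple edits on the skip-link list. A minimal sketch, with `deleteLink` and `moveLink` as illustrative names rather than sTag's actual API:

```javascript
// Sketch of the TOC customization actions from Table 1.

// Remove the skip-link at the given position.
function deleteLink(toc, index) {
  toc.splice(index, 1);
  return toc;
}

// Swap a skip-link with its neighbor; delta is -1 (move up) or +1 (move down).
function moveLink(toc, index, delta) {
  const j = index + delta;
  if (j < 0 || j >= toc.length) return toc; // already at the edge
  [toc[index], toc[j]] = [toc[j], toc[index]];
  return toc;
}
```

Keeping these as pure list operations means the same logic can back any of the keyboard, mouse, or touchpad bindings in Table 1.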

To define custom keyboard shortcuts, we used the browser’s built-in support in its settings menu. For reprogramming mouse and touchpad behavior, we used a publicly available Mouse-Hook library3 that lets developers capture mouse/touchpad events and define custom event handlers.

3.3. TOC Reuse Across Similar Websites

As mentioned before, to significantly reduce the user’s manual effort, sTag reuses all previously created TOCs to the best extent possible for automatically generating or bootstrapping TOCs for newly visited webpages. The main idea behind this reuse is that similar webpages have a similar semantic layout, i.e., a similar set of segments, even if their underlying HTML DOM implementations are different. Therefore, since sTag stores only abstract segment-level skip-links in its TOCs, they can be referenced for automatically generating skip-links for the TOCs of similar webpages. For example, although two different shopping websites typically vary in their look and feel (i.e., HTML markup), they essentially have the same set of segments in the webpages containing data records. Therefore, a skip-link “skip to results” in the TOC for one website can be directly translated into an equivalent “skip to results” in the TOC for the other website, without requiring the user to manually tag the ‘results’ segment for the other website.

Reusing and translating the skip-links from one webpage A (e.g., Expedia) to another similar webpage B (e.g., Travelocity) requires figuring out a one-to-one mapping between the corresponding segments of A and B. For most segment types, such as login form, search form, and list of data records, this mapping is straightforward, as there is only one instance of each of these types in a webpage. However, for some segment types, such as filter options and menus, there may be many segments belonging to that type, and therefore sTag employs an additional text-similarity-based scoring method to derive matches between such segments across the two websites. Specifically, to determine the text-similarity score between any two segments, sTag computes the cosine similarity after encoding the text of each segment using the well-known Word2Vec word-embedding technique [32]. The exact procedure of matching segments between two webpages and then automatically generating TOCs is detailed in Algorithm 1.
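The text-similarity scoring can be illustrated in isolation: cosine similarity over segment embeddings (assumed here to be precomputed, e.g., averaged Word2Vec vectors), with a match threshold τ as in Algorithm 1. The field names and the greedy per-type matching strategy below are our assumptions, a sketch rather than the paper's exact procedure.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedily match each segment of page A to its most similar same-type
// segment of page B, keeping only matches whose score reaches tau.
function matchSegments(segsA, segsB, tau) {
  const matches = [];
  for (const a of segsA) {
    let best = null, bestScore = tau;
    for (const b of segsB) {
      if (b.type !== a.type) continue;
      const s = cosine(a.embedding, b.embedding);
      if (s >= bestScore) { best = b; bestScore = s; }
    }
    if (best) matches.push([a.id, best.id]);
  }
  return matches;
}
```

Raising τ trades recall for precision, which is exactly the trade-off reported in Table 2.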

Algorithm 1:

Automatic TOC generation

[Algorithm 1 pseudocode is provided as an image in the original manuscript.]

Note that Algorithm 1 is also used by sTag to match segments between two different versions of the same webpage, especially when the content of the webpage depends on the user’s search query. For example, the filter-option groups shown on a shopping webpage depend on the type of product (e.g., electronics, clothes, furniture, etc.) being searched, and therefore, sTag adapts the custom TOC for such a webpage based on the user’s search query. Moreover, if more than one previously created TOC is available, sTag takes their union into consideration while automatically generating the TOC for the current page.

As the accuracy of segment identification is outside our control, we evaluated Algorithm 1 in terms of segment-matching accuracy. For this purpose, we built a custom test dataset comprising 50 pairs of similar webpages, where the segments in each pair were manually matched by a human annotator. For each pair, we executed Algorithm 1 and computed the precision and recall values. Table 2 presents the average results obtained for different values of the matching threshold τ (see Algorithm 1). The best average results (P = 0.966, R = 0.943) were obtained for τ in [0.69, 0.72].
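Per-pair precision and recall can be computed by comparing predicted matches against the annotator's gold matches. A minimal sketch; representing a match as an `"aId:bId"` string is our own framing, not the paper's.

```javascript
// Precision/recall for one page pair, given predicted and gold matches
// as Sets of "aId:bId" strings (our own encoding of segment pairs).

function precisionRecall(predicted, gold) {
  let tp = 0;
  for (const p of predicted) if (gold.has(p)) tp++;
  return {
    precision: predicted.size ? tp / predicted.size : 1,
    recall: gold.size ? tp / gold.size : 1,
  };
}
```

Averaging these values over the 50 pairs yields entries like those in Table 2.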

Table 2:

Algorithm 1 evaluation results

Average Segment-Matching Accuracy
τ | Precision | Recall
0.6 | 0.77 | 1.00
0.7 | 0.96 | 0.94
0.8 | 0.97 | 0.85
0.9 | 0.99 | 0.72

4. EVALUATION

4.1. Participants

Table 3 presents the demographics of the 15 blind participants who evaluated sTag. We recruited these participants through local mailing lists and word-of-mouth. The gender representation was almost equal (7 female, 8 male), and the participants varied in age between 25 and 64 years (μ = 43.86, σ = 12.63). No participant reported any motor impairment that affected their ability to use screen-reader shortcuts. Proficiency with web screen-reading and the Chrome web browser constituted our inclusion criteria. All participants regularly accessed a variety of websites, such as shopping and search websites.

Table 3:

Participant demographics for the user study. All values were self reported by the participants.

ID | Age/Gender | Age of Vision Loss | Preferred Screen Reader | Hours Per Day | Computer Type | Mouse Owned
P1 | 37/M | Since birth | JAWS | 5–6 | Laptop | No
P2 | 26/M | Age 3 | NVDA | 3–4 | Laptop | Yes
P3 | 52/F | Age 5 | JAWS | 2–3 | Laptop | No
P4 | 58/F | Since birth | JAWS | 2–3 | Desktop | Yes
P5 | 45/M | Age 6 | NVDA | 3–4 | Desktop | Yes
P6 | 38/F | Cannot remember | JAWS | 4–5 | Laptop | No
P7 | 54/F | Cannot remember | JAWS | 1–2 | Laptop | No
P8 | 28/M | Age 2 | NVDA | 3–4 | Desktop | Yes
P9 | 56/F | Since birth | System Access | 1–2 | Laptop | Yes
P10 | 25/M | Since birth | JAWS | 2–3 | Desktop | Yes
P11 | 62/F | Age 12 | JAWS | 5–6 | Laptop | No
P12 | 40/M | Since birth | JAWS | 3–4 | Laptop | No
P13 | 64/F | Since birth | JAWS | 1–2 | Desktop | Yes
P14 | 34/M | Cannot remember | System Access | 3–4 | Desktop | Yes
P15 | 39/M | Since birth | JAWS | 1–2 | Laptop | No

4.2. Apparatus

We conducted the study remotely, where the participants used their own computers and preferred screen-readers (see Table 3). All participants had Google Chrome web browser installed on their computers. We emailed the sTag extension (as a Google Drive link) to the participants just before the study, and then assisted them in downloading and installing the extension via Zoom or Skype conferencing software. Only 3 participants (P2, P5, and P7) required additional assistance from either their family members or friends to install sTag.

4.3. Design

The study comprised a within-subject experimental setup, where the participants performed the following representative web tasks.

  • T1: Buy a product on a shopping website.

  • T2: Book a hotel on a travel website.

The participants performed each of these tasks under the following three study conditions:

  • Screen Reader: The participants used only their preferred screen reader to complete the study tasks.

  • SaIL: The participants used their preferred screen reader to complete the study tasks. However, the study websites were enhanced with a saliency-based state-of-the-art annotation technique [9]. This technique employs a saliency model to determine key segments on the page, and then automatically adds WAI-ARIA landmarks to these segments, so that users can use WAI-ARIA related shortcuts to access these segments.

  • sTag: The participants used sTag along with their preferred screen reader to complete the tasks. Also, the TOC in this condition was bootstrapped using participant data collected during the practice session.

For convenience, both SaIL and sTag were included in a single extension that was emailed to the participants. A special shortcut was provided to turn each condition ‘ON’/‘OFF’. Specifically, if the SaIL condition was turned ‘ON’, the sTag condition was automatically turned ‘OFF’, and vice versa.

To avoid learning effects, different websites were used in each condition for both T1 and T2. For T1, we selected the Walmart, eBay, and CostCo websites; for T2, we chose the Travelocity, Hotels, and Orbitz websites. These choices ensured that the tasks performed in all conditions were comparable for both T1 and T2. The assignment of websites to conditions, as well as the ordering of tasks and conditions, was counterbalanced for each participant using the Latin Square method [15]. For the practice session, we selected the Amazon (shopping) and Expedia (travel) websites, so that the data could be leveraged to bootstrap TOCs during the actual study.

4.4. Procedure

The experimenter first assisted the participant in setting up the sTag extension. Next, a practice session (20 minutes) was conducted where the participants familiarized themselves with the study conditions; this also generated data for bootstrapping TOCs for study tasks. The participants were then asked to complete all the tasks under different study conditions in the predetermined counterbalanced order. For each task, the experimenter allowed a maximum of 15 minutes for the participant to complete the task. After the tasks, the experimenter administered subjective questionnaires (System Usability Scale (SUS) [16] for measuring usability and NASA Task Load Index (NASA-TLX) [25] to measure perceived user effort) and also collected suggestions and feature requests from the participant. Throughout the study, the screen-sharing and recording features were turned on so as to capture all user-interaction activities for subsequent post-study data analysis.

4.5. Results

4.5.1. Task Completion Times.

Figure 3 presents the task-completion-time statistics for all the study conditions. For Task T1 (see Figure 3a), the average task time (in seconds) with sTag was 359.66 (med = 379, min = 241, max = 458), which was much lower than that with just the screen reader (μ = 727.06, med = 758, min = 585, max = 876) as well as that with SaIL (μ = 541.93, med = 573, min = 368, max = 667). The same trend was also observed for Task T2 (see Figure 3c), where the average task time with sTag was 419.8 (med = 404, min = 308, max = 544), which was much lower than that with just the screen reader (μ = 759.46, med = 745, min = 615, max = 876) and that with SaIL (μ = 597.73, med = 624, min = 420, max = 720). All these differences in completion times between the three study conditions were statistically significant (see Table 4).

Figure 3:

Evaluation results from the user study for two tasks (T1 and T2).

Table 4:

Kruskal-Wallis test for statistical significance between study conditions.

Task | Completion Time | Number of User Actions
T1 | H = 33.28, df = 2, p < 0.0001 | H = 33.54, df = 2, p < 0.0001
T2 | H = 33.44, df = 2, p < 0.0001 | H = 31.86, df = 2, p < 0.0001

In the screen reader condition, the participants spent a significant amount of time sequentially navigating back-and-forth between different segments (e.g., data records, filters, sort options, etc.) in both tasks T1 and T2, due to the high density of the content on the study webpages. Also, nearly half of the participants (7) on at least one occasion lost track of their current location in a webpage, and therefore had to navigate back to the top of the page using the dedicated screen-reader shortcut (e.g., CTRL+HOME in JAWS) and start over. Three participants (P2, P5, and P13) even tried using the ‘Find’ (CTRL+F) shortcut, but were mostly unsuccessful in their searches, as their queries did not exactly match the text of the webpage segments. This finding is in accordance with the results of a recent study [41], which found that screen-reader users rarely use the CTRL+F keyboard shortcut.

While the automatically injected landmarks in the SaIL condition indeed significantly reduced the task completion times, these times were still much higher than those in the sTag condition. Analysis of the study data revealed that this difference was due to the inadequate coverage of the SaIL approach, i.e., there were many segments on the page that the participants wanted to access while doing the tasks, but these segments were not covered by the SaIL WAI-ARIA annotations. Therefore, the participants had to manually navigate the page content sequentially to access these uncovered segments, which in turn significantly contributed to the increase in task completion times.

In the sTag condition, we observed that the bootstrapped TOCs contained most of the segments that the participants wanted to access while doing the tasks, and therefore they could complete the tasks significantly faster than in the other study conditions. Even when a segment was not present in the TOC, the participants manually navigated to that segment only once and then tagged it so that they could use the TOC later to quickly access that segment again if needed; for instance, while navigating back-and-forth between the hotel data records, filter options, multi-page links, and search form segments during task T2.

4.5.2. Number of User Actions.

Figure 3 also presents the statistics regarding the number of user actions for all the study conditions. For Task T1 (see Figure 3b), the average number of actions with sTag was 171.33 (med = 161, min = 104, max = 252), which was much lower than that with just the screen reader (μ = 468.66, med = 489, min = 301, max = 603) as well as that with SaIL (μ = 304.13, med = 323, min = 202, max = 402). Similarly, for Task T2 (see Figure 3d), the average number of actions with sTag was 233.33 (med = 217, min = 151, max = 341), which was considerably lower than that with just the screen reader (μ = 517.4, med = 534, min = 355, max = 653) and that with SaIL (μ = 368.73, med = 386, min = 257, max = 459). Like the task completion times, all these differences in the number of input actions between the three study conditions were statistically significant (see Table 4).

The number of user actions in the screen-reader condition was the highest, since the participants pressed a multitude of shortcuts to manually navigate between the segments on the webpages. This manual effort was relatively lower in the SaIL condition, as some of the desired segments could be accessed quickly using the special landmark screen-reader shortcut. In the sTag condition, however, the manual screen-reader navigation effort was the lowest, as most of the desired segments were present in the TOC. Even those segments that were not initially present in the TOC were accessed and tagged by the participants only once, and all subsequent accesses to these segments were done via the TOC.

4.5.3. Subjective Evaluation.

System Usability Scale (SUS).

For the standard SUS questionnaire to measure usability [16], the participants rated positive and negative statements about each study condition on a Likert scale from 1 (strongly disagree) to 5 (strongly agree). Overall, we found a significant impact of the study conditions on the SUS scores (one-way ANOVA, F = 15.435, p < 0.0001). Specifically, the SUS scores for the sTag condition (μ = 83.5, σ = 8.05) were significantly higher than those for the SaIL (μ = 71.66, σ = 17.5) and screen reader (μ = 56.83, σ = 10.7) conditions.

Analysis of the individual responses to the SUS Likert items revealed that for the sTag condition, most participants gave positive feedback on all statements except the last one (i.e., ‘I needed to learn a lot of things before I could get going with this system’). This was especially the case for participants who preferred using the keyboard to interact with sTag (P5, P8, P11, P12, and P15) instead of a pointing device such as a mouse or touchpad. In the post-study interview, these participants stated that remembering extra keyboard shortcuts in addition to the standard screen-reader shortcuts requires quite a bit of learning effort in the beginning stages of using the system. Even the other participants gave mostly neutral responses to the last SUS statement, and they attributed their ratings to their unfamiliarity with pointing devices before the study.

NASA Task Load Index (NASA-TLX).

The NASA-TLX questionnaire [25] measures perceived task workload as a value between 0 and 100, with lower values indicating better results. Overall, we found a significant impact of the study conditions on the NASA-TLX scores (one-way ANOVA, F = 47.55, p < 0.0001). Specifically, the TLX scores for the sTag condition (μ = 41.46, σ = 8.24) were significantly lower than those for the SaIL (μ = 58.42, σ = 8.84) and screen-reader (μ = 72.22, σ = 7.95) conditions.

A deeper analysis of the TLX data revealed that among the six sub-scales (i.e., Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration) of the two-part TLX questionnaire (i.e., a load rating between 0–100 plus individual weighting via pairwise comparisons), ‘Effort’ and ‘Temporal Demand’ contributed most to the difference in scores. Specifically, the average user ratings for these sub-scales in the screen-reader condition were 1.88 and 2.57 times, respectively, those observed in the sTag condition, and the average ratings in the SaIL condition were 1.76 and 1.67 times those seen in the sTag condition. We noticed the same trend in the second part of the TLX questionnaire, where the ‘Effort’ and ‘Temporal Demand’ sub-scales were picked more often than the others in the screen-reader and SaIL conditions, whereas the selections were more uniform in the sTag condition.
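For reference, the weighted (two-part) TLX score combines the six 0–100 sub-scale ratings with weights obtained from the 15 pairwise comparisons: each sub-scale’s weight is the number of comparisons in which it was picked, and the overall workload is the weighted average. A minimal Python sketch with illustrative numbers, not study data:

```python
def tlx_workload(ratings, weights):
    """Weighted NASA-TLX score: ratings are 0-100 per sub-scale;
    weights are pairwise-comparison tallies that sum to 15."""
    assert len(ratings) == len(weights) == 6
    assert sum(weights) == 15
    return sum(r * w for r, w in zip(ratings, weights)) / 15.0

# Sub-scales: Mental, Physical, Temporal, Performance, Effort, Frustration
ratings = [60, 20, 70, 30, 80, 50]
weights = [3, 1, 4, 1, 5, 1]  # Effort and Temporal Demand picked most often
print(tlx_workload(ratings, weights))  # -> 64.0
```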

4.5.4. Qualitative Feedback and User Behavior.

All participants stated that they liked having multiple choices (i.e., keyboard, mouse, and touchpad) for interacting with sTag. However, in the study, most participants (13) stuck to a single input modality of their choice. Five participants (P5, P8, P11, P12, and P15) used only the keyboard to access the features of sTag. These participants expressed that they preferred the keyboard because they did not want to move their hands between multiple input devices while browsing the web. The remaining eight participants explained that they preferred using a pointing device to interact with sTag because they felt this arrangement was less confusing as well as fully resistant to unintentional mistakes such as shortcut mix-ups. Furthermore, these eight participants mentioned that they did not like pressing the ‘pass-through’ screen-reader shortcut every time before using a sTag shortcut. Only two participants (P2 and P4) used more than one input modality to interact with the sTag interface. These participants mentioned that they felt comfortable using different modalities for different sTag activities such as tagging, navigating the TOC, and setting menu options. Six participants (P1, P4, P5, P7, P11, and P15) desired a hierarchical arrangement of skip-links in the TOC rather than the current linear list. They claimed that a hierarchical TOC would help them assign priorities (i.e., levels) to the skip-links, thereby reducing the time and effort required to access the desired skip-links.

5. DISCUSSION

The user study not only revealed the overall benefits of sTag, but also illuminated its limitations; some of these are discussed next.

5.1. Limitations

An obvious limitation of sTag is the ‘cold start’ problem, where a user has to manually create a custom TOC from scratch for a newly visited website if no previously generated TOCs exist for similar kinds of websites. While this problem is unlikely to have much impact in the long run as sTag collects more and more data over time, the initial investment of time and effort may still be unappealing and burdensome to some screen-reader users. One way to address this issue is to further generalize the automatic TOC-generation algorithm to also consider previously generated TOCs for other types of webpages while bootstrapping the TOC for the current page, instead of just those for similar webpages. The idea underlying this generalization is that some types of segments (e.g., data records, filter groups, search forms, etc.) are common across multiple types of websites (e.g., shopping, travel, job search, etc.); therefore, sTag can match these segments at the very least, and subsequently auto-generate corresponding skip-links in the TOC for the current webpage. Another approach to address the cold-start issue is crowd-sharing the TOCs, which we discuss later in this section.
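The proposed generalization could be sketched roughly as follows; the segment-type labels, data model, and function names here are hypothetical illustrations for exposition, not sTag’s actual implementation:

```python
def bootstrap_toc(page_segments, stored_tocs):
    """Auto-generate skip-links for the current page by matching its
    detected segment types against skip-links saved in previously
    generated TOCs, regardless of the website type those TOCs came
    from. Hypothetical data model: each segment and each stored
    skip-link carries a semantic type label such as 'search-form'."""
    known_types = {}
    for toc in stored_tocs:
        for link in toc:
            # remember one label per segment type seen in any saved TOC
            known_types.setdefault(link["type"], link["label"])
    toc = []
    for seg in page_segments:
        if seg["type"] in known_types:
            toc.append({"type": seg["type"],
                        "label": known_types[seg["type"]],
                        "target": seg["node_id"]})
    return toc

# Illustrative input: a shopping page's segments, plus TOCs saved
# earlier on a travel site and a job-search site.
segments = [{"type": "search-form", "node_id": "n1"},
            {"type": "data-records", "node_id": "n7"},
            {"type": "footer", "node_id": "n9"}]
stored = [[{"type": "search-form", "label": "Search"},
           {"type": "filter-group", "label": "Filters"}],
          [{"type": "data-records", "label": "Results"}]]
print(bootstrap_toc(segments, stored))
```

Only the two segment types seen before (search form and data records) get skip-links; the unmatched footer is left for the user to tag manually.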

Another limitation is that when users rely on the keyboard to interact with sTag, they always need to press the ‘pass-through’ screen-reader shortcut (e.g., INSERT+F3 in JAWS) before using any of the sTag shortcuts specified in Table 1. As seen in the study, some of the participants found this to be cumbersome and easy to forget, thereby leading to unintentional input mistakes. For open-source screen readers such as NVDA, this problem can potentially be overcome by extending the screen reader’s source code to support sTag shortcuts. Another approach would be to inject new HTML content containing the sTag interface at the beginning of a webpage, so that users can rely on the familiar ‘go-to-beginning’ screen-reader shortcut (e.g., CTRL+HOME in JAWS) to instantly access the TOC. These are topics of future research.

5.2. Crowd-sharing the TOCs

Sharing TOCs between users can further reduce the manual effort of creating custom TOCs, as the shared TOCs can help auto-generate a few key desired skip-links. Sharing task knowledge has been explored before [11, 28]. For example, CoScripter [28], a seminal work in this regard, lets users create task-automation scripts and share them with others via a central repository. We therefore plan to extend sTag to enable anonymous and secure sharing of TOC data between users. Using shared data to bootstrap TOCs for a given user also involves several technical challenges, most notably finding and using TOCs from the small subset of other users whose browsing preferences and behavior closely resemble those of the given user. Addressing these challenges is left for future work.
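One plausible way to identify such similar users is to compare anonymized tagging profiles, e.g., via cosine similarity over per-segment-type tag counts. The sketch below is purely illustrative; the profile representation is our assumption, not part of sTag:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two sparse tag-count profiles, each
    mapping a segment type to how often the user tagged it."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar_users(target, others, k=2):
    """Return the ids of the k users whose profiles best match the target's."""
    ranked = sorted(others.items(),
                    key=lambda item: cosine_sim(target, item[1]),
                    reverse=True)
    return [uid for uid, _ in ranked[:k]]

# Hypothetical tagging profiles (segment type -> tag count).
me = {"search-form": 5, "data-records": 9, "filter-group": 3}
others = {"u1": {"search-form": 4, "data-records": 8},
          "u2": {"menu": 7, "footer": 2},
          "u3": {"data-records": 10, "filter-group": 4}}
print(most_similar_users(me, others, k=2))  # -> ['u1', 'u3']
```

TOCs shared by the top-ranked users would then be the preferred sources when bootstrapping a TOC for the given user.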

6. CONCLUSION

This paper presented sTag, a browser extension that enables blind screen-reader users to efficiently and conveniently browse the web by creating their own Table-Of-Contents (TOC) for any arbitrary webpage. These TOCs are instantly accessible from anywhere in the webpage via a variety of input methods. Furthermore, the created TOCs are reused by sTag to auto-generate TOCs for other similar webpages, thereby significantly reducing manual effort. A user study with 15 blind participants showcased the effectiveness of sTag in significantly improving web-browsing efficiency and usability, and also provided insights into how sTag can be improved.

CCS CONCEPTS.

• Human-centered computing → Accessibility technologies; Empirical studies in accessibility;

ACKNOWLEDGMENTS

This work was supported by NSF Award 1805076 and NIH grant R01EY030085. We thank Hae-Na Lee and the anonymous reviewers for the insightful feedback that helped improve the paper.


Contributor Information

Javedul Ferdous, Old Dominion University, Norfolk, Virginia.

Sami Uddin, Old Dominion University, Norfolk, Virginia.

Vikas Ashok, Old Dominion University, Norfolk, Virginia.

REFERENCES

  • [1]. Abdolrahmani Ali, Kuber Ravi, and Branham Stacy M. 2018. "Siri Talks at You": An Empirical Investigation of Voice-Activated Personal Assistant (VAPA) Usage by Individuals Who Are Blind. In Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility. 249–258.
  • [2]. NV Access. 2020. NV Access. https://www.nvaccess.org/.
  • [3]. Ahmed Tousif, Shaffer Patrick, Connelly Kay, Crandall David, and Kapadia Apu. 2016. Addressing physical safety, security, and privacy for people with visual impairments. In Twelfth Symposium on Usable Privacy and Security (SOUPS 2016). 341–354.
  • [4]. Alarte Julian, Insa David, and Silva Josep. 2017. Webpage menu detection based on DOM. In International Conference on Current Trends in Theory and Practice of Informatics. Springer, 411–422.
  • [5]. Álvarez Manuel, Pan Alberto, Raposo Juan, Bellas Fernando, and Cacheda Fidel. 2010. Finding and extracting data records from web pages. Journal of Signal Processing Systems 59, 1 (2010), 123–137.
  • [6]. Asakawa Chieko and Takagi Hironobu. 2000. Annotation-based transcoding for nonvisual web access. In Proceedings of the Fourth International ACM Conference on Assistive Technologies. 172–179.
  • [7]. Asakawa Chieko, Takagi Hironobu, and Fukuda Kentarou. 2019. Transcoding. In Web Accessibility. Springer, 569–602.
  • [8]. Ashok Vikas, Puzis Yury, Borodin Yevgen, and Ramakrishnan IV. 2017. Web Screen Reading Automation Assistance Using Semantic Abstraction. In Proceedings of the 22nd International Conference on Intelligent User Interfaces (IUI ’17). Association for Computing Machinery, New York, NY, USA, 407–418. 10.1145/3025171.3025229
  • [9]. Aydin Ali Selman, Feiz Shirin, Ashok Vikas, and Ramakrishnan IV. 2020. SaIL: Saliency-Driven Injection of ARIA Landmarks. In Proceedings of the 25th International Conference on Intelligent User Interfaces (IUI ’20). Association for Computing Machinery, New York, NY, USA, 111–115. 10.1145/3377325.3377540
  • [10]. Bechhofer Sean, Harper Simon, and Lunn Darren. 2006. SADIe: Semantic annotation for accessibility. In International Semantic Web Conference. Springer, 101–115.
  • [11]. Bigham Jeffrey P, Lau Tessa, and Nichols Jeffrey. 2009. Trailblazer: enabling blind users to blaze trails through the web. In Proceedings of the 13th International Conference on Intelligent User Interfaces. ACM. 10.1145/1502650.1502677
  • [12]. Billah Syed Masum, Ashok Vikas, Porter Donald E, and Ramakrishnan IV. 2017. Speed-Dial: A Surrogate Mouse for Non-Visual Web Browsing. In Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS ’17). Association for Computing Machinery, New York, NY, USA, 110–119. 10.1145/3132525.3132531
  • [13]. Bolin Michael, Webber Matthew, Rha Philip, Wilson Tom, and Miller Robert C. 2005. Automation and customization of rendered web pages. In Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology. 163–172.
  • [14]. Borodin Yevgen, Bigham Jeffrey P, Dausch Glenn, and Ramakrishnan IV. 2010. More than meets the eye: a survey of screen-reader browsing strategies. In Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A). 1–10.
  • [15]. Bradley James V. 1958. Complete counterbalancing of immediate sequential effects in a Latin square design. J. Amer. Statist. Assoc. 53, 282 (1958), 525–528.
  • [16]. Brooke John. 1996. SUS: A quick and dirty usability scale. Usability Evaluation in Industry 189 (1996), 194.
  • [17]. Brown Andy and Harper Simon. 2013. Dynamic Injection of WAI-ARIA into Web Content. In Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility (W4A ’13). Association for Computing Machinery, New York, NY, USA, Article 14, 4 pages. 10.1145/2461121.2461141
  • [18]. Cai Deng, Yu Shipeng, Wen Ji-Rong, and Ma Wei-Ying. 2004. Block-Based Web Search. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’04). Association for Computing Machinery, New York, NY, USA, 456–463. 10.1145/1008992.1009070
  • [19]. Cai Deng, Yu Shipeng, Wen Ji-Rong, and Ma Wei-Ying. 2004. VIPS: A vision-based page segmentation algorithm. Microsoft technical report.
  • [20]. Cai Zehuan, Liu Jin, Xu Lamei, Yin Chunyong, and Wang Jin. 2017. A Vision Recognition Based Method for Web Data Extraction. Advanced Science and Technology Letters 143 (2017), 193–198.
  • [21]. Duan Lei, Li Fan, Vadrevu Srinivas, Velipasaoglu Emre, Hajela Swapnil, and Chakrabarti Deepayan. 2014. Automatic classification of segmented portions of web pages. US Patent 8,849,725.
  • [22]. Fang Yixiang, Xie Xiaoqin, Zhang Xiaofeng, Cheng Reynold, and Zhang Zhiqiang. 2018. STEM: a suffix tree-based method for web data records extraction. Knowledge and Information Systems 55, 2 (2018), 305–331.
  • [23]. Feng Junlan and Hollister Barbara B. 2014. System and method of identifying web page semantic structures. US Patent 8,825,628.
  • [24]. Gadde Prathik and Bolchini Davide. 2014. From screen reading to aural glancing: towards instant access to key page sections. In Proceedings of the 16th International ACM SIGACCESS Conference on Computers & Accessibility. 67–74.
  • [25]. Hart Sandra G and Staveland Lowell E. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Vol. 52. Elsevier, 139–183.
  • [26]. Apple Inc. 2020. Vision Accessibility - Mac - Apple. https://www.apple.com/accessibility/mac/vision/.
  • [27]. Lazar Jonathan, Allen Aaron, Kleinman Jason, and Malarkey Chris. 2007. What Frustrates Screen Reader Users on the Web: A Study of 100 Blind Users. International Journal of Human-Computer Interaction 22, 3 (2007), 247–269. 10.1080/10447310709336964
  • [28]. Leshed Gilly, Haber Eben M, Matthews Tara, and Lau Tessa. 2008. CoScripter: automating & sharing how-to knowledge in the enterprise. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1719–1728.
  • [29]. Li Ian, Nichols Jeffrey, Lau Tessa, Drews Clemens, and Cypher Allen. 2010. Here’s what I did: sharing and reusing web activity with ActionShot. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 723–732.
  • [30]. Liu Wei, Meng Xiaofeng, and Meng Weiyi. 2009. ViDE: A vision-based approach for deep web data extraction. IEEE Transactions on Knowledge and Data Engineering 22, 3 (2009), 447–460.
  • [31]. Melnyk Valentyn, Ashok Vikas, Puzis Yury, Soviak Andrii, Borodin Yevgen, and Ramakrishnan IV. 2014. Widget classification with applications to web accessibility. In International Conference on Web Engineering. Springer, 341–358.
  • [32]. Mikolov Tomas, Sutskever Ilya, Chen Kai, Corrado Greg S, and Dean Jeff. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
  • [33]. Montoto Paula, Pan Alberto, Raposo Juan, Bellas Fernando, and Lopez Javier. 2009. Automating navigation sequences in AJAX websites. In International Conference on Web Engineering. Springer, 166–180.
  • [34]. Pradhan Alisha, Mehta Kanika, and Findlater Leah. 2018. "Accessibility Came by Accident": Use of Voice-Controlled Intelligent Personal Assistants by People with Disabilities. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1–13.
  • [35]. Prasad Jyotika and Paepcke Andreas. 2008. CoreEx: content extraction from online news articles. In Proceedings of the 17th ACM Conference on Information and Knowledge Management. 1391–1392.
  • [36]. Puzis Yury, Borodin Yevgen, Puzis Rami, and Ramakrishnan IV. 2013. Predictive web automation assistant for people with vision impairments. In Proceedings of the 22nd International Conference on World Wide Web. 1031–1040.
  • [37]. Rajan Suju, Gaffney Scott J, and Punera Kunal. 2017. Annotating HTML segments with functional labels. US Patent 9,594,730.
  • [38]. Freedom Scientific. 2020. JAWS® - Freedom Scientific. http://www.freedomscientific.com/products/software/jaws/.
  • [39]. Stangl Abigale J, Kothari Esha, Jain Suyog D, Yeh Tom, Grauman Kristen, and Gurari Danna. 2018. BrowseWithMe: An online clothes shopping assistant for people with visual impairments. In Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility. 107–118.
  • [40]. van der Meer Jeroen, Boon Ferry, Hogenboom Frederik, Frasincar Flavius, and Kaymak Uzay. 2011. A framework for automatic annotation of web pages using the Google rich snippets vocabulary. In Proceedings of the 2011 ACM Symposium on Applied Computing. 765–772.
  • [41]. WebAIM. 2020. Screen Reader User Survey #8 Results - WebAIM. https://webaim.org/projects/screenreadersurvey8/.
  • [42]. Wu Shaomei, Wieland Jeffrey, Farivar Omid, and Schiller Julie. 2017. Automatic alt-text: Computer-generated image descriptions for blind users on a social network service. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 1180–1192.
  • [43]. Zhai Yanhong and Liu Bing. 2005. Web Data Extraction Based on Partial Tree Alignment. In Proceedings of the 14th International Conference on World Wide Web (WWW ’05). Association for Computing Machinery, New York, NY, USA, 76–85. 10.1145/1060745.1060761
  • [44]. Zhu Jun, Nie Zaiqing, Wen Ji-Rong, Zhang Bo, and Ma Wei-Ying. 2006. Simultaneous Record Detection and Attribute Labeling in Web Data Extraction. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06). Association for Computing Machinery, New York, NY, USA, 494–503. 10.1145/1150402.1150457