Abstract
Interacting with long web documents such as wiktionaries, manuals, tutorials, blogs, and novels is easy for sighted users, as they can use pointing devices such as a mouse or touchpad to quickly reach the desired content, either by scrolling and visually scanning the page or by clicking hyperlinks in the Table of Contents (TOC), if one is available. Blind users, on the other hand, are unable to use these pointing devices and must rely on keyboard-based screen readers, an assistive technology that lets them serially navigate and listen to the page content using keyboard shortcuts. As a consequence, interacting with long web documents using only a screen reader is often an arduous and tedious experience for blind users.
To bridge the usability divide between how sighted and blind users interact with web documents, we present iTOC, a browser extension that automatically identifies and extracts TOC hyperlinks from web documents and then provides on-demand, instant screen-reader access to the TOC from anywhere on the website. This way, blind users need not manually search for the desired content by moving the screen-reader focus sequentially over the webpage; instead, they can access the TOC from anywhere using iTOC and select the desired hyperlink, which automatically moves the focus to the corresponding content in the document. A user study with 15 blind participants showed that, with iTOC, both the access time and the user effort (number of user input actions) were significantly lower, by as much as 42.73% and 57.9%, respectively, compared with another state-of-the-art solution for improving web usability.
Keywords: Accessibility, assistive technology, screen reading, visual impairment, web browsing
I. Introduction
The Web has permeated almost every aspect of modern society, and it has become the primary source for acquiring knowledge and seeking information. Web documents such as wiktionaries, manuals, tutorials, blogs, and scientific literature are good sources of information and learning, and they play an important role in assisting users in carrying out their day-to-day activities [1], [2]. To interact with these documents, people who are blind use a screen reader (e.g., JAWS [3], VoiceOver [4], NVDA [5]), an assistive technology that narrates screen content and lets users sequentially navigate the page content using special keyboard shortcuts. However, given the large size of these documents, coupled with a content layout that favors visual consumption via pointing devices such as a mouse or touchpad, blind users find it tedious and cumbersome to reach the desired content using only linear screen-reader navigation, which involves a multitude of shortcut presses.
While document authors do provide a Table of Contents (TOC) with hyperlinks to improve navigation in long documents, it is designed primarily for visual access, and therefore blind screen-reader users are unable to exploit it to the same extent as their sighted peers. For instance, in the web document¹ shown in Figure 1a, screen-reader users would need to press a multitude of shortcuts to reach the TOC, both from the beginning of the page and from an arbitrary section within it, depending on their scanning navigation models [6]. In contrast, sighted users can locate the TOC instantly when the page loads, or simply scroll to the top of the page from an arbitrary section and find the TOC almost instantaneously with a visual scan; they can then select the desired section (i.e., hyperlink) with a pointing device such as a mouse or touchpad. Consequently, it takes blind users significantly more time and effort to navigate to the desired sections of the document, whereas sighted users can accomplish the same in a matter of a few seconds.
Fig. 1.

Use case scenarios illustrating how a screen-reader user navigates a long web document with a screen reader and with iTOC. (a) With the screen reader, a blind user first navigates to the Table of Contents (TOC) on the page using shortcuts. The user then selects the fifth section link in the TOC to shift focus to that section (step ①). After reading the content in the fifth section (step ②), to access another desired section (say, the tenth), the user has to either navigate back to the TOC or navigate through the web document to the tenth section using shortcuts (step ③). (b) With iTOC, the user instantly accesses the TOC using the custom-defined ‘C’ shortcut after the page is loaded. The user then navigates the TOC using the arrow keys and selects the link to the fifth section by pressing the Enter key, shifting focus to the fifth section (step ①). After reading the content in the fifth section (step ②), the user goes to the tenth section by once again instantly accessing the TOC with the ‘C’ shortcut and selecting the corresponding link (step ③).
Even existing accessibility solutions [7]–[9] for improving navigational efficiency for blind users offer limited help with long web documents, as they were mostly designed for generic web applications (e.g., e-commerce, news, and social media) and therefore do not address the fine-grained requirements specific to web documents such as wiktionaries, manuals, tutorials, and blogs. For example, instead of relying on the TOC that the document author explicitly provides in the page, existing approaches rely on generic hand-coded ontologies or machine learning models to identify the important page sections, which are prone to inaccuracies.
To fill the usability gap between how sighted and blind users interact with web documents, we present iTOC (short for ‘instant TOC’, see Figure 1b), a browser extension that automatically detects and extracts the TOC from the Document Object Model (DOM) of the webpage and then makes it available on demand to screen-reader users via a custom interface that can be accessed anytime with a special shortcut. In the iTOC interface, a blind user can simply use the arrow keys to navigate over the TOC and press Enter to select the desired hyperlink, which automatically moves the screen-reader focus to the beginning of the corresponding section. By ‘pushing’ the TOC content to users on demand, iTOC enables efficient and easy interaction with long web documents for blind screen-reader users.
We evaluated iTOC in a user study with 15 completely blind participants. The study revealed that, on average, iTOC significantly reduced access time and effort (i.e., number of shortcut presses) by as much as 59.03% and 67.10%, respectively, compared with using just a screen reader, and by as much as 42.73% and 57.9%, respectively, compared with a state-of-the-art accessibility solution [8].
II. Related Work
The work proposed in this paper is closely related to the following lines of existing research: (a) identifying semantic structures in webpages, (b) annotating web documents to improve non-visual usability, and (c) web automation and assistants for blind users.
A. Identifying Semantic Structures in Webpages
Identifying important sections in a page and extracting their data has been explored by prior works [10]–[12]. Each of these approaches focuses on extracting one particular type of section (e.g., data records, menus, dynamic widgets, main content, or news articles) from the pages. To the best of our knowledge, there are no existing techniques for identifying TOCs in web pages. However, prior works identify a TOC in PDF or image documents, either by exploiting metadata in the PDF [13], [14] or by applying computer vision techniques that treat the PDF as an image [15]. The scope of these techniques is limited to specific types of documents, such as books or published literary articles, which differ vastly in composition and structure from web documents such as wikis. For instance, unlike PDF books and scanned images, web documents contain plenty of additional content (menus, ads, hyperlinks, sidebars, widgets, etc.) that acts as ‘noise’ surrounding the TOC. Also, web documents have multiple other lists of hyperlinks that visually share many similarities with the TOC, and therefore computer vision-based techniques designed for scanned documents are less likely to be effective for web documents.
B. Improving Non-Visual Usability with Annotations
A plethora of research works [7], [8], [16], [17] annotate webpages with explicit visual and structural semantic information to improve their usability. For example, the annotation technique in [16] injects dummy HTML elements containing visual layout information into page DOMs so that blind users can access this information with screen readers. However, due to the overhead of human training, this approach was found to be less practical [17]. To tackle this issue, Bechhofer et al. [17] propose semantically annotating the CSS style sheets instead of the main source file, so that the annotated style sheets can be reused and customized by multiple other annotation techniques. Alternatively, researchers have proposed injecting ARIA markup and JavaScript instead of direct annotations to improve usability [7], [8]. For example, the approach in [7] injects JavaScript to monitor dynamic DOM changes and report them via a custom ARIA live region. The approach in [8], on the other hand, uses a visual saliency network to identify the important sections of a webpage and automatically injects ARIA landmarks into the corresponding DOM subtrees, making these sections navigable with ARIA-related screen-reader shortcuts.
In sum, most annotation techniques are primarily geared towards exposing semantics to screen-reader users, with little focus on improving the efficiency of screen-reader navigation in webpages. Also, they rely on pre-specified ontologies and pre-trained heuristic models that are generic in nature, and therefore cannot handle the fine-grained specific interaction needs associated with web documents such as wiktionaries, manuals, blogs, etc. Specifically, unlike iTOC, none of these approaches exploit the information regarding the content organization of the page as well as the content structure and semantics that are explicitly provided by the webpage authors; instead, they attempt to figure out this information on their own using generic heuristic methods that are prone to inaccuracies.
C. Web Automation and Assistants for Blind Users
To ease the navigational burden for blind users, web automation techniques [18]–[20] and accessibility assistants [9], [21], [22] have also been proposed. Automation techniques, as the name suggests, automate certain tasks that users perform frequently. To do so, these techniques require creating and maintaining task scripts or macros, either by handcrafting them [18], [19] or through user demonstration [20], [23], [24]. While automation does reduce the interaction burden for screen-reader users, it is limited to a few repetitive tasks on some frequently used websites; creating scripts for every task on every website is impractical. Furthermore, users have to invest considerable time creating and managing these automation scripts.
Accessibility assistants [9], [21], [22], on the other hand, support ad hoc navigational assistance by enabling screen-reader users to leverage other input modalities such as speech and haptics. For example, [9] proposes a semantics-driven content-navigation method that enables blind users to hierarchically navigate different page sections and their constituents with a Dial input device, using simple rotate and press gestures. Gadde et al. [21], on the other hand, let screen-reader users rely on simple spoken utterances to get a quick overview of the current webpage and then navigate to the section of interest. Although these approaches are highly usable from an interface perspective, they face the same issue as the aforementioned annotation approaches: they rely on generic heuristic methods, rather than the explicitly provided TOC, to identify key sections, and are therefore prone to errors.
III. iTOC Design
Figure 2 presents an architectural schematic illustrating the workflow of iTOC. Whenever a new webpage is loaded in the browser, iTOC automatically identifies and extracts the TOC (if present) from the Document Object Model (DOM) of the current page using the TOC extraction algorithm. Specifically, the algorithm extracts all the lists in the webpage and then classifies each of them as either TOC or non-TOC. The algorithm also performs disambiguation when more than one list is classified as TOC. The extracted TOC hyperlinks are then presented in the iTOC interface in the same order as they appear on the page (see Figure 1b).
Fig. 2.

An architectural schematic of iTOC.
The user can access the iTOC interface anytime and from anywhere on the website using the custom designated shortcut ‘C’, which shifts the screen-reader focus from the webpage content to the iTOC extension GUI containing the TOC². The user can then use the UP and DOWN arrow keys to navigate the hyperlinks in the TOC and press the Enter key to select the desired link; the extension then loads the corresponding same-page URL in the browser, so that the screen-reader focus automatically moves to the corresponding section on the webpage. Once the navigational action is completed, the iTOC interface closes automatically. Note that, depending on the screen reader, the user may first need to press a “pass-through” hotkey (e.g., Insert+3 in JAWS) before pressing ‘C’, to instruct the screen reader to let the browser interpret the next key press³.
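The link-selection step can be illustrated with a small sketch. Assuming a TOC link's ‘href’ is either a fragment such as ‘#Gameplay’ or a relative or absolute URL, the hypothetical helper `resolve_toc_target` below resolves it against the current page URL with Python's standard `urllib.parse` module and accepts it only if it points back into the same document. This is an illustration of the idea under those assumptions, not iTOC's actual implementation, which the paper does not show.

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin


def resolve_toc_target(page_url: str, href: str) -> Optional[str]:
    """Hypothetical helper: return the absolute URL a TOC link should load,
    or None if the link does not point to a section of the current page."""
    target = urljoin(page_url, href)          # resolve relative hrefs
    target_base, fragment = urldefrag(target)
    page_base, _ = urldefrag(page_url)        # ignore the page's current fragment
    if target_base != page_base or not fragment:
        return None                           # leaves the document, or has no anchor
    return target


page = "https://example.org/wiki/Game#top"
print(resolve_toc_target(page, "#Gameplay"))         # https://example.org/wiki/Game#Gameplay
print(resolve_toc_target(page, "/wiki/Other#Plot"))  # None
```

Because the accepted URL differs from the current one only in its fragment, loading it is a same-document navigation: the browser scrolls to the element whose id matches the fragment instead of reloading the page.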
A. TOC Extraction Algorithm
1). Overview:
Algorithm 1 presents iTOC’s TOC extraction technique. iTOC models the task of identifying the TOC in a webpage as a binary classification problem, in which every HTML list in the webpage (i.e., every unordered or ordered list element, with the <ul> or <ol> tag, respectively) is classified as either ‘TOC’ or ‘non-TOC’. Specifically, for each list on the page, the algorithm extracts a set of custom-defined features (see Table I) from the page DOM and feeds this feature vector to a machine learning model for classification. If more than one list is classified as ‘TOC’, the algorithm selects the one with the highest confidence score (as given by the classifier) as the TOC for the page.
Algorithm 1:
TOC extraction algorithm
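Since the Algorithm 1 listing itself is not reproduced here, the following is a minimal sketch of the classify-then-arbitrate logic described in the overview, not the authors' code. It assumes that each list has already been reduced to a feature vector and that the classifier is exposed as a hypothetical function `predict_toc_proba` returning the probability of the ‘TOC’ class; the 0.5 decision threshold and the `CandidateList` type are also assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class CandidateList:
    """Hypothetical stand-in for one <ul>/<ol> element on the page."""
    list_id: str
    features: Sequence[float]  # feature vector in Table I order


def extract_toc(lists: List[CandidateList],
                predict_toc_proba: Callable[[Sequence[float]], float],
                threshold: float = 0.5) -> Optional[CandidateList]:
    """Classify every list, then keep the most confident 'TOC' list."""
    candidates = []
    for lst in lists:
        score = predict_toc_proba(lst.features)
        if score >= threshold:          # classified as 'TOC'
            candidates.append((score, lst))
    if not candidates:
        return None                     # no TOC detected on this page
    # Arbitration step: several lists may be classified as 'TOC';
    # keep the one with the highest confidence score.
    return max(candidates, key=lambda pair: pair[0])[1]


# Toy scorer for illustration only: treats the first feature as the probability.
lists = [CandidateList("nav-menu", [0.62]),
         CandidateList("toc", [0.91]),
         CandidateList("footer", [0.10])]
best = extract_toc(lists, lambda f: f[0])
print(best.list_id)  # toc
```

With a fitted scikit-learn classifier `model`, one plausible scorer is `lambda f: model.predict_proba([f])[0][toc_col]`, where `toc_col` is the index of the ‘TOC’ label in `model.classes_`; the paper does not state exactly how its confidence score is obtained.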
TABLE I.
List of features for the TOC classifier.
| Feature | Description | Rationale |
|---|---|---|
| f_HYP | Ratio of hyperlink text to total text in the list | TOCs mostly contain hyperlinks |
| f_NUM | Fraction of links whose text begins with a number | TOC items are usually numbered |
| f_KEY | Number of keywords (e.g., “contents”, “on this page”, “topics”) present in the heading immediately preceding the <ul> or <ol> | Some keywords frequently appear as the TOC title (we built a dictionary of such keywords) |
| f_PRT | Fraction of links whose text also appears elsewhere in the page | TOC link texts usually also appear in other parts of the page |
| f_PRTH | Fraction of links whose text also appears elsewhere in the page as a heading | TOC link texts also appear as section headings |
| f_NHYP | Fraction of hyperlinks that point to a location on the same page | Most TOC links are same-page links, as indicated by their ‘href’ attribute values |
2). Classifier:
To choose a classifier, we first evaluated three well-known models (SVM, decision tree, and random forest) on a custom dataset comprising 200 positive examples (‘TOC’ lists) and 200 negative examples (‘non-TOC’ lists). These lists were collected from a wide range of wiktionaries, manuals, blogs, tutorials, government websites, etc. The features in Table I were extracted from each list to create a training example (x, y), where x is the feature vector and y is the manually annotated class (‘TOC’ or ‘non-TOC’). Table II presents the results of nested 5×4 cross-validation (5 outer folds, 4 inner folds) on the dataset for the different classifiers. We ensured that the numbers of positive and negative examples were equal (i.e., 40 each) in every outer and inner fold of the nested cross-validation. Parameter tuning was done in the inner loop, and the corresponding optimal parameters were used to train and test in the outer loop. We used the scikit-learn library [25] to train the classifiers. As seen in Table II, the random forest classifier produced the best average F1 score, and we therefore chose this model for the TOC classification task in Algorithm 1.
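The fold arithmetic behind the ‘40 each’ statement can be checked with a short sketch: with 200 examples per class, 5 outer folds leave 40 per class in each outer test fold, and splitting the remaining 160 per class into 4 inner folds again gives 40 per class. The code below builds such class-balanced folds over example ids in plain Python; the strided split, the fixed random seed, and the function names are assumptions, and model fitting and parameter tuning are only marked by comments.

```python
import random
from typing import List, Tuple

Fold = Tuple[List[int], List[int]]  # (positive ids, negative ids)


def balanced_folds(pos: List[int], neg: List[int], k: int, seed: int = 0) -> List[Fold]:
    """Shuffle each class separately and deal it into k folds (stratified split)."""
    rng = random.Random(seed)
    pos, neg = pos[:], neg[:]
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]


def nested_cv_fold_sizes(n_pos: int = 200, n_neg: int = 200,
                         outer_k: int = 5, inner_k: int = 4):
    positives = list(range(n_pos))                  # ids of 'TOC' lists
    negatives = list(range(n_pos, n_pos + n_neg))   # ids of 'non-TOC' lists
    sizes = []
    for test_pos, test_neg in balanced_folds(positives, negatives, outer_k):
        held_out = set(test_pos) | set(test_neg)
        train_pos = [x for x in positives if x not in held_out]
        train_neg = [x for x in negatives if x not in held_out]
        inner = balanced_folds(train_pos, train_neg, inner_k)
        # Inner loop: tune hyperparameters on `inner`. Outer loop: retrain with the
        # best parameters on train_pos + train_neg, then score on the held-out fold.
        sizes.append(((len(test_pos), len(test_neg)),
                      [(len(p), len(n)) for p, n in inner]))
    return sizes


for outer, inner in nested_cv_fold_sizes():
    print(outer, inner)  # (40, 40) [(40, 40), (40, 40), (40, 40), (40, 40)]
```

In scikit-learn, the same structure is commonly expressed by wrapping a `GridSearchCV(..., cv=StratifiedKFold(n_splits=4))` estimator inside `cross_val_score(..., cv=StratifiedKFold(n_splits=5))`; the paper does not say whether these exact utilities were used.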
TABLE II.
Performance of TOC classifier.
| Classifier | Precision | Recall | F1 Score |
|---|---|---|---|
| SVM | 0.819 | 0.750 | 0.779 |
| Decision tree | 0.859 | 0.808 | 0.832 |
| Random forest | 0.863 | 0.812 | 0.836 |
3). Algorithm Performance:
We evaluated the algorithm on a separate dataset of 100 webpages containing TOCs, with no overlap with the classifier dataset. As with the classifier dataset, these webpages were drawn from a variety of websites, including wiktionaries, tutorials, blogs, and articles. The average number of lists (including TOCs) per webpage was 37.46. The average F1 score of the algorithm using only the classifier (i.e., without the final arbitration step) over these 100 webpages was 0.78. With the arbitration step included, the F1 score increased to 0.89, indicating that the classifier correctly assigned higher confidence scores to the ‘TOC’ lists than to the ‘non-TOC’ lists.
IV. Evaluation
A. Participants
We recruited 15 completely blind participants (8 female, 7 male) via local mailing lists and word of mouth (see Table III). None of the participants had motor impairments that affected their ability to press keyboard shortcuts. Our inclusion criteria were proficiency with web browsing and with the JAWS screen reader. All participants also stated that they frequently access web documents such as Wikipedia, FAQs, and novels for their everyday activities.
TABLE III.
Participant demographics for the user study.
| ID | Age/Gender | Age of Vision Loss | Screen Reader | Freq. Web Documents |
|---|---|---|---|---|
| P1 | 31/F | Since birth | JAWS, VoiceOver | Daily |
| P2 | 36/F | Since birth | JAWS, NVDA | 3 days/week |
| P3 | 26/M | Age 6 | JAWS, NVDA | 2 days/week |
| P4 | 49/M | Since birth | JAWS, VoiceOver | Daily |
| P5 | 52/M | Age 2 | JAWS, NVDA | 2 days/week |
| P6 | 44/M | Since birth | JAWS, NVDA | 1 day/week |
| P7 | 56/F | Cannot remember | JAWS, NVDA, System Access | 1 day/week |
| P8 | 29/F | Age 5 | JAWS, NVDA | 3 days/week |
| P9 | 55/M | Since birth | JAWS, System Access | 2 days/week |
| P10 | 42/F | Age 8 | JAWS, Narrator | 5 days/week |
| P11 | 58/M | Age 2 | JAWS, NVDA | Daily |
| P12 | 39/M | Since birth | JAWS | 1 day/week |
| P13 | 51/F | Cannot remember | JAWS, VoiceOver, Narrator | 5 days/week |
| P14 | 33/F | Since birth | JAWS, System Access | 4 days/week |
| P15 | 38/F | Since birth | JAWS, VoiceOver | 5 days/week |
B. Apparatus
The study was conducted remotely, with participants using their own computers. Every participant had the JAWS screen reader and the Google Chrome browser installed. The iTOC extension was shared with participants by email as a Google Drive link, and the experimenter assisted them in downloading and installing it by providing instructions via conferencing software. Two participants (P6, P9) needed help from family members to install the extension. Zoom or Skype was used for communication and screen sharing, and each session was recorded with the participants’ permission.
C. Design
In a within-subject experimental setup, the participants performed representative ‘document navigation’ tasks under the following study conditions:
Screen Reader: the participants could only use the JAWS screen reader to complete the assigned tasks. This condition represents the status quo.
SaIL: the participants could use a state-of-the-art annotation model [8] along with the JAWS screen reader to complete the tasks. This model uses a visual-saliency neural network to identify important sections of the page, and automatically injects WAI-ARIA landmarks into the HTML source of the page, so that users can use the landmark shortcut ‘R’ to navigate to these sections.
iTOC: the participants could use the proposed iTOC along with the JAWS screen reader to complete the tasks.
For convenience, the SaIL model was integrated into the iTOC extension and could be turned ON or OFF with a keyboard shortcut. When SaIL was ON, the iTOC feature was disabled, and vice versa. Therefore, both the SaIL and iTOC study conditions could be simulated with a single browser extension.
In each of the above conditions, the participants were asked to perform the following two tasks, thereby totaling 6 tasks overall per participant:
T1: Find the information about two pre-specified topics in a wiktionary.
T2: Find the answers to two pre-specified questions in a blog webpage.
To avoid learning effects, different websites were used in each condition for both T1 and T2. For T1, we selected the Fandom⁴, PCGamingWiki⁵, and Gamepedia⁶ websites; for T2, we chose the WordPress⁷, Craft of Blogging⁸, and Fat Stacks Blog⁹ websites. These choices were made to ensure the comparability of all tasks within T1 and within T2. The assignment of websites to conditions, as well as the ordering of tasks and conditions for each participant, was randomized (counterbalanced) using the Latin square method [26] to further minimize learning effects. Also, to avoid unforeseen issues, we used cached versions of the webpages for the study.
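A minimal sketch of Latin-square counterbalancing for the three conditions is shown below. It uses a cyclic 3×3 square and assigns participants to its rows in turn, which is one common construction; the paper does not give the exact square or assignment it used, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

CONDITIONS = ["Screen Reader", "SaIL", "iTOC"]


def cyclic_latin_square(items: Sequence[str]) -> List[List[str]]:
    """Row i is `items` rotated left by i, so every item appears exactly once
    in each row (one participant's order) and each column (session position)."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]


def condition_order(participant: int, items: Sequence[str] = CONDITIONS) -> List[str]:
    """Order for participant P1, P2, ... (1-based), cycling through the rows."""
    square = cyclic_latin_square(items)
    return square[(participant - 1) % len(square)]


orders = [tuple(condition_order(p)) for p in range(1, 16)]
print(Counter(orders))  # each of the 3 orders is used by 5 of the 15 participants
```

With three conditions, a single cyclic square balances each condition's position but not which condition immediately precedes which; a Williams design, which uses two squares when the number of conditions is odd, addresses that.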
D. Procedure
Before starting the tasks, the experimenter assisted the participant in downloading and installing the extension and in switching between the study conditions. Participants were given enough practice time (~10 minutes) to familiarize themselves with both the SaIL and iTOC conditions, and they had 15 minutes to complete each task. The questionnaires were administered after participants completed all study tasks. The screen-sharing and recording features of the conferencing software were turned on to ensure that all user-interaction activities were captured for later analysis. Each session lasted 2.5 hours, and all conversations during the study were in English.
Measurements.
During the study, apart from screen recording, we also remotely logged all screen-reader keystrokes. From this collected data, we measured task completion times and the number of shortcut presses for each study condition. At the end of the study, we administered the System Usability Scale (SUS) [27], NASA Task Load Index (NASA-TLX) [28], and an exit interview to collect subjective feedback.
E. Results
1). Task Completion Times and Number of Shortcuts:
Figure 3 compares the statistics for both the task completion times and the number of shortcut presses for all three study conditions. As seen in Figure 3a, for task T1, participants on average performed much faster with iTOC (μ = 156.46, med = 143, min = 92, max = 286) than with just the screen reader (μ = 381.93, med = 348, min = 225, max = 622). For T1, iTOC also yielded better task completion times than the state-of-the-art SaIL annotation technique (μ = 270.8, med = 238, min = 154, max = 523). From Figure 3b, similar observations can be made regarding the number of shortcut presses for T1 in each condition: μ = 270.53, med = 240, min = 99, max = 495 for the screen reader; μ = 201.86, med = 167, min = 77, max = 462 for SaIL; and μ = 112.33, med = 98, min = 44, max = 241 for iTOC. All of these differences between conditions were found to be statistically significant (Table IV).
Fig. 3.

Evaluation results from the user study for two tasks (T1 and T2).
TABLE IV.
Kruskal-Wallis test for statistical significance between study conditions.
| Task | Completion Time | Number of Shortcuts |
|---|---|---|
| T1 | H = 25.35, df = 2, p < 0.0001 | H = 16.25, df = 2, p = 0.0003 |
| T2 | H = 20.51, df = 2, p < 0.0001 | H = 17.12, df = 2, p = 0.0002 |
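For reference, the Kruskal-Wallis H statistic in Table IV ranks all observations from the k conditions together and compares the per-condition rank sums R_i, H = 12/(N(N+1)) · Σ R_i²/n_i - 3(N+1), with a correction for tied values; with k = 3 conditions it has df = k - 1 = 2. The plain-Python sketch below implements this textbook formula and is not the authors' analysis script. For df = 2, the chi-square survival function reduces exactly to exp(-H/2), so, for example, H = 16.25 gives p of about 0.0003, in line with Table IV.

```python
import math
from collections import Counter
from itertools import chain
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis(*groups: Sequence[float]) -> Tuple[float, int]:
    """Return (H, df) with the standard tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1


def p_value_df2(h: float) -> float:
    """Chi-square survival function for df = 2: P(X > h) = exp(-h / 2)."""
    return math.exp(-h / 2)


h, df = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 2), df)               # 7.2 2
print(round(p_value_df2(16.25), 4))  # 0.0003
```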
The differences in the performance values between conditions for T1 can be attributed to several factors. In the screen reader condition, nine participants used the on-page TOC only once, to navigate to the section containing information about the first topic of the task. To find information about the second topic, these participants manually searched for the relevant section using screen-reader shortcuts. The remaining six participants pressed CTRL+HOME to shift focus back to the top of the page after finding the information about the first topic, and then manually navigated to the TOC again from the top to access the link for the section corresponding to the second topic. In either case, they spent considerable time and shortcut presses on manual screen-reader navigation. Only two participants (P6, P12) tried the CTRL+F shortcut, but they were unsuccessful in finding the target information. This observation agrees with the findings of a recent study [29], which showed that screen-reader users rarely use the CTRL+F keyboard shortcut.
With SaIL, all participants initially used the ‘R’ shortcut to reach the TOC faster from the top of the page. However, most of them (10) did not subsequently use this shortcut to navigate back to the TOC after finding the information for the first topic of the task. Instead, they started searching for the section relevant to the second topic on their own using other familiar shortcuts. These participants eventually recalled the availability of the ‘R’ shortcut and switched to it, either to navigate back to the TOC or to find the section for the second topic. The remaining five participants relied on the ‘R’ shortcut throughout the task. In either case, SaIL labeled many sections in addition to the TOC (e.g., the top menu, right-panel summary, and left-panel menus) as important, and therefore participants had to navigate over these additional irrelevant sections while looking for the desired information.
With iTOC, only one-third of participants (5) initially forgot to use the ‘C’ shortcut to instantly access the TOC. However, they eventually remembered to use the shortcut and access the TOC to navigate to the section about the second topic.
The performance of iTOC was better than that of the other two conditions for task T2 as well. Specifically, the average task completion time for iTOC was 133.93 (med = 124, min = 92, max = 211), whereas the average completion times for the screen reader and SaIL were 308.6 (med = 274, min = 153, max = 571) and 233.86 (med = 202, min = 119, max = 490), respectively (Figure 3c). The average numbers of shortcut presses for the screen reader, SaIL, and iTOC conditions were 222.13 (med = 190, min = 63, max = 478), 173.73 (med = 149, min = 60, max = 374), and 73.06 (med = 63, min = 24, max = 131), respectively (Figure 3d). As shown in Table IV, these differences between conditions were statistically significant. The observations for T2 were similar to those for T1: most participants manually searched for the section containing the answer to the second question in the screen reader condition, and to some extent in the SaIL condition, whereas only a few participants (3) did not immediately access iTOC’s TOC while searching for the answer to the second question.
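The headline percentages in the Abstract and Introduction can be traced back to the means reported above. Assuming each percentage is the relative reduction of the iTOC mean with respect to the baseline mean, (baseline - iTOC) / baseline × 100, and that ‘as much as’ refers to the larger of the T1 and T2 values, the short calculation below recovers them: about 59.03% (time, T1) and 67.1% (shortcuts, T2) versus the screen reader, and about 42.73% (time, T2) and 57.9% (shortcuts, T2) versus SaIL.

```python
# Mean task completion times and shortcut counts reported in Section IV-E.
MEANS = {
    ("T1", "time"):      {"Screen Reader": 381.93, "SaIL": 270.8,  "iTOC": 156.46},
    ("T1", "shortcuts"): {"Screen Reader": 270.53, "SaIL": 201.86, "iTOC": 112.33},
    ("T2", "time"):      {"Screen Reader": 308.6,  "SaIL": 233.86, "iTOC": 133.93},
    ("T2", "shortcuts"): {"Screen Reader": 222.13, "SaIL": 173.73, "iTOC": 73.06},
}


def reduction_pct(baseline: float, itoc: float) -> float:
    """Relative reduction (in %) of the iTOC mean with respect to a baseline mean."""
    return 100.0 * (baseline - itoc) / baseline


def max_reduction(baseline: str, metric: str) -> float:
    """Largest reduction across the two tasks (the 'as much as' reading)."""
    return max(reduction_pct(MEANS[(task, metric)][baseline], MEANS[(task, metric)]["iTOC"])
               for task in ("T1", "T2"))


for baseline in ("Screen Reader", "SaIL"):
    for metric in ("time", "shortcuts"):
        print(baseline, metric, f"{max_reduction(baseline, metric):.2f}%")
```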
2). Subjective Evaluation:
System Usability Scale (SUS).
We administered the standard SUS questionnaire [27] at the end of the study, where the participants rated positive and negative statements about each study condition on a Likert scale from 1 for strongly disagree to 5 for strongly agree, with 3 being neutral. Overall, we found a significant difference in the SUS scores between the three study conditions: screen reader (μ = 58.83, σ = 11.17), SaIL (μ = 75.16, σ = 18.06), and iTOC (μ = 85.66, σ = 7.55) conditions (one-way ANOVA, F = 15.11, p < 0.0001). However, between the SaIL and iTOC conditions, the difference in scores was not found to be statistically significant (Post-hoc Tukey HSD test, p = 0.09). The other two comparisons (i.e., screen reader vs. SaIL, and screen reader vs. iTOC) were found to be statistically significant (Tukey’s HSD test, p < 0.0001).
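For readers unfamiliar with SUS, the standard scoring rule converts the ten ratings (1 to 5) into a score from 0 to 100: odd-numbered, positively worded items contribute the rating minus 1, even-numbered, negatively worded items contribute 5 minus the rating, and the sum is multiplied by 2.5. The sketch below applies that rule to one participant's responses for one condition; the per-condition means reported above would presumably be averages of such scores.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS score (0-100) from ten ratings on a 1-5 scale, item 1 first."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS expects ten ratings between 1 and 5")
    total = 0
    for item, rating in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5


print(sus_score([5, 1] * 5))                       # 100.0
print(sus_score([3] * 10))                         # 50.0
print(sus_score([4, 2, 5, 1, 4, 2, 4, 1, 5, 2]))   # 85.0
```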
NASA Task Load Index (NASA-TLX).
To assess the perceived task workload, we administered the NASA-TLX questionnaire [28] that measures workload as a value between 0 and 100, with lower values indicating better results. Overall, we found a significant difference in the scores between the three conditions: screen reader (μ = 65.11, σ = 9.08), SaIL (μ = 42.91, σ = 7.42), and iTOC (μ = 25.37, σ = 6.21) conditions (one-way ANOVA, F = 94.41, p < 0.001). Also, the pairwise comparisons between all conditions were all found to be statistically significant (Tukey’s HSD, p < 0.01).
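NASA-TLX combines six subscale ratings (mental demand, physical demand, temporal demand, performance, effort, and frustration), each on a scale from 0 to 100; the performance scale runs from good (0) to poor (100), so lower is better on all six. The paper does not state whether the raw (unweighted) or the weighted variant was used, so the sketch below shows both: the raw score is the mean of the six ratings, and the weighted score uses weights obtained from the 15 pairwise comparisons between subscales, which therefore sum to 15. The dictionary keys are illustrative names.

```python
from typing import Mapping

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings: Mapping[str, float]) -> float:
    """Raw TLX: unweighted mean of the six 0-100 subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings: Mapping[str, float], weights: Mapping[str, int]) -> float:
    """Weighted TLX: each weight counts how often that subscale was chosen in the
    15 pairwise comparisons, so the weights must sum to 15."""
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


ratings = {"mental": 60, "physical": 10, "temporal": 40,
           "performance": 20, "effort": 50, "frustration": 30}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
print(raw_tlx(ratings))                # 35.0
print(weighted_tlx(ratings, weights))  # 46.0
```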
3). Qualitative Feedback:
Four participants (P2, P8, P9, P13) expressed a desire for hierarchical navigation of lengthy TOCs. Specifically, they stated that on many webpages the TOC is hierarchical, with multiple layers of navigational hyperlinks. In such cases, serial navigation over the TOC is tedious, and hierarchical navigation would help skip many irrelevant links and save time. P8 and P13 also stated that other lists of hyperlinks, such as side-panel menus, can sometimes be as important as the TOC. To handle this, they advocated generalizing iTOC so that users can quickly go over the different lists of hyperlinks on the page and pick the desired one.
All participants were unfamiliar with the study webpages. Five participants (P1, P6, P9, P13, P15) stated that, unlike e-commerce and social media websites (e.g., shopping, flight reservation, news, Twitter), they rarely visit the same wiktionary, manual, blog, or novel page multiple times. Therefore, they do not have the chance to become familiar with these webpage layouts and devise efficient navigational strategies, as they typically do with e-commerce and social media websites. As a consequence, they have to spend more effort and time interacting with long web documents. They expressed that instant access to the TOC, as provided by iTOC, can therefore reduce the interaction burden.
Eight participants (P1, P3, P4, P5, P10, P11, P12, P14) suggested adding an alternative input method, other than the keyboard, to access the TOC. They stated that the shortcut to access the TOC (Insert+3 followed by ‘C’ in JAWS) was slightly complex, and that remembering it in addition to the plethora of shortcuts supported by JAWS can be a burden.
V. Discussion
The results and positive feedback from the user study clearly demonstrate the potential of iTOC in significantly improving the user experience of blind screen-reader users while interacting with long web documents. However, the study also illuminated some of the limitations of iTOC as well as user expectations and requirements regarding iTOC, some of which are discussed next.
A. Limitations
From the study findings, it is apparent that one limitation of iTOC is the need to remember and execute a slightly complex keyboard shortcut. Unfortunately, the shortcut had to be designed that way to work with screen readers (e.g., JAWS requires users to first press Insert+3 before pressing custom application shortcuts). Although participants did not face problems pressing this shortcut during the study, they did express the need for a simpler alternative method to access the TOC. Another limitation is that the current iTOC does not remember the previous ‘state’, and therefore requires users to navigate the TOC from the beginning every time it is accessed. As noted by some participants, this can lead to redundant navigation over undesired hyperlinks in the TOC, increasing the access overhead. Lastly, the current iTOC only supports instant access to the TOC; however, webpages contain many other lists, such as filter options, sort options, and menus, that are also important for screen-reader users. While adding the capability to maintain state information in iTOC is straightforward and within the scope of future work, addressing the other limitations requires further research, as described next.
B. Alternative Gestural Access to TOC
A non-keyboard method for accessing iTOC’s TOC would cleanly separate standard keyboard navigation from instant TOC access. One feasible approach is to repurpose existing pointing devices (e.g., mouse, touchpad) so that blind users can bring up and interact with the TOC provided by iTOC using simple mouse or touchpad actions. For example, users could middle-click to bring up the TOC, scroll to navigate the TOC hyperlinks one by one, and finally left-click on the desired hyperlink to instantly navigate to the corresponding document section.
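The gesture scheme proposed above can be modeled as a small state machine. The sketch below is DOM-free so that the mapping logic is easy to inspect; in a browser, the middle click would typically arrive as an `auxclick` event with `MouseEvent.button === 1` and scrolling as `wheel` events carrying `deltaY`, but that wiring (and suppressing the page's own scrolling) is omitted. The `Gesture`, `TocState`, `Step`, and `onGesture` names are hypothetical and do not describe an existing iTOC feature.

```typescript
// Hypothetical, DOM-free model of the proposed pointing-device gestures.
type Gesture =
  | { kind: "middleClick" }
  | { kind: "scroll"; deltaY: number }
  | { kind: "leftClick" };

interface TocState {
  open: boolean;
  focus: number; // index of the currently focused TOC hyperlink
}

// A gesture yields the next state and, optionally, a hyperlink to follow.
interface Step {
  state: TocState;
  navigateTo?: number;
}

function onGesture(s: TocState, g: Gesture, entryCount: number): Step {
  if (g.kind === "middleClick") {
    // Toggle the TOC; opening starts at the first hyperlink, as in the
    // current (stateless) iTOC.
    return { state: { open: !s.open, focus: s.open ? s.focus : 0 } };
  }
  // Scroll and left-click only matter while a non-empty TOC is open.
  if (!s.open || entryCount === 0) return { state: s };
  if (g.kind === "scroll") {
    // One hyperlink per wheel event: down (deltaY > 0) moves forward,
    // up moves back, clamped to the ends of the list.
    const step = Math.sign(g.deltaY);
    const focus = Math.max(0, Math.min(entryCount - 1, s.focus + step));
    return { state: { open: true, focus } };
  }
  // Left-click: follow the focused hyperlink and close the TOC.
  return { state: { open: false, focus: s.focus }, navigateTo: s.focus };
}
```

Because pointer gestures never reach this handler as keystrokes, they do not collide with the screen reader's own keyboard shortcuts, which is the separation the paragraph above argues for.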
C. Generalizing iTOC for Multiple Support Segments
A support segment can be defined as a page segment that helps users interact, in some way, with the main content of a webpage. A TOC is one such segment: it facilitates instant access to the different sections of the main content. Likewise, webpages can contain other types of support segments (e.g., filtering options on flight-reservation and shopping websites) that are instantly accessible to sighted users via pointing devices, but require tedious serial keyboard navigation for blind screen-reader users. Automatically identifying these support segments and making them instantly accessible through the iTOC interface could therefore significantly improve the experience of blind users. In this regard, prior work on browsing hierarchical tables of contents [30], [31] could also be leveraged to support convenient interaction with a generic hierarchical TOC covering multiple support segments.
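One possible representation of such a generic hierarchical TOC is a tree whose top-level nodes are support segments (the TOC, filters, sort options, and so on), presented as an expand/collapse outline so that users only step through the branch they open. The sketch below is a hypothetical illustration, not the design from [30] or [31]; the `SegmentNode` and `visibleItems` names are invented here, and it assumes node labels are unique (a real implementation would key nodes by an id).

```typescript
// Hypothetical node in a generic hierarchy of support segments.
interface SegmentNode {
  label: string;
  children: SegmentNode[];
}

// An item as a screen reader would announce it, with its nesting depth.
interface VisibleItem {
  label: string;
  depth: number;
}

// Lists, in reading order, the items a user would step through, descending
// only into nodes whose label is in `expanded` (collapsed branches are skipped).
function visibleItems(
  nodes: SegmentNode[],
  expanded: Set<string>,
  depth = 0,
): VisibleItem[] {
  const out: VisibleItem[] = [];
  for (const n of nodes) {
    out.push({ label: n.label, depth });
    if (expanded.has(n.label)) {
      out.push(...visibleItems(n.children, expanded, depth + 1));
    }
  }
  return out;
}
```

With every segment collapsed, the user hears only the segment names (for example "Table of Contents", "Filters", "Sort options") and can open just the one relevant to the current task, rather than moving through every entry of every segment.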
VI. Conclusion
This paper introduced iTOC, a browser extension that helps blind users instantly access a Table of Contents in long web documents at any time, without having to manually navigate to it using sequential screen-reader shortcuts. iTOC thereby reduces the interaction burden on blind users by compensating for their inability to rely on pointing devices such as a mouse or a touchpad while interacting with long web documents. A user study with 15 blind participants demonstrated the potential of iTOC to significantly improve the usability of interacting with long web documents, and also provided insights into how iTOC could be generalized to support instant access to multiple support segments, such as search filters, sort options, and menus, on modern webpages.
Acknowledgments
This work was supported by NSF Awards 1805076 and 1936027, NIH Awards R01EY026621, R01EY030085, and R01HD097188, and NIDILRR Award 90IF0117-01-00.
References
- [1] Mindel JL and Verma S, “Wikis for teaching and learning,” Communications of the Association for Information Systems, vol. 18, no. 1, p. 1, 2006.
- [2] Shu W and Chuang Y-H, “Wikis as an effective group writing tool: a study in Taiwan,” Online Information Review, 2012.
- [3] Freedom Scientific, “JAWS® – Freedom Scientific,” http://www.freedomscientific.com/products/software/jaws/, 2020.
- [4] Apple Inc., “Vision accessibility - Mac - Apple,” https://www.apple.com/accessibility/mac/vision/, 2020.
- [5] NV Access, “NV Access,” https://www.nvaccess.org/, 2020.
- [6] Takagi H, Saito S, Fukuda K, and Asakawa C, “Analysis of navigability of web applications for improving blind usability,” ACM Trans. Comput.-Hum. Interact., vol. 14, no. 3, p. 13–es, Sep. 2007. doi: 10.1145/1279700.1279703
- [7] Brown A and Harper S, “Dynamic injection of WAI-ARIA into web content,” in Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility, ser. W4A ’13. New York, NY, USA: Association for Computing Machinery, 2013. doi: 10.1145/2461121.2461141
- [8] Aydin AS, Feiz S, Ashok V, and Ramakrishnan I, “SAIL: Saliency-driven injection of ARIA landmarks,” in Proceedings of the 25th International Conference on Intelligent User Interfaces, ser. IUI ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 111–115. doi: 10.1145/3377325.3377540
- [9] Billah SM, Ashok V, Porter DE, and Ramakrishnan I, “Speed-dial: A surrogate mouse for non-visual web browsing,” in Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility, ser. ASSETS ’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 110–119. doi: 10.1145/3132525.3132531
- [10] Álvarez M, Pan A, Raposo J, Bellas F, and Cacheda F, “Finding and extracting data records from web pages,” Journal of Signal Processing Systems, vol. 59, no. 1, pp. 123–137, 2010.
- [11] Melnyk V, Ashok V, Puzis Y, Soviak A, Borodin Y, and Ramakrishnan IV, “Widget classification with applications to web accessibility,” in Web Engineering, Casteleyn S, Rossi G, and Winckler M, Eds. Cham: Springer International Publishing, 2014, pp. 341–358.
- [12] Alarte J, Insa D, and Silva J, “Webpage menu detection based on DOM,” in International Conference on Current Trends in Theory and Practice of Informatics. Springer, 2017, pp. 411–422.
- [13] Marinai S, Marino E, and Soda G, “Table of contents recognition for converting PDF documents in e-book formats,” in Proceedings of the 10th ACM Symposium on Document Engineering, 2010, pp. 73–76.
- [14] Klampfl S and Kern R, “An unsupervised machine learning approach to body text and table of contents extraction from digital scientific articles,” in International Conference on Theory and Practice of Digital Libraries. Springer, 2013, pp. 144–155.
- [15] Wu Z, Mitra P, and Giles CL, “Table of contents recognition and extraction for heterogeneous book documents,” in 2013 12th International Conference on Document Analysis and Recognition. IEEE, 2013, pp. 1205–1209.
- [16] Asakawa C and Takagi H, “Annotation-based transcoding for nonvisual web access,” in Proceedings of the Fourth International ACM Conference on Assistive Technologies, 2000, pp. 172–179.
- [17] Bechhofer S, Harper S, and Lunn D, “SADIe: Semantic annotation for accessibility,” in Proceedings of the 5th International Conference on The Semantic Web, ser. ISWC’06. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 101–115. doi: 10.1007/11926078_8
- [18] Bolin M, Webber M, Rha P, Wilson T, and Miller RC, “Automation and customization of rendered web pages,” in Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology, 2005, pp. 163–172.
- [19] Montoto P, Pan A, Raposo J, Bellas F, and López J, “Automating navigation sequences in AJAX websites,” in International Conference on Web Engineering. Springer, 2009, pp. 166–180.
- [20] Bigham JP, Lau T, and Nichols J, “Trailblazer: Enabling blind users to blaze trails through the web,” in Proceedings of the 14th International Conference on Intelligent User Interfaces, ser. IUI ’09. New York, NY, USA: Association for Computing Machinery, 2009, pp. 177–186. doi: 10.1145/1502650.1502677
- [21] Gadde P and Bolchini D, “From screen reading to aural glancing: towards instant access to key page sections,” in Proceedings of the 16th International ACM SIGACCESS Conference on Computers & Accessibility, 2014, pp. 67–74.
- [22] Ashok V, Puzis Y, Borodin Y, and Ramakrishnan I, “Web screen reading automation assistance using semantic abstraction,” in Proceedings of the 22nd International Conference on Intelligent User Interfaces, ser. IUI ’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 407–418. doi: 10.1145/3025171.3025229
- [23] Puzis Y, Borodin Y, Puzis R, and Ramakrishnan I, “Predictive web automation assistant for people with vision impairments,” in Proceedings of the 22nd International Conference on World Wide Web, ser. WWW ’13. New York, NY, USA: Association for Computing Machinery, 2013, pp. 1031–1040. doi: 10.1145/2488388.2488478
- [24] Li I, Nichols J, Lau T, Drews C, and Cypher A, “Here’s what I did: Sharing and reusing web activity with ActionShot,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’10. New York, NY, USA: Association for Computing Machinery, 2010, pp. 723–732. doi: 10.1145/1753326.1753432
- [25] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, and Duchesnay E, “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Nov. 2011.
- [26] Bradley JV, “Complete counterbalancing of immediate sequential effects in a Latin square design,” Journal of the American Statistical Association, vol. 53, no. 282, pp. 525–528, 1958.
- [27] Brooke J et al., “SUS: A quick and dirty usability scale,” Usability evaluation in industry, vol. 189, no. 194, pp. 4–7, 1996.
- [28] Hart SG and Staveland LE, “Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research,” in Human Mental Workload, ser. Advances in Psychology, Hancock PA and Meshkati N, Eds. North-Holland, 1988, vol. 52, pp. 139–183. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0166411508623869
- [29] WebAIM, “Screen reader user survey #8 results - WebAIM,” https://webaim.org/projects/screenreadersurvey8/, 2020.
- [30] Chimera R and Shneiderman B, “An exploratory evaluation of three interfaces for browsing large hierarchical tables of contents,” ACM Trans. Inf. Syst., vol. 12, no. 4, pp. 383–406, Oct. 1994. doi: 10.1145/185462.185483
- [31] Shneiderman B, Feldman D, and Rose A, “WebTOC: a tool to visualize and quantify web sites using a hierarchical table of contents,” Technical Report CS-TR-3992, 1999.

