2022 Dec 2;22:317. doi: 10.1186/s12911-022-02064-5

Table 4.

Data collection methods for usability testing

Data collection method n (%)	Description	Metrics/tools	Comments
Performance metrics 25 (26.1)	Collecting quantifiable measurements of participants’ actions during the test to understand the impacts of usability issues, usually focusing on effectiveness and efficiency	Effectiveness: number of errors, number of tasks completed successfully; efficiency: task duration, number of requests for assistance or hints, time spent recovering from errors	These quantitative indicators can be compared between young adults and seniors to reflect differences in performance [76, 81]
Behavior observation log 14 (14.6)	Observing and recording the participant’s mood and body gestures during the test	Sometimes, the observation is structured and based on predefined classifications of user behavior, such as delay or pause of > 5 s in locating the answer button [42]	This method is often used in conjunction with thinking aloud and performance metrics to improve triangulation [33]
Screen recording 3 (3.1)	Capturing the touches and actions performed on the mobile device	Screen recording software and video coding software (Behavioral Observation Research Interactive Software)	–
Eye tracking 1 (1.1)	Monitoring and recording the visual activity of the participants by tracing pupil movement within the eye	Fixations: the number of views of the area of interest; saccades: the number of repeated visits to the specific area [52]	Because of the drooping eyelids of elderly individuals, the eye tracker may not scan their pupils accurately
Concurrent thinking aloud 25 (26.1)	Encouraging the participants to continuously verbalize their ideas, beliefs, expectations, doubts, and discoveries while performing tasks in order to understand their thoughts as they interact with the app	–	This method relies heavily on the participants’ cognitive capacities, which decline with age; thus, it may cause reporter bias [83]
Retrospective thinking aloud 1 (1.1)	Asking the participants to view the recording of their actions and verbalize their thoughts about the tasks and the difficulties they encountered in completing the tasks	–	1. This method will increase the overall length of the evaluation and may cause the elderly to lose focus [85] 2. This method will not increase the cognitive load of the elderly compared with concurrent thinking aloud [83]
Questionnaire 68 (70.8)	Gathering the participants’ opinions about, preferences for and satisfaction with the user interface on a predefined scale after they have completed the tasks	Validated questionnaires: SUS, USE, UEQ, ASQ, NASA-TLX, NPS, Health-ITUES, QUIS, PSSUQ, ICF-US, MARS, Ruland’s eight-item adaptation of Davis' ease-of-use survey; self-made questionnaires designed according to the unique features of a specific app	1. A larger sample size can be investigated by this method [78] 2. Some items have to be answered by an expert rather than the elderly because they are either beyond the scope of the test or based on experiencing rare occurrences [85] 3. To reduce the response burden of the elderly and improve the understandability of the questionnaire, some items are removed or the language is modified [43, 54, 82]
Interview 36 (37.5)	Collecting the data in the form of face-to-face oral conversations with the participants, including individual interviews and focus group interviews	The interview outline: opinions on unique features, product satisfaction, and difficulties encountered during the test as well as suggestions for improvement	1. This method can elicit new insights from the participants 2. This method is often combined with a questionnaire to collect explanations of the questionnaire answers
Feedback log 1 (1.1)	Asking the participants to record their experiences on a provided form when using the app	–	This method is suitable for long-term usability testing, as it can record the participant’s experience dynamically [52]

SUS, System Usability Scale [86]; USE, Usefulness Satisfaction and Ease of Use Questionnaire [87]; UEQ, User Experience Questionnaire [88]; ASQ, After Scenario Questionnaire [89]; NASA-TLX, National Aeronautics and Space Administration Task Load Index [90]; NPS, Net Promoter Score [78]; Health-ITUES, Health Information Technology Usability Evaluation Scale [91]; QUIS, Questionnaire for User Interaction Satisfaction [92]; PSSUQ, Post-Study System Usability Questionnaire [93]; ICF-US, International Classification of Functioning based Usability Scale [94]; MARS, Mobile Application Rating Scale [95]
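
The performance metrics row of Table 4 can be illustrated with a minimal Python sketch that aggregates effectiveness indicators (task completion, errors) and efficiency indicators (task duration, requests for assistance) per age group. The `TaskRecord` structure, the group labels, and the sample values are hypothetical and invented only for illustration; they are not data from any of the reviewed studies.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """One participant's attempt at one task (hypothetical log format)."""
    participant: str
    group: str         # assumed labels, e.g. "senior" or "young"
    completed: bool    # effectiveness: was the task completed successfully?
    errors: int        # effectiveness: number of errors made
    duration_s: float  # efficiency: task duration in seconds
    assists: int       # efficiency: requests for assistance or hints


def summarize(records):
    """Return per-group effectiveness and efficiency indicators."""
    by_group = {}
    for r in records:
        by_group.setdefault(r.group, []).append(r)

    summary = {}
    for group, rows in by_group.items():
        summary[group] = {
            "completion_rate": sum(r.completed for r in rows) / len(rows),
            "mean_errors": mean(r.errors for r in rows),
            "mean_duration_s": mean(r.duration_s for r in rows),
            "mean_assists": mean(r.assists for r in rows),
        }
    return summary


# Invented toy records, used only to exercise the function.
records = [
    TaskRecord("P1", "senior", True, 2, 90.0, 1),
    TaskRecord("P2", "senior", False, 4, 150.0, 2),
    TaskRecord("P3", "young", True, 0, 40.0, 0),
    TaskRecord("P4", "young", True, 1, 60.0, 0),
]

summary = summarize(records)
for group, indicators in summary.items():
    print(group, indicators)
```

Grouping by age band mirrors the comment in Table 4 that these indicators can be compared between young adults and seniors; a real analysis would also need a per-task breakdown and an appropriate statistical test.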