
Table 4. Overview of the experiment

Goal: Analyze and compare the textual image descriptions generated by the vision engines described in Sect. 3, to determine whether they differ in perceived correctness and how they compare with human-authored descriptions.
Research questions:
  RQ1: Is there any difference in perceived correctness among the descriptions generated by the considered tools?
  RQ2: Is there any difference in perceived correctness between the ground-truth descriptions provided by humans and those generated by the tools?
Context:
  Objects: descriptions of 60 images selected from Wikipedia, covering the three categories Human, Landmark, and General.
  Subjects: 76 computer science students.
Null hypothesis: No effect on perceived correctness (measured on a 5-point Likert scale; see the test sketch after this table).
Treatments: Five: Wikipedia (manual), Azure Computer Vision Engine, Amazon Rekognition, Cloudsight, and Auto Alt-Text for Google Chrome.
Dependent variable: Perceived correctness of the description with respect to the corresponding image.
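As a minimal sketch of how such a null hypothesis can be checked (the paper's actual statistical analysis may differ), a non-parametric Kruskal-Wallis test can compare the ordinal 5-point Likert ratings across the five treatments; all data and variable names below are illustrative, not taken from the study.

```python
# Sketch only: Kruskal-Wallis test of H0 "no effect of treatment on
# perceived correctness" over 5-point Likert ratings. The ratings here
# are made-up placeholders, not the study's data.
from scipy.stats import kruskal

# Hypothetical ratings: one list of 1-5 Likert scores per treatment.
ratings = {
    "Wikipedia (manual)": [5, 4, 5, 4, 5],
    "Azure Computer Vision Engine": [3, 4, 3, 2, 4],
    "Amazon Rekognition": [3, 3, 4, 2, 3],
    "Cloudsight": [4, 3, 4, 4, 3],
    "Auto Alt-Text for Google Chrome": [2, 3, 2, 3, 2],
}

# H0: the distribution of perceived correctness is the same for all
# treatments; reject H0 when p falls below the chosen significance level.
statistic, p_value = kruskal(*ratings.values())
print(f"H = {statistic:.2f}, p = {p_value:.4f}")
```

A rank-based test is the usual choice here because Likert responses are ordinal, so a parametric ANOVA's interval-scale assumption may not hold; a significant result would then typically be followed by pairwise post-hoc comparisons between treatments.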