A neural network model and framework for an automatic evaluation of image descriptions based on NCAM image accessibility guidelines
Chapter, Peer reviewed
Published version
Permanent link
https://hdl.handle.net/11250/2996980
Publication date
2021
Abstract
Millions of people who are blind or visually impaired have difficulty understanding the content of images. To address this problem, textual image descriptions or captions are provided separately or as alternative text on the web, so that users can read them through a screen reader. However, most of the image descriptions provided are inadequate for accessibility. Image descriptions can be written manually or generated automatically using software tools, and there are tools, methods, and metrics for evaluating the quality of the generated text. However, almost all of them are word-similarity-based and generic. Although standard guidelines exist, such as WCAG 2.0 and the NCAM image accessibility guidelines, they are rarely used in the evaluation of image descriptions. In this paper, we propose a neural network-based framework and models for the automatic evaluation of image descriptions in terms of compliance with the NCAM guidelines. A custom dataset was created from the widely used Flickr8K dataset to train and test the models. The experimental results show that the proposed framework performs very well, with an average accuracy above 98%. We believe the framework could help authors of image descriptions write descriptions that are accessible to these users.
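To make the idea of guideline-compliance classification concrete, the sketch below shows a minimal, hypothetical classifier that flags descriptions violating one NCAM-style rule (redundant openers such as "image of", and uninformatively short text). It is a toy single-neuron (logistic) model in pure Python, not the chapter's actual models, features, or dataset; the phrase list, features, and labels are all illustrative assumptions.

```python
import math

# Hypothetical NCAM-style rule features (NOT the chapter's feature set):
# descriptions should not open with a redundant phrase and should be
# informative enough in length.
REDUNDANT_PHRASES = ("image of", "picture of", "photo of", "graphic of")

def features(text):
    t = text.lower().strip()
    return [
        1.0 if any(t.startswith(p) for p in REDUNDANT_PHRASES) else 0.0,
        1.0 if len(t.split()) < 3 else 0.0,  # too short to be informative
        1.0,                                 # bias term
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.5, epochs=200):
    # Stochastic gradient descent on a single logistic unit.
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            g = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w

def non_compliant(w, text):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(text)))) >= 0.5

# Toy training data (invented for illustration): 1 = non-compliant, 0 = compliant.
texts = [
    "image of a dog",                         # redundant opener
    "picture of trees",                       # redundant opener
    "a dog",                                  # too short
    "a brown dog runs across a green field",
    "two children play football on a beach",
]
labels = [1, 1, 1, 0, 0]
w = train([features(t) for t in texts], labels)
```

In the chapter's framework the evaluation is learned by neural network models over a dataset derived from Flickr8K; this stand-in only illustrates the general shape of training a classifier to label descriptions as compliant or non-compliant with a guideline.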