dc.contributor.advisor: Ingwersen, Peter
dc.contributor.author: Thøgersen, Rasmus
dc.date.accessioned: 2012-11-01T07:48:13Z
dc.date.available: 2012-11-01T07:48:13Z
dc.date.issued: 2012
dc.identifier.uri: https://hdl.handle.net/10642/1267
dc.description: Joint Master Degree in Digital Library Learning (DILL) [en_US]
dc.description.abstract: One way to address the challenge of creating metadata for digitized image collections is to rely on user-created index terms, typically by harvesting tags from the collaborative information services known as folksonomies or by allowing users to tag directly in the catalog. An alternative method, only recently applied in cultural heritage institutions, is Human Computation Games, a crowdsourcing tool that relies on user agreement to create valid tags. This study contributes to the research by investigating tags (at various degrees of validation) generated by a Human Computation Game and comparing them to descriptors assigned to the same images by professional indexers. The analysis classifies tags and descriptors by term category and measures the overlap between tags and descriptors at both the syntactic level (matching on terms) and the semantic level (matching on meaning). The findings show that validated tags tend to describe ‘artifacts/objects’ and that game-generated tags typically represent what is in the picture rather than what it is about. Descriptors also primarily belonged to this term category but included a substantial number of ‘Proper nouns’, mainly named locations. Tags generated by the game but not validated by player agreement had a higher frequency of ‘subjective/narrative’ tags, but also more errors. The exact (character-for-character) overlap, i.e. the number of common terms compared to the entire pool of tags and descriptors, was slightly less than 5% for all types of tags. Extending the analysis to include fuzzy (word-stem) matching more than doubled the overlap. The semantic overlap was established through thesaurus relations between a sample of tags and descriptors, and adopting this more inclusive view of overlap increased the percentage of tags that were matched to descriptors.
More than half of the validated tags had some thesaurus relation to a descriptor added by a professional indexer. Approximately 60% of the thesaurus relations between descriptors and valid tags were either ‘same’ or ‘equivalent’, roughly 20% were associative, and roughly 20% were hierarchical. For the hierarchical relations, tags typically described images at a less specific level than descriptors. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Høgskolen i Oslo og Akershus. Institutt for arkiv, bibliotek- og info.fag [en_US]
dc.publisher: Universitetet i Tallinn [en_US]
dc.publisher: Universitetet i Parma [en_US]
dc.subject: Folksonomies [en_US]
dc.subject: Tags [en_US]
dc.subject: Human computation games [en_US]
dc.subject: Crowdsourcing [en_US]
dc.subject: Image indexing [en_US]
dc.subject: Metadata [en_US]
dc.subject: VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550::Datateknologi: 551 [en_US]
dc.subject: VDP::Samfunnsvitenskap: 200::Biblioteks- og informasjonsvitenskap: 320::Informasjons- og kommunikasjonssystemer: 321 [en_US]
dc.title: Crowdsourcing for image metadata : a comparison between game-generated tags and professional descriptors [en_US]
dc.type: Master thesis [en_US]
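
Note on the overlap measure: the abstract defines exact overlap as the number of common terms compared to the entire pool of tags and descriptors, and reports that fuzzy (word-stem) matching raises it. The sketch below is one possible reading of that definition, not the thesis's actual procedure. The function names, the crude suffix-stripping rule standing in for a real stemmer, and the sample terms are all assumptions made up for illustration.

```python
def normalize(term: str) -> str:
    """Lowercase and trim a term so exact matching ignores case and padding."""
    return term.strip().lower()


def crude_stem(term: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip a trailing 'ing' or 's'."""
    word = normalize(term)
    for suffix in ("ing", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def overlap(tags, descriptors, key=normalize) -> float:
    """Share of common terms in the combined pool of tags and descriptors."""
    tag_keys = {key(t) for t in tags}
    descriptor_keys = {key(d) for d in descriptors}
    pool = tag_keys | descriptor_keys
    if not pool:
        return 0.0
    return len(tag_keys & descriptor_keys) / len(pool)


# Invented example terms for one image; not data from the thesis.
tags = ["boat", "houses", "water", "man"]
descriptors = ["boats", "house", "harbour", "water"]

exact = overlap(tags, descriptors)                  # only "water" matches
fuzzy = overlap(tags, descriptors, key=crude_stem)  # "boat" and "house" also match
print(f"exact overlap: {exact:.2f}, fuzzy overlap: {fuzzy:.2f}")
```

Passing the matching rule as `key` keeps the overlap calculation identical for both cases, so any difference in the result comes only from how terms are normalized before comparison.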

