Crowdsourcing for image metadata : a comparison between game-generated tags and professional descriptors

Thøgersen, Rasmus

dc.contributor.advisor	Ingwersen, Peter
dc.contributor.author	Thøgersen, Rasmus
dc.date.accessioned	2012-11-01T07:48:13Z
dc.date.available	2012-11-01T07:48:13Z
dc.date.issued	2012
dc.identifier.uri	https://hdl.handle.net/10642/1267
dc.description	Joint Master Degree in Digital Library Learning (DILL)	en_US
dc.description.abstract	One way to address the challenge of creating metadata for digitized image collections is to rely on user-created index terms, typically by harvesting tags from the collaborative information services known as folksonomies or by allowing users to tag directly in the catalog. An alternative method, only recently applied in cultural heritage institutions, is Human Computation Games, a crowdsourcing tool that relies on user agreement to create valid tags. This study contributes to the research by investigating tags (at various degrees of validation) generated by a Human Computation Game and comparing them to descriptors assigned to the same images by professional indexers. The analysis is done by classifying tags and descriptors by term-category, as well as by measuring the overlap between the tags and the descriptors at both the syntactic (matching on terms) and the semantic (matching on meaning) level. The findings show that validated tags tend to describe ‘artifacts/objects’ and that game-generated tags typically represent what is in the picture, rather than what it is about. Descriptors also primarily belonged to this term-category but included a substantial number of ‘Proper nouns’ as well, mainly named locations. Tags generated by the game but not validated by player agreement had a higher frequency of ‘subjective/narrative’ tags, but also more errors. The exact (character-for-character) overlap, i.e. the number of common terms relative to the entire pool of tags and descriptors, was slightly less than 5% for all types of tags. Extending the analysis to include fuzzy (word-stem) matching more than doubled the overlap. The semantic overlap was established through thesaurus relations between a sample of tags and descriptors, and adopting this more inclusive view of overlap increased the percentage of tags that were matched to descriptors. 
More than half of the validated tags had some thesaurus relation to a descriptor added by a professional indexer. Approximately 60% of the thesaurus relations between descriptors and valid tags were either ‘same’ or ‘equivalent’, roughly 20% were associative, and 20% were hierarchical. For the hierarchical relations, it was found that tags typically describe images at a less specific level than descriptors.	en_US
dc.language.iso	eng	en_US
dc.publisher	Høgskolen i Oslo og Akershus. Institutt for arkiv, bibliotek- og info.fag	en_US
dc.publisher	Tallinn University	en_US
dc.publisher	University of Parma	en_US
dc.subject	Folksonomies	en_US
dc.subject	Tags	en_US
dc.subject	Human computation games	en_US
dc.subject	Crowdsourcing	en_US
dc.subject	Image indexing	en_US
dc.subject	Metadata	en_US
dc.subject	VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550::Datateknologi: 551	en_US
dc.subject	VDP::Samfunnsvitenskap: 200::Biblioteks- og informasjonsvitenskap: 320::Informasjons- og kommunikasjonssystemer: 321	en_US
dc.title	Crowdsourcing for image metadata : a comparison between game-generated tags and professional descriptors	en_US
dc.type	Master thesis	en_US

Associated file(s)

Filename: Thoegersen_Rasmus.pdf
Size: 5.463Mb
Format: PDF

This item appears in the following collection(s)

SAM - Joint Master Degree in Digital Library Learning (DILL) [78]
