
dc.contributor.author: Husevåg, Anne-Stine Ruud
dc.date.accessioned: 2018-12-19T11:39:42Z
dc.date.accessioned: 2019-01-11T07:57:39Z
dc.date.available: 2018-12-19T11:39:42Z
dc.date.available: 2019-01-11T07:57:39Z
dc.date.issued: 2018-10-16
dc.identifier.citation: Husevåg AS. From subtitles to substantial metadata: examining characteristics of named entities and their role in indexing. International Journal on Digital Libraries. 2018 (en)
dc.identifier.issn: 1432-5012
dc.identifier.issn: 1432-1300
dc.identifier.uri: https://hdl.handle.net/10642/6491
dc.description.abstract: This paper explores the possible role of named entities extracted from subtitle text in the automatic indexing of TV programs. This is done by analyzing entity types, name density and name frequencies in subtitles and metadata records from different genres of TV programs. The name density in metadata records is much higher than the name density in subtitles, and named entities with high frequencies in the subtitles are more likely to be mentioned in the metadata records. Further analysis of the metadata records indicates that the use of named entities in metadata increases with the frequency the entities have in the subtitles. The most substantial difference was between frequencies of one and two: named entities with a frequency of two in the subtitles were twice as likely to be present in the metadata records. Personal names, geographical names and names of organizations were the most prominent entity types in both the news subtitles and news metadata, while persons, creative works and locations were the most prominent in culture programs. It is not possible to extract all the named entities in the manually created metadata records by applying named entity recognition to the subtitles for the same programs, but it is possible to find a large subset of named entities for some categories in certain genres. The results reported in this paper show that subtitles are a good source for personal names for all the genres covered in our study, and for creative works in literature programs. In total, it was possible to find 38% of the named entities in metadata records for news programs and 32% for literature programs, while 21% of the named entities in metadata records for talk shows were also present in the subtitles for the programs. (en)
dc.language.iso: en (en)
dc.publisher: Springer Verlag (en)
dc.relation.ispartofseries: International Journal on Digital Libraries; September 2019, Volume 20, Issue 3
dc.rights: This is a post-peer-review, pre-copyedit version of an article published in International Journal on Digital Libraries. The final authenticated version is available online at: http://dx.doi.org/10.1007/s00799-018-0252-z. (en)
dc.subject: Named entity recognition (en)
dc.subject: Multimedia indexing (en)
dc.subject: Metadata (en)
dc.subject: Audiovisual archives (en)
dc.title: From subtitles to substantial metadata: examining characteristics of named entities and their role in indexing (en)
dc.type: Journal article (en)
dc.type: Peer reviewed (en)
dc.date.updated: 2018-12-19T11:39:42Z
dc.description.version: acceptedVersion (en)
dc.identifier.doi: http://dx.doi.org/10.1007/s00799-018-0252-z
dc.identifier.cristin: 1644786
dc.source.journal: International Journal on Digital Libraries

