From subtitles to substantial metadata: examining characteristics of named entities and their role in indexing

Husevåg, Anne-Stine Ruud

dc.contributor.author	Husevåg, Anne-Stine Ruud
dc.date.accessioned	2018-12-19T11:39:42Z
dc.date.accessioned	2019-01-11T07:57:39Z
dc.date.available	2018-12-19T11:39:42Z
dc.date.available	2019-01-11T07:57:39Z
dc.date.issued	2018-10-16
dc.identifier.citation	Husevåg AS. From subtitles to substantial metadata: examining characteristics of named entities and their role in indexing. International Journal on Digital Libraries. 2018	en
dc.identifier.issn	1432-5012
dc.identifier.issn	1432-1300
dc.identifier.uri	https://hdl.handle.net/10642/6491
dc.description.abstract	This paper explores the possible role of named entities extracted from text in subtitles in automatic indexing of TV programs. This is done by analyzing entity types, name density and name frequencies in subtitles and metadata records from different genres of TV programs. The name density in metadata records is much higher than the name density in subtitles, and named entities with high frequencies in the subtitles are more likely to be mentioned in the metadata records. Further analysis of the metadata records indicates that the use of named entities in metadata increases with the frequency the entities have in the subtitles. The most substantial difference was between a frequency of one and a frequency of two: named entities with a frequency of two in the subtitles were twice as likely to be present in the metadata records. Personal names, geographical names and names of organizations were the most prominent entity types in both the news subtitles and news metadata, while persons, creative works and locations are the most prominent in culture programs. It is not possible to extract all the named entities in the manually created metadata records by applying named entity recognition to the subtitles for the same programs, but it is possible to find a large subset of named entities for some categories in certain genres. The results reported in this paper show that subtitles are a good source for personal names for all the genres covered in our study, and for creative works in literature programs. In total, 38% of the named entities in metadata records for news programs, 32% for literature programs, and 21% for talk shows were also present in the subtitles for the programs.	en
dc.language.iso	en	en
dc.publisher	Springer Verlag	en
dc.relation.ispartofseries	International Journal on Digital Libraries; September 2019, Volume 20, Issue 3
dc.rights	This is a post-peer-review, pre-copyedit version of an article published in International Journal on Digital Libraries. The final authenticated version is available online at: http://dx.doi.org/10.1007/s00799-018-0252-z.	en
dc.subject	Named entity recognition	en
dc.subject	Multimedia indexing	en
dc.subject	Metadata	en
dc.subject	Audiovisual archives	en
dc.title	From subtitles to substantial metadata: examining characteristics of named entities and their role in indexing	en
dc.type	Journal article	en
dc.type	Peer reviewed	en
dc.date.updated	2018-12-19T11:39:42Z
dc.description.version	acceptedVersion	en
dc.identifier.doi	http://dx.doi.org/10.1007/s00799-018-0252-z
dc.identifier.cristin	1644786
dc.source.journal	International Journal on Digital Libraries

Associated file(s)

File name: IJDL180706referansefix.pdf
Size: 1.317Mb
Format: PDF

This item appears in the following collection(s)

SAM - Institutt for arkiv, bibliotek og informasjonsvitenskap [319]
SAM - Department of Archivistics, Library and Information Science