Named entities in indexing: A case study of TV subtitles and metadata records

This paper explores the possible role of named entities in an automatic indexing process based on subtitle text. It does so by analyzing entity types, name density and name frequencies in subtitles and metadata records from different TV programs. The name density in metadata records is much higher than the name density in subtitles, and named entities with high frequencies in the subtitles are more likely to be mentioned in the metadata records. Personal names, geographical names and names of organizations were the most prominent entity types in both the news subtitles and the news metadata, while persons, works and locations were the most prominent in culture programs.
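
As a rough illustration of the two measures named above, the following minimal sketch computes name density and name frequencies for one program. It assumes that name density means named-entity mentions per token, a definition the abstract does not state, and that the entity mentions have already been extracted. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def name_density(entity_mentions, token_count):
    """Named-entity mentions per token (assumed definition, not the paper's)."""
    if token_count == 0:
        return 0.0
    return len(entity_mentions) / token_count


def name_frequencies(entity_mentions):
    """Count how often each distinct name is mentioned."""
    return Counter(entity_mentions)


# Hypothetical toy data for a single program (illustration only).
subtitle_token_count = 20
subtitle_entities = ["Oslo", "Kari Hansen", "Oslo", "NRK"]
metadata_token_count = 8
metadata_entities = ["Oslo", "Kari Hansen"]

subtitle_density = name_density(subtitle_entities, subtitle_token_count)
metadata_density = name_density(metadata_entities, metadata_token_count)

# Names from the subtitles that also appear in the metadata record,
# paired with their subtitle frequency.
subtitle_freqs = name_frequencies(subtitle_entities)
shared = {name: count for name, count in subtitle_freqs.items()
          if name in set(metadata_entities)}

print(subtitle_density, metadata_density)
print(subtitle_freqs.most_common(1))
print(shared)
```

Under these assumptions, comparing the two densities and checking which frequent subtitle names reappear in the metadata mirrors the kind of comparison the abstract describes, though the actual procedure may differ.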

Publisher

CEUR Workshop Proceedings