dc.contributor.author: Hammer, Hugo Lewi
dc.contributor.author: Bratterud, Alfred
dc.contributor.author: Fagernes, Siri
dc.date.accessioned: 2014-02-17T12:15:01Z
dc.date.available: 2014-05-01T02:02:40Z
dc.date.issued: 2013-11
dc.identifier.citation: Hammer, H., Bratterud, A. & Fagernes, S. (2013). Crawling JavaScript websites using WebKit - with application to analysis of hate speech in online discussions. NIK: Norsk Informatikkonferanse. Trondheim: Tapir Akademisk. [en_US]
dc.identifier.issn: 1892-0713
dc.identifier.other: FRIDAID 1069386
dc.identifier.uri: https://hdl.handle.net/10642/1834
dc.description.abstract: JavaScript client-side hidden web (CSHW) pages contain dynamic material created as a result of specific user activities. The number of CSHW websites is increasing. Crawling the so-called Hidden Web is challenging, particularly when JavaScript CSHW content from an external website is seamlessly included as part of the web pages. We have developed a prototype web crawler that efficiently extracts content from CSHW. The crawler uses WebKit to render web pages and to emulate human web page activities to reveal dynamic content. The WebKit crawler was used to collect text from 39 Norwegian online newspaper debate articles, where the online user discussions were included as JavaScript CSHW from other websites. The average speeds for extracting the main content and the JavaScript-generated discussions were 36.3 kB/sec and 8.8 kB/sec, respectively. Opinion mining of the collected text shows that the debate articles are more positive toward Islam and Muslims than the discussions that follow them. The results demonstrate the importance of being able to collect such JavaScript CSHW discussion content to get an overview of existing hate speech on the Internet. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Tapir Akademisk Forlag [en_US]
dc.relation.ispartofseries: NIK: Norsk Informatikkonferanse;2013
dc.subject: JavaScript [en_US]
dc.subject: WebKit [en_US]
dc.subject: Online discussions [en_US]
dc.subject: Hate speech [en_US]
dc.subject: Web pages [en_US]
dc.subject: VDP::Mathematics and natural science: 400::Information and communication science: 420::Theoretical computer science, programming languages and programming theory: 421 [en_US]
dc.subject: Data analysis [en_US]
dc.title: Crawling JavaScript websites using WebKit - with application to analysis of hate speech in online discussions [en_US]
dc.type: Journal article [en_US]
dc.type: Peer reviewed [en_US]
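
The abstract's motivation is that a plain HTTP fetch only sees the HTML the server sends, so discussion content injected by JavaScript from another site never appears in it. The sketch below (Python standard library only, not the authors' crawler) extracts visible text from the static markup of a hypothetical debate-article page. The page, the host `comments.example`, and the `loadComments` call are invented for illustration. The article body is recovered, but none of the discussion is, because it would only exist after the scripts run.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects non-empty text nodes, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible, non-whitespace text outside scripts/styles.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html: str) -> str:
    """Return the visible text of the HTML exactly as served (no JS executed)."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the article body is in the served HTML, while the
# discussion would be injected at runtime by a script from another host.
PAGE = """
<html><body>
  <article><p>Debate article body.</p></article>
  <div id="comments"></div>
  <script src="https://comments.example/embed.js"></script>
  <script>loadComments("comments");</script>
</body></html>
"""

if __name__ == "__main__":
    print(static_text(PAGE))  # the empty #comments container yields no text
```

A crawler built on a rendering engine such as WebKit, as described in the abstract, would instead render the page and emulate user activity so that the script-generated discussion becomes part of the page before its text is extracted.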