Chrontext: Portable SPARQL queries over contextualised time series data in industrial settings

Bakken, Magnus; Soylu, Ahmet

dc.contributor.author	Bakken, Magnus
dc.contributor.author	Soylu, Ahmet
dc.date.accessioned	2023-10-25T06:29:40Z
dc.date.available	2023-10-25T06:29:40Z
dc.date.created	2023-05-12T14:16:49Z
dc.date.issued	2023
dc.identifier.citation	Expert Systems With Applications. 2023, 226.	en_US
dc.identifier.issn	0957-4174
dc.identifier.uri	https://hdl.handle.net/11250/3098550
dc.description.abstract	Industrial information models are standardised ways of representing industrial devices, equipment, and processes together with the data collected from associated sensors and control systems. Companies invest in such models to enable digitalisation and modular, reusable solutions. They also invest heavily in analytics (e.g. machine learning) based on time series data sets to improve operations. Queries that use such context to retrieve time series data can make industrial data sets more accessible to practitioners performing analytics and application development. Moreover, they can enable scalable deployment of resulting analytical models. Industrial availability constraints require that queries over context and time series should be portable in general, as they should be able to retrieve data for training in a cloud setting and production data for deployment in an on-premise setting. Solving this problem is challenging with existing approaches as context and time series data tend to exist in separate, specialised databases. We address the issue by proposing a hybrid query engine, namely Chrontext, in the setting of a SPARQL database hosting the static model, and an arbitrary time series database. We show how, with a set of annotations in the knowledge graph, SPARQL queries can be evaluated over such a hybrid architecture. We provide a proof showing that our approach correctly answers SPARQL 1.1 queries. We implement our approach in Rust under the Apache 2.0 license, and use the Apache Arrow-based Polars library together with configurable pushdowns to achieve high performance. We compare the performance of Chrontext against Ontop, one of the most prominent virtual knowledge graph systems, on a synthetic data set based on industrial standards. Data are stored in an S3 data lake and PostgreSQL with the Dremio data lakehouse as the SQL integrator. We find that our approach performs 10× to 85× faster and consumes much less memory than Ontop.	en_US
dc.language.iso	eng	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Chrontext: Portable SPARQL queries over contextualised time series data in industrial settings	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	2
dc.identifier.doi	10.1016/j.eswa.2023.120149
dc.identifier.cristin	2147194
dc.source.journal	Expert Systems With Applications	en_US
dc.source.volume	226	en_US
dc.source.pagenumber	30	en_US
dc.relation.project	Norges forskningsråd: 316656	en_US

Associated file(s)

Filename: 1-s2.0-S0957417423006516-main.pdf
Size: 3.605 MB
Format: PDF

This item appears in the following collection(s)

Publications from Cristin [3256]
TKD - Department of Computer Science [940]

Unless otherwise stated, this item is licensed under Attribution 4.0 International