Simple item record

dc.contributor.author: Benvenuti, Dario
dc.contributor.author: Marrella, Andrea
dc.contributor.author: Rossi, Jacopo
dc.contributor.author: Nikolov, Nikolay Vladimirov
dc.contributor.author: Roman, Dumitru
dc.contributor.author: Soylu, Ahmet
dc.contributor.author: Perales, Fernando
dc.date.accessioned: 2024-01-04T14:22:23Z
dc.date.available: 2024-01-04T14:22:23Z
dc.date.created: 2024-01-03T10:47:53Z
dc.date.issued: 2023
dc.identifier.citation: Lecture Notes in Business Information Processing. 2023, 490 38-54. [en_US]
dc.identifier.isbn: 978-3-031-41622-4
dc.identifier.isbn: 978-3-031-41623-1
dc.identifier.issn: 1865-1348
dc.identifier.issn: 1865-1356
dc.identifier.uri: https://hdl.handle.net/11250/3109871
dc.description.abstract: State-of-the-art approaches for managing Big Data pipelines assume that a pipeline's anatomy is known by design and is expressed through ad hoc Domain-Specific Languages (DSLs). These approaches have insufficient knowledge of the dark data involved in pipeline execution. Dark data is data that organizations acquire during regular business activities but do not use to derive insights or make decisions. The recent literature on Big Data processing agrees that a new breed of Big Data pipeline discovery (BDPD) solutions can mitigate this issue by analyzing only the event log that records pipeline executions over time. By relying on well-established process mining techniques, BDPD can reveal fact-based insights into how data pipelines unfold and access dark data. However, no standard format yet exists for specifying Big Data pipeline executions in an event log, which makes it challenging to apply process mining to the BDPD task. To address this issue, we formalize in this paper a universally applicable reference data model that conceptualizes the core properties and attributes of a data pipeline execution. We implement the model as an extension to the XES interchange standard for event logs. We demonstrate its practical applicability in a use case involving a data pipeline for managing digital marketing campaigns. Finally, we evaluate its effectiveness in uncovering dark data manipulated during several pipeline executions. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Springer [en_US]
dc.relation.ispartofseries: Lecture Notes in Business Information Processing;
dc.title: A Reference Data Model to Specify Event Logs for Big Data Pipeline Discovery [en_US]
dc.type: Chapter [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Conference object [en_US]
dc.type: Journal article [en_US]
dc.description.version: acceptedVersion [en_US]
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 1
dc.identifier.doi: https://doi.org/10.1007/978-3-031-41623-1_3
dc.identifier.cristin: 2219685
dc.source.journal: Lecture Notes in Business Information Processing [en_US]
dc.source.volume: 490 [en_US]
dc.source.pagenumber: 38-54 [en_US]

