
A Reference Data Model to Specify Event Logs for Big Data Pipeline Discovery

Benvenuti, Dario; Marrella, Andrea; Rossi, Jacopo; Nikolov, Nikolay Vladimirov; Roman, Dumitru; Soylu, Ahmet; Perales, Fernando
Chapter, Peer reviewed, Conference object, Journal article
Accepted version
View/Open
DataCloud_BPM23_referencemodel.pdf (738.0Kb)
URI
https://hdl.handle.net/11250/3109871
Date
2023
Collections
  • Publikasjoner fra Cristin [4167]
  • TKD - Institutt for informasjonsteknologi [1040]
Original version
Lecture Notes in Business Information Processing. 2023, 490, 38-54. https://doi.org/10.1007/978-3-031-41623-1_3
Abstract
State-of-the-art approaches for managing Big Data pipelines assume their anatomy is known by design and expressed through ad hoc Domain-Specific Languages (DSLs), with insufficient knowledge of the dark data involved in the pipeline execution. Dark data is data that organizations acquire during regular business activities but do not use to derive insights or for decision-making. Recent literature on Big Data processing agrees that a new breed of Big Data pipeline discovery (BDPD) solutions can mitigate this issue by solely analyzing the event log that keeps track of pipeline executions over time. Relying on well-established process mining techniques, BDPD can reveal fact-based insights into how data pipelines transpire and access dark data. However, to date, a standard format to specify the concept of Big Data pipeline execution in an event log does not exist, making it challenging to apply process mining to achieve the BDPD task. To address this issue, in this paper we formalize a universally applicable reference data model to conceptualize the core properties and attributes of a data pipeline execution. We provide an implementation of the model as an extension to the XES interchange standard for event logs, demonstrate its practical applicability in a use case involving a data pipeline for managing digital marketing campaigns, and evaluate its effectiveness in uncovering dark data manipulated during several pipeline executions.
Publisher
Springer
Series
Lecture Notes in Business Information Processing
Journal
Lecture Notes in Business Information Processing
