Process pruner: A tool for sequence-based event log preprocessing

David Baumgartner, Andreas Haghofer, Martin Limberger, Emmanuel Helm

Publication: Contribution to journal, conference article, peer-reviewed

Abstract

A major challenge in applying process mining to real event data is the presence of noisy or incomplete cases and unusual behavior. Applying process mining to raw event data leads to wrong conclusions during the discovery of process models because the typical behavior is concealed. This paper presents an alternative for filtering event data that does not require extensive preprocessing. The method is based on footprint matrices generated from randomly pruned sub-logs and works in a semi-automated manner. By identifying the most similar matrices and using them to validate the whole log, traces representing unusual behavior can be excluded or highlighted. The tool is implemented in Python 3 with NumPy and pandas and is publicly available on GitHub. We evaluated the tool on benchmark datasets and compared it with human filtering and discovery results.
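The building blocks named in the abstract (footprint matrices, randomly pruned sub-logs, matrix similarity) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the integer encoding of the footprint relations, and the similarity measure (fraction of matching cells) are assumptions chosen for illustration only.

```python
import numpy as np

# Assumed integer encoding of the footprint relations (illustrative only):
# 0: a # b (never adjacent), 1: a -> b, 2: a <- b, 3: a || b (both orders seen)

def footprint_matrix(traces, activities):
    """Build a footprint matrix from the directly-follows relation of a log."""
    index = {a: i for i, a in enumerate(activities)}
    n = len(activities)
    follows = np.zeros((n, n), dtype=bool)
    for trace in traces:
        # Mark every pair (a, b) where b directly follows a in this trace.
        for a, b in zip(trace, trace[1:]):
            follows[index[a], index[b]] = True
    fp = np.zeros((n, n), dtype=np.int8)
    fp[follows & ~follows.T] = 1
    fp[~follows & follows.T] = 2
    fp[follows & follows.T] = 3
    return fp

def random_sublog(traces, fraction, rng):
    """Draw a random sub-log holding roughly `fraction` of the traces."""
    k = max(1, int(round(fraction * len(traces))))
    chosen = rng.choice(len(traces), size=k, replace=False)
    return [traces[i] for i in chosen]

def similarity(fp_a, fp_b):
    """Hypothetical similarity: share of cells with identical relations."""
    return float(np.mean(fp_a == fp_b))

# Toy log: three traces with typical behavior and one unusual trace.
log = [("a", "b", "c", "d"), ("a", "c", "b", "d"),
       ("a", "b", "c", "d"), ("a", "d", "c")]
activities = sorted({act for trace in log for act in trace})

rng = np.random.default_rng(0)
sublogs = [random_sublog(log, 0.5, rng) for _ in range(5)]
footprints = [footprint_matrix(s, activities) for s in sublogs]
pairwise = [[similarity(p, q) for q in footprints] for p in footprints]
```

Sub-logs whose footprints agree closely with most others would be candidates for describing the typical behavior. How the tool actually selects matrices and validates the whole log against them is not specified in this record.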

Original language: English
Pages (from - to): 1-4
Number of pages: 4
Journal: CEUR Workshop Proceedings
Volume: 2374
Publication status: Published - 2019
Event: ICPM Demo Track 2019 - Aachen, Germany
Duration: 24 Jun 2019 to 26 Jun 2019
