Process pruner: A tool for sequence-based event log preprocessing

David Baumgartner, Andreas Haghofer, Martin Limberger, Emmanuel Helm

Research output: Contribution to journal › Conference article › peer-review


A major challenge in applying process mining to real event data is the presence of noisy or incomplete cases and unusual behavior. Applying process mining to raw event data leads to wrong conclusions during the discovery of process models and conceals the typical behavior. This paper presents an alternative for filtering event data without the need for extensive preprocessing. The method is based on footprint matrices generated from randomly pruned sub-logs and works in a semi-automated manner. By identifying the most similar matrices to validate the whole log, traces representing unusual behavior can be excluded or highlighted. The tool was implemented with Python 3, NumPy and Pandas and is publicly available on GitHub. We evaluated our tool using benchmark datasets and compared it to human filtering and discovery results.
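The idea of comparing footprint matrices of randomly pruned sub-logs can be illustrated with a minimal sketch. This is not the tool's implementation: the function names (`footprint_matrix`, `random_sublog`, `matrix_similarity`), the integer encoding of the footprint relations, and the cell-wise agreement score are all assumptions made for illustration, and the abstract does not specify how the tool samples sub-logs or measures similarity.

```python
import numpy as np


def footprint_matrix(traces, activities):
    """Build a footprint matrix from the directly-follows relation of a log.

    Hypothetical encoding: 0 = never adjacent (#), 1 = a -> b,
    2 = a <- b, 3 = both orders observed (||).
    """
    index = {a: i for i, a in enumerate(activities)}
    n = len(activities)
    follows = np.zeros((n, n), dtype=bool)
    for trace in traces:
        # Mark every pair of directly adjacent activities in the trace.
        for a, b in zip(trace, trace[1:]):
            follows[index[a], index[b]] = True
    return follows.astype(np.int8) + 2 * follows.T.astype(np.int8)


def random_sublog(traces, fraction, rng):
    """Draw a random subset of traces without replacement (assumed sampling scheme)."""
    k = max(1, int(round(fraction * len(traces))))
    chosen = rng.choice(len(traces), size=k, replace=False)
    return [traces[i] for i in chosen]


def matrix_similarity(m1, m2):
    """Fraction of cells on which two footprint matrices agree (assumed metric)."""
    return float(np.mean(m1 == m2))


# Toy log: two variants of the same process, one of them repeated.
log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c")]
activities = ["a", "b", "c"]

full = footprint_matrix(log, activities)
rng = np.random.default_rng(0)
sub = footprint_matrix(random_sublog(log, 2 / 3, rng), activities)
score = matrix_similarity(full, sub)
```

In this sketch, sub-logs whose matrices agree closely with many others would stand for typical behavior, while traces that change the matrix when they are included would be candidates for exclusion or highlighting.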

Original language: English
Pages (from-to): 1-4
Number of pages: 4
Journal: CEUR Workshop Proceedings
Publication status: Published - 2019
Event: ICPM Demo Track 2019 - Aachen, Germany
Duration: 24 Jun 2019 to 26 Jun 2019


  • Data mining
  • Preprocessing
  • Process mining
