A major challenge in applying process mining to real event data is the presence of noisy or incomplete cases and unusual behavior. Applying process mining to raw event data can lead to wrong conclusions during the discovery of process models, because such cases conceal the typical behavior. In this paper, we present an alternative approach to filtering event data that does not require extensive preprocessing. The method is based on footprint matrices generated from randomly pruned sub-logs and works in a semi-automated manner. The most similar matrices are identified and used to validate the whole log, so that traces representing unusual behavior can be excluded or highlighted. The tool was implemented with Python 3, NumPy, and pandas and is publicly available on GitHub. We evaluated our tool using benchmark datasets and compared it with human filtering and discovery results.
- Data mining
- Process mining
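
The filtering idea summarized in the abstract can be illustrated with a minimal sketch. It is not the published tool. The relation encoding, the cell-wise similarity measure, the choice of a single most-central matrix as reference, the flagging rule, and all names and parameters (`footprint`, `flag_unusual_traces`, `n_sublogs`, `fraction`, `seed`) are assumptions made for illustration only.

```python
import numpy as np


def footprint(traces, activities):
    """Footprint matrix of a (sub-)log.

    Assumed encoding: 0 = unrelated, 1 = a -> b, 2 = b -> a, 3 = parallel.
    """
    idx = {a: i for i, a in enumerate(activities)}
    n = len(activities)
    follows = np.zeros((n, n), dtype=bool)  # directly-follows relation
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            follows[idx[a], idx[b]] = True
    return follows.astype(int) + 2 * follows.T.astype(int)


def flag_unusual_traces(log, n_sublogs=50, fraction=0.1, seed=0):
    """Flag traces that contradict the most central sub-log footprint.

    Hypothetical procedure: sample random sub-logs, build one footprint
    matrix per sub-log, take the matrix most similar to all others as
    reference, and flag every trace that contains a directly-follows
    pair the reference does not allow.
    """
    activities = sorted({a for trace in log for a in trace})
    idx = {a: i for i, a in enumerate(activities)}
    rng = np.random.default_rng(seed)
    size = max(1, int(round(fraction * len(log))))

    matrices = []
    for _ in range(n_sublogs):
        picked = rng.choice(len(log), size=size, replace=False)
        matrices.append(footprint([log[i] for i in picked], activities))
    matrices = np.stack(matrices)

    # Pairwise similarity: share of identical cells between two matrices.
    similarity = (matrices[:, None] == matrices[None, :]).mean(axis=(2, 3))
    reference = matrices[similarity.mean(axis=1).argmax()]

    # Codes 1 and 3 are odd, so an even code means "a is never directly
    # followed by b" in the reference.
    flagged = [
        k for k, trace in enumerate(log)
        if any(reference[idx[a], idx[b]] % 2 == 0
               for a, b in zip(trace, trace[1:]))
    ]
    return reference, flagged


if __name__ == "__main__":
    # Toy log: two frequent variants and one unusual trace at the end.
    toy_log = [list("abcd")] * 30 + [list("acbd")] * 29 + [list("adcb")]
    _, unusual = flag_unusual_traces(toy_log)
    print(unusual)
```

In this sketch the flagged indices are only candidates. In a semi-automated setting, an analyst would decide whether to exclude or merely highlight them.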