Dimensionality Reduction of Industrial Plant Data for Process Mining

  • Jonathan Schmalzer

    Student thesis: Master's Thesis

    Abstract

    The increasing digitalization of industrial production systems generates ever-growing volumes of data. In modular bending machines, variable process sequences and extensive sensor data lead to high-dimensional datasets. While these datasets contain valuable information about process execution and conditions, they are difficult to use for process mining without preprocessing. Lack of structure, redundancy, and technical constraints hinder the derivation of reliable insights. Against this background, this thesis investigates how methods of complexity and dimensionality reduction can improve the data basis and enhance the knowledge gained from process mining. The research follows a three-step approach. First, the theoretical foundations of process mining, data quality, and dimensionality reduction are outlined. Second, a scientific transfer is conducted by extracting real production data from a MongoDB database, preprocessing it in KNIME, and integrating it into the process mining platform Celonis. The analysis focuses on five approaches: Low-Variance Filter, Correlation Filter, Principal Component Analysis (PCA), Autoencoder, and a combined method of selection and extraction. Evaluation is based on defined metrics, including the number of remaining parameters, reduction rates, process variant coverage, the p-n factor, and interpretability. Third, the results are assessed both quantitatively and qualitatively and visualized through interactive dashboards in Celonis. The findings reveal clear differences between the methods. Feature selection proved robust, technically stable, and fully interpretable. Both the Low-Variance Filter and the Correlation Filter reduced the number of parameters by about two-thirds without losing instances or variants. The p-n factor dropped to 0.01 %, indicating structural simplification while preserving transparency. Feature extraction methods achieved stronger compression but came with significant drawbacks. PCA reduced the number of parameters but was limited to 500,000 instances due to inversion constraints. The Autoencoder captured nonlinear structures but required long training times, caused major instance losses, and produced variables with limited interpretability. The combined method, which applies feature selection before PCA, achieved the strongest results: it reduced the number of parameters by 79 % with full instance and variant coverage and an explained variance of 84.6 %.
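
    The combined method described in the abstract (feature selection followed by PCA) can be illustrated with a minimal sketch. The thesis carried out its preprocessing in KNIME; the NumPy code below is a stand-in rather than the author's implementation. The function names, the thresholds (`1e-3`, `0.95`, `0.8`), and the synthetic sensor data are all assumptions chosen for illustration.

```python
import numpy as np


def low_variance_filter(X, threshold=1e-3):
    """Return indices of columns whose variance exceeds `threshold` (assumed value)."""
    return np.flatnonzero(X.var(axis=0) > threshold)


def correlation_filter(X, threshold=0.95):
    """Greedy filter: keep a column only if its absolute Pearson correlation
    with every previously kept column is at most `threshold` (assumed value)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)


def pca(X, target_variance=0.8):
    """PCA via SVD on standardized columns. Keeps the smallest number of
    components whose cumulative explained variance reaches `target_variance`."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    n_components = int(np.searchsorted(np.cumsum(ratio), target_variance)) + 1
    n_components = min(n_components, len(ratio))
    return Z @ vt[:n_components].T, ratio[:n_components]


# Synthetic stand-in for sensor data (hypothetical, not the thesis dataset):
# four informative signals driven by two latent factors, two near-duplicates,
# and two (almost) constant columns.
rng = np.random.default_rng(0)
n = 1000
latent = rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 4))
sensors = np.column_stack([
    latent[:, 0] + 0.5 * noise[:, 0],
    latent[:, 1] + 0.5 * noise[:, 1],
    latent[:, 0] + latent[:, 1] + 0.5 * noise[:, 2],
    latent[:, 0] - latent[:, 1] + 0.5 * noise[:, 3],
])
duplicates = np.column_stack([
    2.0 * sensors[:, 0] + 0.01 * rng.normal(size=n),
    sensors[:, 1] + 0.01 * rng.normal(size=n),
])
constants = np.column_stack([
    np.full(n, 5.0),
    3.0 + 1e-4 * rng.normal(size=n),
])
X = np.column_stack([sensors, duplicates, constants])

# Step 1: feature selection (low variance first, then correlation).
after_variance = X[:, low_variance_filter(X)]
selected = after_variance[:, correlation_filter(after_variance)]

# Step 2: feature extraction on the selected columns only.
components, explained = pca(selected)

reduction_rate = 1 - components.shape[1] / X.shape[1]
print("columns:", X.shape[1], "->", selected.shape[1], "->", components.shape[1])
print("reduction rate:", round(reduction_rate, 2))
print("explained variance:", round(float(explained.sum()), 3))
```

    Running the filters first removes constant and redundant columns while leaving the remaining features interpretable, so PCA only has to compress the part of the data that still carries information. The sketch reduces columns only and drops no rows, which corresponds to the instance coverage the abstract reports for feature selection.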
    Date of Award: 2025
    Original language: German (Austria)
    Supervisor: Sonja Straßer

    Study program

    • Operations Management
