The constantly growing volume of data increasingly confronts companies with the challenge of analyzing complex data sets efficiently and preparing them in an understandable form for decision-makers. In many cases, the high dimensionality of the data complicates not only computational processing but also human interpretation. For this reason, this master's thesis evaluates suitable methods for reducing the complexity of multivariate company data and derives well-founded recommendations for practical application. The focus is on unsupervised machine learning methods that reduce complexity and allow the data to be presented in an understandable way. After introducing basic terms such as big data, multivariate data analysis and company data, the thesis explains established processes for knowledge discovery such as KDD, CRISP-DM and SEMMA. This is followed by a comprehensive presentation of relevant methods for complexity reduction, divided into feature selection, dimensionality reduction and clustering. The methods considered are Variance Thresholding, Correlation-Based Filtering, PCA, t-SNE and UMAP, as well as K-Means and Agglomerative Hierarchical Clustering. The theoretical analysis is supplemented by an empirical part that applies the previously evaluated methods in practice. In this part, the selected methods are applied to two sample data sets using the KNIME data analytics software and evaluated in terms of their accuracy, visualizability and comprehensibility. The results show that combining dimensionality reduction with clustering, in particular PCA + K-Means and t-SNE + K-Means, leads to a significant reduction in the complexity of the data. While PCA stands out for its interpretable principal components and achieves an accuracy of up to 94.9%, t-SNE provides particularly clear visual cluster separation with a comparably high accuracy of up to 92.7%.
The correlation-based filter also proves to be a practical approach for removing redundant features while maintaining high interpretability. Overall, the choice of suitable methods depends heavily on the analysis objective, and interpretability and visualizability often conflict with each other. The thesis therefore provides practical recommendations for reducing the complexity of data from the business environment.
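To illustrate the idea behind correlation-based filtering mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the thesis's implementation, which used KNIME. The function names `pearson` and `correlation_filter`, the threshold of 0.9, the rule of keeping the earlier feature of a correlated pair, and the sample data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_filter(columns, threshold=0.9):
    """Keep a feature only if |r| with every already kept feature is <= threshold.

    `columns` maps a feature name to its list of values. Insertion order decides
    which feature of a highly correlated pair survives (the earlier one).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical company data: revenue_usd is revenue_eur times a fixed rate,
# so the two columns carry the same information.
data = {
    "revenue_eur": [100.0, 150.0, 200.0, 250.0, 300.0],
    "revenue_usd": [110.0, 165.0, 220.0, 275.0, 330.0],
    "employees": [12.0, 9.0, 15.0, 11.0, 14.0],
}
selected = correlation_filter(data)
print(selected)
```

In this sketch the redundant `revenue_usd` column is dropped, while the retained features keep their original meaning, which is why a filter of this kind preserves interpretability. The threshold would need to be tuned to the data set and the analysis objective.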
| Date of Award | 2025 |
|---|---|
| Original language | German (Austria) |
| Supervisor | Sonja Straßer (Supervisor) |
Evaluierung von Methoden zur Komplexitätsreduktion von multivariaten Unternehmensdaten (Evaluation of Methods for Reducing the Complexity of Multivariate Company Data)
Hader, D. (Author). 2025
Student thesis: Master's Thesis