Abstract
Steel casting transforms molten steel into solid slabs, with various casting parameters recorded for maintenance and monitoring. Quality control ensures the steel meets standards and links defects to casting parameters. This Master’s Thesis examines how data
engineering and data science, including Artificial Intelligence (AI) and machine learning, can improve understanding of the relationship between casting parameters and steel
quality. It explores feature extraction and preprocessing methods to enhance machine
learning models and uses explainable AI to analyze how these features affect model
predictions.
This thesis investigates various data structuring strategies, model performance, and
feature importance in predicting steel defects. Clustering methods were ineffective due
to class imbalance and small sample sizes, which hindered meaningful analysis. Models
trained on raw data achieved balanced recall and accuracy but suffered from low precision, a
challenge that persisted even with aggregated data. Attempts at quality shifting
did not improve performance, underscoring the need for more precise data handling. In
contrast, the results of advanced data aggregation techniques were significantly more
meaningful. Higher levels of data aggregation markedly improved performance, with the
F1 score reaching up to 0.56 and recall up to 0.65, demonstrating the importance of
aggregation for balancing recall and precision. In addition, models trained on significant samples showed notable improvements in accuracy and recall, while models using
all samples had higher precision. These findings highlight the critical role of effective
data structuring in achieving better predictive performance and underscore the impact
of aggregation levels on model outcomes.
Furthermore, feature extraction methods, such as ANOVA, VSURF, and LASSO,
played a crucial role in reducing the number of features to between 5 and 12, significantly enhancing model performance in terms of the F1 score. In particular, the Random
Forest and Bernoulli Naive Bayes models performed best as measured by the F1 score, detecting about two-thirds of the defects, as reflected in
high True Positive (TP) counts relative to the actual number of defects. Explainable AI
methods revealed that steel grade is a significant factor, with lower carbon grades linked
to higher defect rates. Despite moderate overall model performance, these methods provided valuable insights into feature importance and defect prediction, offering guidance
for improving defect prediction and identifying areas for future research.
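For readers less familiar with the metrics quoted above, the following minimal sketch shows how precision, recall, and the F1 score follow from the counts of true positives (TP), false positives (FP), and false negatives (FN) for the defect class. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the thesis data.

```python
def classification_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 for the positive (defect) class."""
    # Precision: share of predicted defects that are real defects.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real defects that the model detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (assumption, not thesis results):
# 100 actual defects, 65 detected, 35 missed, 67 false alarms.
m = classification_metrics(tp=65, fp=67, fn=35)
print({name: round(value, 2) for name, value in m.items()})
```

With these illustrative counts, a model that finds about two-thirds of the defects still ends up with a moderate F1 score because roughly half of its defect predictions are false alarms, which is the kind of recall and precision trade-off the abstract describes.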
Date of Award | 2024 |
---|---|
Original language | English (American) |
Supervisor | Stephan Winkler (Supervisor) & Sonja Strasser (Supervisor) |