Data Analysis and Predictive Modeling of Steel Quality Based on Casting Parameters in Continuous Casting

  • Emma Kiemeyer

    Student thesis: Master's Thesis

    Abstract

    Steel casting transforms molten steel into solid slabs, with various casting parameters
    recorded for maintenance and monitoring. Quality control ensures the steel meets standards and links defects to casting parameters. This Master’s Thesis examines how data
    engineering and data science, including Artificial Intelligence (AI) and machine learning, can improve understanding of the relationship between casting parameters and steel
    quality. It explores feature extraction and preprocessing methods to enhance machine
    learning models and uses explainable AI to analyze how these features affect model
    predictions.
    This thesis investigates various data structuring strategies, model performance, and
    feature importance in predicting steel defects. Clustering methods were ineffective
    because class imbalance and small sample sizes hindered meaningful analysis. Models
    built on raw data showed balanced recall and accuracy but low precision, a
    challenge that persisted even when using aggregated data. Quality shifting attempts
    did not enhance performance, underscoring the need for more precise data handling. In
    contrast, the results of advanced data aggregation techniques were significantly more
    meaningful. Higher levels of data aggregation markedly improved performance, with the
    F1 score reaching up to 0.56 and recall up to 0.65, demonstrating the importance of
    aggregation for balancing recall and precision. Furthermore, models trained on
    significant samples showed notable improvements in accuracy and recall, while models
    using all samples achieved higher precision. These findings highlight the critical role of effective
    data structuring in achieving better predictive performance and underscore the impact
    of aggregation levels on model outcomes.
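
    To make the relationship between the metrics named above concrete, the following
    minimal Python sketch computes accuracy, precision, recall, and F1 from
    confusion-matrix counts. The function name and the counts are assumptions made for
    illustration and are not taken from the thesis; the counts were chosen only so that
    recall and F1 land near the magnitudes quoted above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted defects that are real defects.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real defects that the model detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical, imbalanced data set: 100 defective and 900 non-defective samples.
metrics = classification_metrics(tp=65, fp=68, fn=35, tn=832)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

    With these assumed counts, accuracy is about 0.90 while precision is only about 0.49,
    which illustrates how an imbalanced data set can combine high accuracy and reasonable
    recall with low precision.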
    Feature extraction methods such as ANOVA, VSURF, and LASSO played a crucial role
    in reducing the number of features to between 5 and 12, significantly improving
    model performance in terms of the F1 score. The Random Forest and Bernoulli Naive
    Bayes models achieved the best F1 scores, detecting about two-thirds of the defects,
    as reflected in high True Positive (TP) counts relative to the actual number of defects.
    Explainable AI methods revealed that steel grade is a significant factor, with lower
    carbon grades linked to higher defect rates. Despite moderate overall model
    performance, these methods provided valuable insights into feature importance, offering
    guidance for improving defect prediction and identifying areas for future research.
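
    As an illustration of ANOVA-based feature reduction, the sketch below ranks features
    by their one-way ANOVA F statistic across the defect classes and keeps the top k. It
    is a standard-library sketch, not the pipeline used in the thesis; the function names,
    feature names, and toy values are all hypothetical.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(columns, labels, k):
    """Return the names of the k features with the highest F statistic."""
    scores = {name: anova_f(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: label 1 marks a defective sample.
labels = [0, 0, 0, 0, 1, 1]
columns = {
    "casting_speed": [1.0, 1.1, 0.9, 1.0, 1.6, 1.5],
    "mold_level": [5.0, 5.2, 4.9, 5.1, 5.0, 5.1],
    "carbon_content": [0.20, 0.22, 0.21, 0.19, 0.08, 0.07],
}
selected = top_k_features(columns, labels, k=2)
print(selected)
```

    A univariate filter like this scores each feature independently, so it cannot detect
    interactions between features; that is one reason to combine it with model-based
    methods such as the VSURF and LASSO approaches named above.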
    Date of Award: 2024
    Original language: English (American)
    Supervisors: Stephan Winkler & Sonja Strasser
