TY - JOUR
T1 - Ensuring the Robustness and Reliability of Data-Driven Knowledge Discovery Models in Production and Manufacturing
AU - Tripathi, Shailesh
AU - Muhr, David
AU - Brunner, Manuel
AU - Jodlbauer, Herbert
AU - Dehmer, Matthias
AU - Emmert-Streib, Frank
N1 - Publisher Copyright: © 2021 Tripathi, Muhr, Brunner, Jodlbauer, Dehmer and Emmert-Streib.
PY - 2021/6/14
Y1 - 2021/6/14
AB - The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely accepted framework in production and manufacturing. This data-driven knowledge discovery framework provides an orderly partition of the often complex data mining processes to ensure a practical implementation of data analytics and machine learning models. However, the practical application of robust industry-specific data-driven knowledge discovery models faces multiple data- and model development-related issues. These issues need to be carefully addressed by allowing a flexible, customized and industry-specific knowledge discovery framework. For this reason, extensions of CRISP-DM are needed. In this paper, we provide a detailed review of CRISP-DM and summarize extensions of this model into a novel framework we call Generalized Cross-Industry Standard Process for Data Science (GCRISP-DS). This framework is designed to allow dynamic interactions between different phases to adequately address data- and model-related issues for achieving robustness. Furthermore, it also emphasizes the need for a detailed business understanding and the interdependencies with the developed models and data quality for fulfilling higher business objectives. Overall, such a customizable GCRISP-DS framework provides an enhancement for model improvements and reusability by minimizing robustness issues.
KW - CRISP-DM
KW - industrial production
KW - industry 4.0
KW - machine learning
KW - robustness
KW - smart manufacturing
UR - http://www.scopus.com/inward/record.url?scp=85113605181&partnerID=8YFLogxK
U2 - 10.3389/frai.2021.576892
DO - 10.3389/frai.2021.576892
M3 - Review article
C2 - 34195608
AN - SCOPUS:85113605181
VL - 4
JO - Frontiers in Artificial Intelligence
JF - Frontiers in Artificial Intelligence
M1 - 576892
ER -