TY - GEN
T1 - Automating Data Quality Monitoring with Reference Data Profiles
AU - Ehrlinger, Lisa
AU - Werth, Bernhard
AU - Wöß, Wolfram
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - Data quality is of central importance for the qualitative evaluation of decisions taken by AI-based applications. In practice, data from several heterogeneous data sources is integrated, but complete, global domain knowledge is often not available. In such heterogeneous scenarios, it is particularly difficult to monitor data quality (e.g., completeness, accuracy, timeliness) over time. In this paper, we formally introduce a new data-centric method for automated data quality monitoring, which is based on reference data profiles. A reference data profile is a set of data profiling statistics that is learned automatically to model the target quality of the data. In contrast to most existing data quality approaches that require domain experts to define rules, our method can be fully automated from initialization to continuous monitoring. This data-centric method has been implemented in our data quality tool DQ-MeeRKat and evaluated with six real-world telematic device data streams.
KW - Automated quality checks
KW - Data quality monitoring
KW - Knowledge graphs
KW - Reference data profiles
UR - http://www.scopus.com/inward/record.url?scp=85172682110&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-37890-4_2
DO - 10.1007/978-3-031-37890-4_2
M3 - Conference contribution
SN - 9783031378898
T3 - Communications in Computer and Information Science
SP - 24
EP - 44
BT - Data Management Technologies and Applications - 10th International Conference, DATA 2021, and 11th International Conference, DATA 2022, Revised Selected Papers
A2 - Cuzzocrea, Alfredo
A2 - Gusikhin, Oleg
A2 - Hammoudi, Slimane
A2 - Quix, Christoph
PB - Springer
T2 - Proceedings of the 10th International Conference and 11th International Conference on Data Management Technologies and Applications, DATA 2021 and DATA 2022
Y2 - 11 July 2022 through 13 July 2022
ER -