Automating Data Quality Monitoring with Reference Data Profiles

Lisa Ehrlinger, Bernhard Werth, Wolfram Wöß

Publication: Contribution to book/report/conference proceedings; conference contribution; peer-reviewed

Abstract

Data quality is of central importance for the qualitative evaluation of decisions taken by AI-based applications. In practice, data from several heterogeneous data sources is integrated, but complete, global domain knowledge is often not available. In such heterogeneous scenarios, it is particularly difficult to monitor data quality (e.g., completeness, accuracy, timeliness) over time. In this paper, we formally introduce a new data-centric method for automated data quality monitoring, which is based on reference data profiles. A reference data profile is a set of data profiling statistics that is learned automatically to model the target quality of the data. In contrast to most existing data quality approaches that require domain experts to define rules, our method can be fully automated from initialization to continuous monitoring. This data-centric method has been implemented in our data quality tool DQ-MeeRKat and evaluated with six real-world telematic device data streams.
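
To make the idea of a reference data profile more concrete, here is a minimal Python sketch. It is not the DQ-MeeRKat implementation. The choice of statistics, the names `ColumnProfile`, `learn_profile`, and `check_batch`, and the relative `tolerance` threshold are all assumptions made for illustration; the abstract does not specify them. The sketch learns a handful of profiling statistics from a trusted batch of one numeric column, then reports which statistics of a later batch deviate from that reference.

```python
from dataclasses import dataclass, fields
from statistics import mean, pstdev
from typing import List, Optional, Sequence


@dataclass
class ColumnProfile:
    """Hypothetical reference profile for one numeric column."""
    null_fraction: float
    minimum: float
    maximum: float
    mean: float
    stdev: float


def learn_profile(values: Sequence[Optional[float]]) -> ColumnProfile:
    """Compute profiling statistics from a batch; None marks a missing value."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot profile a batch without any non-missing values")
    return ColumnProfile(
        null_fraction=1 - len(present) / len(values),
        minimum=min(present),
        maximum=max(present),
        mean=mean(present),
        stdev=pstdev(present),
    )


def check_batch(reference: ColumnProfile,
                values: Sequence[Optional[float]],
                tolerance: float = 0.1) -> List[str]:
    """Return the names of statistics whose relative deviation from the
    reference profile exceeds `tolerance` (an assumed conformance rule)."""
    current = learn_profile(values)
    violations = []
    for field in fields(ColumnProfile):
        ref = getattr(reference, field.name)
        cur = getattr(current, field.name)
        scale = max(abs(ref), 1e-9)  # avoid division by zero when ref is 0
        if abs(cur - ref) / scale > tolerance:
            violations.append(field.name)
    return violations


# Initialization: learn the reference profile from data assumed to be good.
reference = learn_profile([10.0, 11.0, 12.0, 11.0, 10.0, 12.0])

# Monitoring: compare later batches against the learned reference.
ok_batch = check_batch(reference, [10.0, 12.0, 11.0, 11.0, 12.0, 10.0])
bad_batch = check_batch(reference, [10.0, None, 30.0, None, 11.0, 12.0])
```

In this sketch no expert-defined rule is involved: the expected value ranges and missing-value rate come entirely from the reference batch, which mirrors the automation argument in the abstract. A real deployment would need per-statistic thresholds and support for non-numeric columns.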

Original language: English
Title: Data Management Technologies and Applications - 10th International Conference, DATA 2021, and 11th International Conference, DATA 2022, Revised Selected Papers
Editors: Alfredo Cuzzocrea, Oleg Gusikhin, Slimane Hammoudi, Christoph Quix
Publisher: Springer
Pages: 24-44
Number of pages: 21
ISBN (Print): 9783031378898
Publication status: Published - 2023
Event: Proceedings of the 10th International Conference and 11th International Conference on Data Management Technologies and Applications, DATA 2021 and DATA 2022 - Lisbon, Portugal
Duration: 11 July 2022 to 13 July 2022

Publication series

Name: Communications in Computer and Information Science
Volume: 1860 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: Proceedings of the 10th International Conference and 11th International Conference on Data Management Technologies and Applications, DATA 2021 and DATA 2022
Country/Territory: Portugal
City: Lisbon
Period: 11.07.2022 to 13.07.2022
