Automating Data Quality Monitoring with Reference Data Profiles

Lisa Ehrlinger, Bernhard Werth, Wolfram Wöß

Research output: Chapter in Book/Report/Conference proceedings; Conference contribution; peer-review

Abstract

Data quality is of central importance for the qualitative evaluation of decisions taken by AI-based applications. In practice, data from several heterogeneous data sources is integrated, but complete, global domain knowledge is often not available. In such heterogeneous scenarios, it is particularly difficult to monitor data quality (e.g., completeness, accuracy, timeliness) over time. In this paper, we formally introduce a new data-centric method for automated data quality monitoring, which is based on reference data profiles. A reference data profile is a set of data profiling statistics that is learned automatically to model the target quality of the data. In contrast to most existing data quality approaches that require domain experts to define rules, our method can be fully automated from initialization to continuous monitoring. This data-centric method has been implemented in our data quality tool DQ-MeeRKat and evaluated with six real-world telematic device data streams.
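
The idea of a reference data profile described in the abstract can be illustrated with a minimal sketch. This is not the DQ-MeeRKat implementation: the names `ReferenceProfile`, `learn_profile`, and `check_batch`, the choice of statistics (null ratio, minimum, maximum, mean), and the `tolerance` parameter are all assumptions made for illustration. The sketch learns a profile from a batch of data assumed to have the target quality, then flags later batches whose statistics drift away from it.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Sequence


@dataclass
class ReferenceProfile:
    """Hypothetical profile: a few statistics describing one numeric attribute."""
    null_ratio: float
    minimum: float
    maximum: float
    mean: float


def learn_profile(values: Sequence[Optional[float]]) -> ReferenceProfile:
    """Compute profiling statistics from a batch; None marks a missing value."""
    present = [v for v in values if v is not None]
    return ReferenceProfile(
        null_ratio=1 - len(present) / len(values),
        minimum=min(present),
        maximum=max(present),
        mean=mean(present),
    )


def check_batch(ref: ReferenceProfile,
                values: Sequence[Optional[float]],
                tolerance: float = 0.1) -> List[str]:
    """Return the names of statistics that deviate from the reference profile.

    Assumption: numeric statistics may deviate by `tolerance` times the
    reference value range, and the null ratio by `tolerance` in absolute terms.
    """
    current = learn_profile(values)
    slack = tolerance * (ref.maximum - ref.minimum)
    violations = []
    if current.null_ratio > ref.null_ratio + tolerance:
        violations.append("null_ratio")
    if current.minimum < ref.minimum - slack:
        violations.append("minimum")
    if current.maximum > ref.maximum + slack:
        violations.append("maximum")
    if abs(current.mean - ref.mean) > slack:
        violations.append("mean")
    return violations


# Initialization: learn the profile from data assumed to have the target quality.
reference = learn_profile([10.0, 12.0, 11.0, 13.0, 12.0])

# Monitoring: compare each newly arriving batch against the reference profile.
good_batch = [11.0, 12.0, 12.5, 10.5]
bad_batch = [11.0, None, None, 40.0]
print(check_batch(reference, good_batch))
print(check_batch(reference, bad_batch))
```

In this sketch no expert-defined rules are needed: the conformance checks follow entirely from the learned statistics, which is the property the abstract emphasizes. Only the tolerance is chosen by hand.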

Original language: English
Title of host publication: Data Management Technologies and Applications - 10th International Conference, DATA 2021, and 11th International Conference, DATA 2022, Revised Selected Papers
Editors: Alfredo Cuzzocrea, Oleg Gusikhin, Slimane Hammoudi, Christoph Quix
Publisher: Springer
Pages: 24-44
Number of pages: 21
ISBN (Print): 9783031378898
Publication status: Published - 2023
Event: Proceedings of the 10th International Conference and 11th International Conference on Data Management Technologies and Applications, DATA 2021 and DATA 2022 - Lisbon, Portugal
Duration: 11 Jul 2022 to 13 Jul 2022

Publication series

Name: Communications in Computer and Information Science
Volume: 1860 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: Proceedings of the 10th International Conference and 11th International Conference on Data Management Technologies and Applications, DATA 2021 and DATA 2022
Country/Territory: Portugal
City: Lisbon
Period: 11.07.2022 to 13.07.2022

Keywords

  • Automated quality checks
  • Data quality monitoring
  • Knowledge graphs
  • Reference data profiles
