A Probabilistic Transformation of Distance-Based Outliers

David Muhr, Michael Affenzeller, Josef Küng

Publication: Contribution to journal, article, peer-reviewed

10 citations (Scopus)

Abstract

The scores of distance-based outlier detection methods are difficult to interpret, and it is challenging to determine a suitable cut-off threshold between normal and outlier data points without additional context. We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable and increases the contrast between normal and outlier data points. Determining distance relationships between data points is necessary to identify the nearest-neighbor relationships in the data, yet most of the computed distances are typically discarded. We show that the distances to other data points can be used to model distance probability distributions and, subsequently, use the distributions to turn distance-based outlier scores into outlier probabilities. Over a variety of tabular and image benchmark datasets, we show that the probabilistic transformation does not impact outlier ranking (ROC AUC) or detection performance (AP, F1), and increases the contrast between normal and outlier score distributions (statistical distance). The experimental findings indicate that it is possible to transform distance-based outlier scores into interpretable probabilities with increased contrast between normal and outlier samples. Our work generalizes to a wide range of distance-based outlier detection methods, and, because existing distance computations are used, it adds no significant computational overhead.
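
The idea of reusing already-computed distances can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the distance distributions are modeled, so the sketch assumes an empirical CDF over all pairwise distances, and the function names `knn_scores` and `empirical_cdf_probabilities` are hypothetical. Each point's k-th-nearest-neighbor distance is mapped through that CDF, which yields values in [0, 1] and, because a CDF is non-decreasing, never reverses the order of two scores.

```python
import numpy as np

def knn_scores(X, k):
    """Return each point's distance to its k-th nearest neighbor and the full distance matrix."""
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero self-distance, so column k is the k-th neighbor.
    scores = np.sort(D, axis=1)[:, k]
    return scores, D

def empirical_cdf_probabilities(scores, D):
    """Map each score to the fraction of pairwise distances that are <= that score.

    Assumption for illustration only: an empirical CDF stands in for the
    distance probability distribution described in the abstract.
    """
    n = D.shape[0]
    pairwise = np.sort(D[np.triu_indices(n, k=1)])  # each unordered pair once, no diagonal
    ranks = np.searchsorted(pairwise, scores, side="right")
    return ranks / pairwise.size

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[8.0, 8.0]]])  # 50 inliers plus one distant point
scores, D = knn_scores(X, k=5)
probs = empirical_cdf_probabilities(scores, D)
```

The distance matrix `D` is computed once and used both for the neighbor search and for the distribution, which mirrors the abstract's point that the transformation relies on existing distance computations.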

Original language: English
Pages (from - to): 782-802
Number of pages: 21
Journal: Machine Learning and Knowledge Extraction
Volume: 5
Issue number: 3
Publication status: Published - Sep. 2023
