A Probabilistic Transformation of Distance-Based Outliers

David Muhr, Michael Affenzeller, Josef Küng

Research output: Contribution to journal (Article, peer-reviewed)

10 Citations (Scopus)

Abstract

The scores produced by distance-based outlier detection methods are difficult to interpret, and without additional context it is challenging to choose a suitable cut-off threshold between normal and outlier data points. We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable, meaning it preserves the order of the original scores, and it increases the contrast between normal and outlier data points. Identifying the nearest-neighbor relationships in the data requires computing distances between data points, yet most of these computed distances are typically discarded. We show that the distances to other data points can be used to model distance probability distributions, and that these distributions can then be used to turn distance-based outlier scores into outlier probabilities. Across a variety of tabular and image benchmark datasets, we show that the probabilistic transformation does not affect outlier ranking (ROC AUC) or detection performance (AP, F1), while it increases the contrast between normal and outlier score distributions (measured as statistical distance). These findings indicate that distance-based outlier scores can be transformed into interpretable probabilities with increased contrast between normal and outlier samples. Our work generalizes to a wide range of distance-based outlier detection methods and, because it reuses existing distance computations, adds no significant computational overhead.
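The abstract does not specify which distance distribution the authors fit, so the following is only a minimal sketch under our own assumptions. It scores each point by its distance to the k-th nearest neighbor, then maps that score through the empirical CDF of all pairwise distances, reusing the distance matrix a kNN detector has already computed. The function names (`pairwise_distances`, `knn_scores`, `to_probabilities`, `roc_auc`) and the choice of a global empirical CDF are hypothetical illustrations, not the paper's method.

```python
import bisect
import math
from typing import List, Sequence

Point = Sequence[float]


def pairwise_distances(points: List[Point]) -> List[List[float]]:
    """Symmetric Euclidean distance matrix (what a kNN detector computes anyway)."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist


def knn_scores(dist: List[List[float]], k: int) -> List[float]:
    """Distance-based outlier score: distance to the k-th nearest neighbor, self excluded."""
    scores = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        scores.append(others[k - 1])
    return scores


def to_probabilities(dist: List[List[float]], scores: List[float]) -> List[float]:
    """Hypothetical transform: share of all pairwise distances that are <= the score.

    The empirical CDF is non-decreasing, so the outlier ranking cannot change.
    """
    n = len(dist)
    pool = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    return [bisect.bisect_right(pool, s) / len(pool) for s in scores]


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Rank-based ROC AUC: P(outlier score > inlier score), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Usage: four clustered inliers and one distant outlier (label 1).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
labels = [0, 0, 0, 0, 1]
dist = pairwise_distances(points)
scores = knn_scores(dist, k=2)
probs = to_probabilities(dist, scores)
print(scores, probs, roc_auc(scores, labels), roc_auc(probs, labels))
```

Because an empirical CDF is monotone, the mapping cannot reorder points, so ROC AUC is identical before and after the transformation. This simple version illustrates ranking stability only; it makes no claim about the contrast gains reported in the paper, which depend on how the distance distributions are actually modeled.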

Original language: English
Pages (from-to): 782-802
Number of pages: 21
Journal: Machine Learning and Knowledge Extraction
Volume: 5
Issue number: 3
DOIs
Publication status: Published, Sept 2023

Keywords

  • anomaly detection
  • anomaly scores
  • distance distribution
  • novelty detection
  • outlier detection
  • outlier probabilities
  • outlier scores
  • score contrast
  • score distribution
  • score normalization
