TY - JOUR
T1 - A Probabilistic Transformation of Distance-Based Outliers
AU - Muhr, David
AU - Affenzeller, Michael
AU - Küng, Josef
N1 - Publisher Copyright: © 2023 by the authors.
PY - 2023/9
Y1 - 2023/9
N2 - The scores of distance-based outlier detection methods are difficult to interpret, and it is challenging to determine a suitable cut-off threshold between normal and outlier data points without additional context. We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable and increases the contrast between normal and outlier data points. Determining distance relationships between data points is necessary to identify the nearest-neighbor relationships in the data, yet most of the computed distances are typically discarded. We show that the distances to other data points can be used to model distance probability distributions and, subsequently, use the distributions to turn distance-based outlier scores into outlier probabilities. Over a variety of tabular and image benchmark datasets, we show that the probabilistic transformation does not impact outlier ranking (ROC AUC) or detection performance (AP, F1), and increases the contrast between normal and outlier score distributions (statistical distance). The experimental findings indicate that it is possible to transform distance-based outlier scores into interpretable probabilities with increased contrast between normal and outlier samples. Our work generalizes to a wide range of distance-based outlier detection methods, and, because existing distance computations are used, it adds no significant computational overhead.
KW - anomaly detection
KW - anomaly scores
KW - distance distribution
KW - novelty detection
KW - outlier detection
KW - outlier probabilities
KW - outlier scores
KW - score contrast
KW - score distribution
KW - score normalization
UR - http://www.scopus.com/inward/record.url?scp=85172209110&partnerID=8YFLogxK
U2 - 10.3390/make5030042
DO - 10.3390/make5030042
M3 - Article
AN - SCOPUS:85172209110
SN - 2504-4990
VL - 5
SP - 782
EP - 802
JO - Machine Learning and Knowledge Extraction
JF - Machine Learning and Knowledge Extraction
IS - 3
ER -