TY - JOUR
T1 - Testing noisy numerical data for monotonic association
AU - Bodenhofer, Ulrich
AU - Krone, Martin
AU - Klawonn, Frank
PY - 2013/10/1
Y1 - 2013/10/1
N2 - Rank correlation measures are intended to quantify the extent to which there is a monotonic association between two observables. They are mainly designed for ordinal data and are not ideally suited to noisy numerical data. To better account for noisy data, a family of rank correlation measures was previously introduced that replaces classical ordering relations by fuzzy relations that have smooth transitions, thereby ensuring that the correlation measure is continuous with respect to the data. This paper briefly reviews the basic concepts behind this family of rank correlation measures and investigates it from the viewpoint of robust statistics. On this basis, we introduce a framework of novel rank correlation tests. An extensive experimental evaluation on a large number of simulated data sets demonstrates that the new tests outperform the classical variants in terms of type II error rates without sacrificing good performance in terms of type I error rates. This is mainly because the new tests are more robust to noise for small samples. The Gaussian rank correlation estimator turned out to be the best choice when no prior knowledge about the data is available, whereas the new family of robust gamma tests provides an advantage when information about the noise distribution is available. An implementation of all robust rank correlation tests used in this paper is available as an R package from the CRAN repository.
AB - Rank correlation measures are intended to quantify the extent to which there is a monotonic association between two observables. They are mainly designed for ordinal data and are not ideally suited to noisy numerical data. To better account for noisy data, a family of rank correlation measures was previously introduced that replaces classical ordering relations by fuzzy relations that have smooth transitions, thereby ensuring that the correlation measure is continuous with respect to the data. This paper briefly reviews the basic concepts behind this family of rank correlation measures and investigates it from the viewpoint of robust statistics. On this basis, we introduce a framework of novel rank correlation tests. An extensive experimental evaluation on a large number of simulated data sets demonstrates that the new tests outperform the classical variants in terms of type II error rates without sacrificing good performance in terms of type I error rates. This is mainly because the new tests are more robust to noise for small samples. The Gaussian rank correlation estimator turned out to be the best choice when no prior knowledge about the data is available, whereas the new family of robust gamma tests provides an advantage when information about the noise distribution is available. An implementation of all robust rank correlation tests used in this paper is available as an R package from the CRAN repository.
KW - Fuzzy ordering
KW - Gamma correlation coefficient
KW - R package rococo
KW - Rank correlation
KW - Rank correlation test
KW - Robust statistics
UR - http://www.scopus.com/inward/record.url?scp=84880265979&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2012.11.026
DO - 10.1016/j.ins.2012.11.026
M3 - Article
AN - SCOPUS:84880265979
SN - 0020-0255
VL - 245
SP - 21
EP - 37
JO - Information Sciences
JF - Information Sciences
ER -