TY - GEN

T1 - Effects of data grouping on calibration measures of classifiers

AU - Dreiseitl, Stephan

AU - Osl, Melanie

PY - 2012

Y1 - 2012

N2 - The calibration of a probabilistic classifier refers to the extent to which its probability estimates match the true class membership probabilities. Measuring the calibration of a classifier usually relies on performing chi-squared goodness-of-fit tests between grouped probabilities and the observations in these groups. We considered alternatives to the Hosmer-Lemeshow test, the standard chi-squared test with groups based on sorted model outputs. Since this grouping does not represent "natural" groupings in data space, we investigated a chi-squared test with grouping strategies in data space. Using a series of artificial data sets for which the correct models are known, and one real-world data set, we analyzed the performance of the Pigeon-Heyse test with groupings by self-organizing maps, k-means clustering, and random assignment of points to groups. We observed that the Pigeon-Heyse test offers slightly better performance than the Hosmer-Lemeshow test while being able to locate regions of poor calibration in data space.

AB - The calibration of a probabilistic classifier refers to the extent to which its probability estimates match the true class membership probabilities. Measuring the calibration of a classifier usually relies on performing chi-squared goodness-of-fit tests between grouped probabilities and the observations in these groups. We considered alternatives to the Hosmer-Lemeshow test, the standard chi-squared test with groups based on sorted model outputs. Since this grouping does not represent "natural" groupings in data space, we investigated a chi-squared test with grouping strategies in data space. Using a series of artificial data sets for which the correct models are known, and one real-world data set, we analyzed the performance of the Pigeon-Heyse test with groupings by self-organizing maps, k-means clustering, and random assignment of points to groups. We observed that the Pigeon-Heyse test offers slightly better performance than the Hosmer-Lemeshow test while being able to locate regions of poor calibration in data space.

KW - Classifier calibration

KW - Hosmer-Lemeshow test

KW - Pigeon-Heyse test

KW - goodness-of-fit tests

UR - http://www.scopus.com/inward/record.url?scp=84856910538&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-27549-4_46

DO - 10.1007/978-3-642-27549-4_46

M3 - Conference contribution

SN - 9783642275487

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 359

EP - 366

BT - Computer Aided Systems Theory, EUROCAST 2011 - 13th International Conference, Revised Selected Papers

T2 - 13th International Conference on Computer Aided Systems Theory EUROCAST 2011

Y2 - 6 February 2011 through 11 February 2011

ER -