Effects of data grouping on calibration measures of classifiers

Stephan Dreiseitl, Melanie Osl

Research output: Chapter in Book/Report/Conference proceedings > Conference contribution > peer-review

1 Citation (Scopus)


The calibration of a probabilistic classifier refers to the extent to which its probability estimates match the true class membership probabilities. Measuring the calibration of a classifier usually relies on performing chi-squared goodness-of-fit tests between grouped probabilities and the observations in these groups. We considered alternatives to the Hosmer-Lemeshow test, the standard chi-squared test with groups based on sorted model outputs. Since this grouping does not represent "natural" groupings in data space, we investigated a chi-squared test with grouping strategies in data space. Using a series of artificial data sets for which the correct models are known, and one real-world data set, we analyzed the performance of the Pigeon-Heyse test with groupings by self-organizing maps, k-means clustering, and random assignment of points to groups. We observed that the Pigeon-Heyse test offers slightly better performance than the Hosmer-Lemeshow test while being able to locate regions of poor calibration in data space.
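The grouped chi-squared idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `grouped_chi2` and `sorted_output_groups`, the synthetic data, and the choice of ten groups are assumptions made for illustration only. The statistic is the classic Hosmer-Lemeshow form, summing (observed - expected)^2 / (n * p_bar * (1 - p_bar)) over groups. The grouping is passed in as a separate argument, so that groups formed in data space (for example by clustering) could be substituted for groups based on sorted model outputs.

```python
import random


def grouped_chi2(probs, labels, groups):
    """Hosmer-Lemeshow-style chi-squared statistic for an arbitrary grouping.

    probs  : predicted probabilities of the positive class
    labels : observed 0/1 class labels
    groups : one group identifier per data point (hypothetical interface)
    """
    # Accumulate group size, observed positives, and expected positives.
    stats = {}
    for p, y, g in zip(probs, labels, groups):
        n, observed, expected = stats.get(g, (0, 0, 0.0))
        stats[g] = (n + 1, observed + y, expected + p)

    total = 0.0
    for n, observed, expected in stats.values():
        p_bar = expected / n
        variance = n * p_bar * (1.0 - p_bar)
        if variance > 0:  # skip degenerate groups with p_bar of 0 or 1
            total += (observed - expected) ** 2 / variance
    return total


def sorted_output_groups(probs, n_groups=10):
    """Assign points to equal-sized groups by rank of the model output."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    groups = [0] * len(probs)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(probs)
    return groups


# Synthetic illustration (assumed data, not from the paper):
# true_p is the true class membership probability of each point.
rng = random.Random(0)
true_p = [rng.random() for _ in range(2000)]
labels = [1 if rng.random() < p else 0 for p in true_p]

calibrated = true_p                      # outputs equal the true probabilities
miscalibrated = [p * p for p in true_p]  # systematically too low

hl_calibrated = grouped_chi2(
    calibrated, labels, sorted_output_groups(calibrated))
hl_miscalibrated = grouped_chi2(
    miscalibrated, labels, sorted_output_groups(miscalibrated))
print(hl_calibrated, hl_miscalibrated)
```

In the usual Hosmer-Lemeshow convention, the statistic for g groups is compared with a chi-squared distribution with g - 2 degrees of freedom when computed on the data the model was fitted to; a p-value could be obtained with, for example, `scipy.stats.chi2.sf`. A data-space variant would only change how the `groups` argument is built. The variance adjustment used by the Pigeon-Heyse test is not reproduced here.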

Original language: English
Title of host publication: Computer Aided Systems Theory, EUROCAST 2011 - 13th International Conference, Revised Selected Papers
Number of pages: 8
Edition: PART 1
Publication status: Published - 2012
Event: 13th International Conference on Computer Aided Systems Theory EUROCAST 2011 - Las Palmas, Spain
Duration: 6 Feb 2011 to 11 Feb 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 6927 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 13th International Conference on Computer Aided Systems Theory EUROCAST 2011
City: Las Palmas


  • Classifier calibration
  • Hosmer-Lemeshow test
  • Pigeon-Heyse test
  • goodness-of-fit tests

