Testing the calibration of classification models from first principles

Stephan Dreiseitl, Melanie Osl

Publication: Contribution to book/report/proceedings, conference contribution, peer-reviewed

8 citations (Scopus)


The accurate assessment of the calibration of classification models is severely limited by the fact that there is no easily available gold standard against which to compare a model's outputs. The usual procedures group expected and observed probabilities, and then perform a χ² goodness-of-fit test. We propose an entirely new approach to calibration testing that can be derived directly from the first principles of statistical hypothesis testing. The null hypothesis is that the model outputs are correct, i.e., that they are good estimates of the true unknown class membership probabilities. Our test calculates a p-value by checking how (im)probable the observed class labels are under the null hypothesis. We demonstrate by experiments that our proposed test performs comparably to, and sometimes even better than, the Hosmer-Lemeshow goodness-of-fit test, the de facto standard in calibration assessment.
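The idea of computing a p-value from the improbability of the observed labels can be illustrated with a small simulation. The abstract does not specify the test statistic, so the sketch below is an assumption, not the authors' method: it uses the log-likelihood of the labels as a hypothetical statistic, draws labels repeatedly from the model outputs (which is exactly what the null hypothesis says is valid), and reports the fraction of simulated label sets that are at most as likely as the observed one. The function names and the parameters `n_sim` and `seed` are illustrative.

```python
import math
import random


def log_likelihood(probs, labels):
    """Log-probability of binary labels if each probs[i] is the true P(y_i = 1)."""
    eps = 1e-12  # guard against log(0) for outputs of exactly 0 or 1
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def calibration_p_value(probs, labels, n_sim=2000, seed=0):
    """Monte Carlo p-value for H0: the model outputs are the true probabilities.

    Assumed statistic: log-likelihood of the labels (not taken from the paper).
    """
    rng = random.Random(seed)
    observed = log_likelihood(probs, labels)
    at_most_as_likely = 0
    for _ in range(n_sim):
        # Under H0, each label is a Bernoulli draw with the model's output.
        simulated = [1 if rng.random() < p else 0 for p in probs]
        if log_likelihood(probs, simulated) <= observed:
            at_most_as_likely += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_most_as_likely + 1) / (n_sim + 1)


if __name__ == "__main__":
    # A model that predicts 0.9 for fifty cases that all turn out negative.
    print(calibration_p_value([0.9] * 50, [0] * 50))
```

A small p-value means the observed labels would be surprising if the outputs were correct. With this particular statistic the check is one-sided: it flags labels that are less likely than expected but not a suspiciously good fit, and it has no power when every output equals 0.5, because all label sets are then equally likely.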

Title: Proceedings of the AMIA Annual Fall Symposium 2012
Publication status: Published - 2012
Event: AMIA Annual Fall Symposium 2012 - Chicago, IL, United States
Duration: 3 Nov 2012 to 7 Nov 2012


Conference: AMIA Annual Fall Symposium 2012
Country/Territory: United States
City: Chicago, IL

