TY - GEN
T1 - Empirical analysis of variance for genetic programming based symbolic regression
AU - Kammerer, Lukas
AU - Kronberger, Gabriel
AU - Winkler, Stephan
N1 - Funding Information:
The authors gratefully acknowledge support by the Christian Doppler Research Association and the Federal Ministry of Digital and Economic Affairs within the Josef Ressel Center for Symbolic Regression.
Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/7/7
Y1 - 2021/7/7
N2 - Genetic programming (GP) based symbolic regression is a stochastic, high-variance algorithm. Its sensitivity to changes in training data is a drawback for practical applications. In this work, we empirically analyze the variance of GP models on the PennML benchmarks. We measure the spread of model predictions when models are trained on slightly perturbed data. We compare the spread of models from two GP variants as well as linear, polynomial and random forest regression models. The results show that the spread of models from GP with local optimization is significantly higher than that of all other algorithms. As a side effect of our analysis, we provide evidence that the PennML benchmark contains two groups of instances (Friedman and real-world problem instances) for which GP performs significantly differently.
KW - bias/variance tradeoff
KW - genetic programming
KW - symbolic regression
UR - http://www.scopus.com/inward/record.url?scp=85111021887&partnerID=8YFLogxK
U2 - 10.1145/3449726.3459486
DO - 10.1145/3449726.3459486
M3 - Conference contribution
AN - SCOPUS:85111021887
T3 - GECCO 2021 Companion - Proceedings of the 2021 Genetic and Evolutionary Computation Conference Companion
SP - 251
EP - 252
BT - GECCO 2021 Companion - Proceedings of the 2021 Genetic and Evolutionary Computation Conference Companion
PB - Association for Computing Machinery, Inc
T2 - 2021 Genetic and Evolutionary Computation Conference, GECCO 2021
Y2 - 10 July 2021 through 14 July 2021
ER -