Empirical analysis of variance for genetic programming based symbolic regression

Research output: Chapter in Book/Report/Conference proceedings › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Genetic programming (GP) based symbolic regression is a stochastic, high-variance algorithm. Its sensitivity to changes in training data is a drawback for practical applications. In this work, we empirically analyze the variance of GP models on the PennML benchmarks. We measure the spread of model predictions when models are trained on slightly perturbed data. We compare the spread of models from two GP variants as well as linear, polynomial and random forest regression models. The results show that the spread of models from GP with local optimization is significantly higher than that of all other algorithms. As a side effect of our analysis, we provide evidence that the PennML benchmark contains two groups of instances (Friedman and real-world problem instances) for which GP performs significantly differently.
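
The measurement idea in the abstract (train on slightly perturbed data, then look at how much the predictions move) can be illustrated with a minimal sketch. Everything specific here is an assumption, not the paper's protocol: the perturbation is a bootstrap resample, the spread is the per-point standard deviation of predictions averaged over fixed test points, the data are synthetic, and a 1-nearest-neighbour regressor stands in for a high-variance learner (it is not GP). The function names are hypothetical.

```python
import random
import statistics


def fit_linear(xs, ys):
    """Closed-form simple linear regression; returns a predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def fit_1nn(xs, ys):
    """1-nearest-neighbour regressor (assumed stand-in for a high-variance model)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def prediction_spread(fit, xs, ys, test_xs, n_resamples=100, seed=0):
    """Mean per-point standard deviation of predictions over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_resamples):
        # Perturb the training set by resampling with replacement (assumption).
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append([model(x) for x in test_xs])
    # One column per test point: how much do predictions vary across resamples?
    per_point = [statistics.pstdev(col) for col in zip(*preds)]
    return statistics.fmean(per_point)


# Synthetic data: a noisy linear target (purely illustrative).
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 1) for _ in range(60)]
ys = [2 * x + data_rng.gauss(0, 0.3) for x in xs]
test_xs = [i / 10 for i in range(11)]

spread_linear = prediction_spread(fit_linear, xs, ys, test_xs)
spread_1nn = prediction_spread(fit_1nn, xs, ys, test_xs)
print(f"linear spread: {spread_linear:.3f}")
print(f"1-NN spread:   {spread_1nn:.3f}")
```

In this toy setting the flexible 1-NN model shows a larger spread than the linear fit, which is the kind of comparison the abstract describes across GP variants and the baseline regressors.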

Original language: English
Title of host publication: GECCO 2021 Companion - Proceedings of the 2021 Genetic and Evolutionary Computation Conference Companion
Publisher: Association for Computing Machinery, Inc
Pages: 251-252
Number of pages: 2
ISBN (Electronic): 9781450383516
Publication status: Published - 7 Jul 2021
Event: 2021 Genetic and Evolutionary Computation Conference, GECCO 2021 - Virtual, Online, France
Duration: 10 Jul 2021 to 14 Jul 2021

Publication series

Name: GECCO 2021 Companion - Proceedings of the 2021 Genetic and Evolutionary Computation Conference Companion

Conference

Conference: 2021 Genetic and Evolutionary Computation Conference, GECCO 2021
Country/Territory: France
City: Virtual, Online
Period: 10.07.2021 to 14.07.2021

Keywords

  • bias/variance tradeoff
  • genetic programming
  • symbolic regression
