TY - JOUR
T1 - Shape-constrained Symbolic Regression - Improving Extrapolation with Prior Knowledge
AU - Kronberger, Gabriel
AU - França, Fabricio Olivetti de
AU - Burlacu, Bogdan
AU - Haider, Christian
AU - Kommenda, Michael
N1 - Publisher Copyright: © 2021 Massachusetts Institute of Technology.
PY - 2022/3/1
Y1 - 2022/3/1
N2 - We investigate the addition of constraints on the function image and its derivatives for the incorporation of prior knowledge in symbolic regression. The approach is called shape-constrained symbolic regression and allows us to enforce, for example, monotonicity of the function over selected inputs. The aim is to find models which conform to expected behavior and which have improved extrapolation capabilities. We demonstrate the feasibility of the idea and propose and compare two evolutionary algorithms for shape-constrained symbolic regression: (i) an extension of tree-based genetic programming which discards infeasible solutions in the selection step, and (ii) a two-population evolutionary algorithm that separates the feasible from the infeasible solutions. In both algorithms we use interval arithmetic to approximate bounds for models and their partial derivatives. The algorithms are tested on a set of 19 synthetic and four real-world regression problems. Both algorithms are able to identify models which conform to shape constraints, which is not the case for the unmodified symbolic regression algorithms. However, the predictive accuracy of models with constraints is worse on the training set and the test set. Shape-constrained polynomial regression produces the best results for the test set but also significantly larger models.
AB - We investigate the addition of constraints on the function image and its derivatives for the incorporation of prior knowledge in symbolic regression. The approach is called shape-constrained symbolic regression and allows us to enforce, for example, monotonicity of the function over selected inputs. The aim is to find models which conform to expected behavior and which have improved extrapolation capabilities. We demonstrate the feasibility of the idea and propose and compare two evolutionary algorithms for shape-constrained symbolic regression: (i) an extension of tree-based genetic programming which discards infeasible solutions in the selection step, and (ii) a two-population evolutionary algorithm that separates the feasible from the infeasible solutions. In both algorithms we use interval arithmetic to approximate bounds for models and their partial derivatives. The algorithms are tested on a set of 19 synthetic and four real-world regression problems. Both algorithms are able to identify models which conform to shape constraints, which is not the case for the unmodified symbolic regression algorithms. However, the predictive accuracy of models with constraints is worse on the training set and the test set. Shape-constrained polynomial regression produces the best results for the test set but also significantly larger models.
KW - Symbolic regression
KW - Genetic programming
KW - Shape-constrained regression
KW - Biological Evolution
KW - Algorithms
UR - http://www.scopus.com/inward/record.url?scp=85125553656&partnerID=8YFLogxK
U2 - 10.1162/evco_a_00294
DO - 10.1162/evco_a_00294
M3 - Article
C2 - 34623432
SN - 1063-6560
VL - 30
SP - 75
EP - 98
JO - Evolutionary Computation
JF - Evolutionary Computation
IS - 1
ER -