TY - JOUR
T1 - Effects of reducing redundant parameters in parameter optimization for symbolic regression using genetic programming
AU - Kronberger, Gabriel
AU - Olivetti de França, Fabrício
N1 - Publisher Copyright:
© 2024 The Author(s)
PY - 2025/7/1
AB - Gradient-based local optimization has been shown to improve the results of genetic programming (GP) for symbolic regression (SR), a machine learning method for learning symbolic equations. Correspondingly, several state-of-the-art GP implementations use iterative nonlinear least squares (NLS) algorithms for local optimization of parameters. An issue that has, however, mostly been ignored in the SR and GP literature is the overparameterization of SR expressions and, as a consequence, the ill-conditioning of the NLS optimization problem. The aim of this study is to analyze the effects of overparameterization on NLS results and convergence speed, using Operon as an example GP/SR implementation. First, we demonstrate that numeric rank approximation can be used to detect overparameterization on a set of six selected benchmark problems. In the second part, we analyze whether the NLS results or convergence speed can be improved by simplifying expressions with equality saturation to remove redundant parameters. This analysis is done on the much larger Feynman symbolic regression benchmark set after collecting all expressions visited by GP, as the simplification procedure is not fast enough to be used within GP fitness evaluation. We observe that Operon frequently visits overparameterized solutions, but the number of redundant parameters is small on average. We analyzed the Pareto-optimal expressions of the first and last generations of GP and found that for 70% to 80% of the simplified expressions, the success rate of reaching the optimum was better than or equal to that of the overparameterized form. The effect was smaller for the number of NLS iterations until convergence, where we found fewer or equal iterations for 51% to 63% of the expressions after simplification.
KW - Equality saturation
KW - Expression rewriting
KW - Machine learning
KW - Nonlinear least squares
KW - Symbolic regression
UR - http://www.scopus.com/inward/record.url?scp=85210716019&partnerID=8YFLogxK
DO - 10.1016/j.jsc.2024.102413
M3 - Article
AN - SCOPUS:85210716019
SN - 0747-7171
VL - 129
JO - Journal of Symbolic Computation
JF - Journal of Symbolic Computation
M1 - 102413
ER -