
A new formulation for symbolic regression to identify physico-chemical laws from experimental data

Accepted version





Neumann, P 
Cao, L 
Vassiliadis, VS 
Lapkin, AA 


A modification to the mixed-integer nonlinear programming (MINLP) formulation for symbolic regression was proposed with the aim of identifying physical models from noisy experimental data. In the proposed formulation, a binary tree, in which equations are represented as directed acyclic graphs, is fully constructed for a pre-defined number of layers. The modification reduces the number of required binary variables and removes redundancy due to possible symmetry of the tree formulation. The formulation was tested using numerical models and was found to be more efficient than the previous literature example with respect to the numbers of predictor variables and training data points. The globally optimal search was extended to identify physical models and to cope with noise in the predictor variable of the experimental data. The methodology successfully identified the correct physical models describing the relationship between shear stress and shear rate for both Newtonian and non-Newtonian fluids, as well as simple kinetic laws of chemical reactions. Future work will focus on addressing the limitations of the present formulation and solver to enable extension of target problems to larger, more complex physical models.
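The idea of a fully constructed expression tree can be illustrated with a toy sketch. The code below is not the authors' MINLP formulation: it replaces the global optimization with brute-force enumeration over a two-layer tree (one operator node, two leaves) and a coarse grid for the constant. The operator set, the leaf choices, the function names, and the synthetic Newtonian data (viscosity 2.5) are all assumptions made for illustration.

```python
import itertools
import operator

# Hypothetical operator set; the abstract does not state the one used in the paper.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
LEAVES = ("x", "c")  # a leaf is either the predictor x or a fitted constant c


def evaluate(left, op, right, x, c):
    """Evaluate the two-layer tree op(left, right) at predictor x with constant c."""
    value = {"x": x, "c": c}
    return OPS[op](value[left], value[right])


def fit(xs, ys, c_grid):
    """Enumerate every structure and constant; return (sse, structure, c) of the best fit."""
    best = None
    for left, op, right in itertools.product(LEAVES, OPS, LEAVES):
        for c in c_grid:
            try:
                sse = sum((evaluate(left, op, right, x, c) - y) ** 2
                          for x, y in zip(xs, ys))
            except ZeroDivisionError:
                continue  # skip structures that are undefined on this data
            if best is None or sse < best[0]:
                best = (sse, (left, op, right), c)
    return best


# Synthetic, noise-free Newtonian data: shear stress = 2.5 * shear rate (made-up value).
shear_rate = [1.0, 2.0, 3.0, 4.0]
shear_stress = [2.5 * g for g in shear_rate]
c_grid = [0.5 * k for k in range(1, 11)]  # 0.5, 1.0, ..., 5.0

sse, structure, c = fit(shear_rate, shear_stress, c_grid)
print(structure, c, sse)  # → ('x', '*', 'c') 2.5 0.0
```

The mirrored tree `('c', '*', 'x')` fits equally well and is only rejected here because the first exact fit is kept. Equivalent trees of this kind are an example of the symmetry redundancy that the abstract says the modified formulation removes.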



Model identification, Chemical process development, Symbolic regression, Automated model construction, Mixed-integer nonlinear programming (MINLP), Global optimization

Journal Title

Chemical Engineering Journal


Elsevier BV
Engineering and Physical Sciences Research Council (EP/R009902/1)
National Research Foundation Singapore (via Cambridge Centre for Advanced Research and Education in Singapore (CARES)) (unknown)