Fast and comprehensive fitting of complex mathematical models to massive amounts of empirical data
Journal : Chemometrics and Intelligent Laboratory Systems, vol. 117, p. 13–21, 2012
Publisher : Elsevier
International Standard Numbers
Printed : 0169-7439
Electronic : 1873-3239
Publication type : Academic article
DOI : doi.org/10.1016/j.chemolab.201...
The new method for parameterising a large number of observed curves in terms of nonlinear functions, presented by Isaeva et al., is here applied to noisy data and tested with respect to computational speed, ease of use and estimation precision. The method employs conventional least squares minimisation of the lack-of-fit residuals, but algorithmically it replaces traditional, time-consuming iterative hill-climbing (e.g., simplex optimisation) with a fast, non-iterative linear projection. Each nonlinear function is emulated by its multivariate metamodel (a low-dimensional bilinear principal component analysis model of its behaviour), which yields parameter estimates by a simple projection plus a database look-up. To set up a generic, fast modelling system for line curvature, a set of 38 widely different mathematical functions, most of them nonlinear, was selected for their ability to give sigmoid curves. For each model, its behavioural repertoire was established by designed computer simulation, and its multivariate metamodel was estimated. The new curve-fitting approach was then compared to conventional simplex optimisation by fitting the 38 curve functions to artificial but noisy curves, in order to identify the correct function type and parameter values. Finally, the new method was adapted to heteroscedastic noise and employed for parameterisation of more than 170,000 sigmoid curves from time-lapse monitoring of proteomic 2D Gel Electrophoresis (2DGE) image development. The new method gave parameter estimates at least as precise as those from simplex optimisation and worked well for both homoscedastic and heteroscedastic noise. It sped up parameter estimation in the nonlinear models by a factor of about 24 compared to simplex optimisation. Moreover, by definition it avoids the problems of having to select starting values and of ending up in locally optimal solutions. It also reduced the problem of subjective, possibly erroneous choice of nonlinear model specification.
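The projection-plus-look-up idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-parameter logistic function, the parameter grid, the number of principal components and the nearest-neighbour look-up in score space are all assumptions chosen for illustration, and the names `logistic`, `build_metamodel` and `fit_curve` are hypothetical.

```python
import numpy as np

def logistic(t, rate, midpoint):
    """Two-parameter logistic curve (assumed stand-in for one sigmoid function)."""
    return 1.0 / (1.0 + np.exp(-rate * (t - midpoint)))

def build_metamodel(t, n_components=8, grid=60):
    """Simulate the function over a parameter design and compress it with PCA."""
    rates = np.linspace(0.5, 5.0, grid)        # assumed parameter ranges
    midpoints = np.linspace(2.0, 8.0, grid)
    params = np.array([(r, m) for r in rates for m in midpoints])
    curves = np.array([logistic(t, r, m) for r, m in params])
    mean = curves.mean(axis=0)
    # SVD of the mean-centred simulations; rows of vt are the PCA loadings.
    _, _, vt = np.linalg.svd(curves - mean, full_matrices=False)
    loadings = vt[:n_components]
    scores = (curves - mean) @ loadings.T      # score "database" for look-up
    return {"mean": mean, "loadings": loadings, "scores": scores, "params": params}

def fit_curve(y, model):
    """Non-iterative fit: project the curve, then look up the closest simulation."""
    score = (y - model["mean"]) @ model["loadings"].T
    idx = np.argmin(np.sum((model["scores"] - score) ** 2, axis=1))
    return model["params"][idx]

t = np.linspace(0.0, 10.0, 50)
model = build_metamodel(t)

# Artificial noisy curve with known parameters (homoscedastic noise).
rng = np.random.default_rng(0)
true_rate, true_mid = 2.0, 5.0
noisy = logistic(t, true_rate, true_mid) + rng.normal(0.0, 0.02, t.size)

est_rate, est_mid = fit_curve(noisy, model)
print(f"estimated rate={est_rate:.2f}, midpoint={est_mid:.2f}")
```

In this sketch the costly part (simulation and decomposition) is done once per function, so each additional curve costs only a matrix-vector product and a search, with no starting values to choose. The resolution of the estimates is limited by the density of the simulated grid.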