nlstac: Non-Gradient Separable Nonlinear Least Squares Fitting

A new package for nonlinear least squares fitting is introduced in this paper. This package implements a recently developed algorithm that, for certain types of nonlinear curve fitting, reduces the number of nonlinear parameters to be fitted. One notable feature of this method is that it does not require the initialization values typically needed by gradient-based nonlinear fitting algorithms; instead, only bounds for the nonlinear parameters are required. Even though convergence for this method is only guaranteed for exponential decay using the max-norm, the algorithm exhibits remarkable robustness, and its use has been extended to a wide range of functions using the Euclidean norm. Furthermore, this data-fitting package can also serve as a valuable resource for providing accurate initial parameters to other algorithms that rely on them.


Introduction
Experimental data often exhibit nonlinear patterns. As such, researchers in applied science frequently have to fit these data with nonlinear models, which can be challenging. In this paper, we introduce the nlstac package (Rodriguez-Arias et al., 2023), which implements the TAC algorithm for solving separable nonlinear regression problems, among others. Unlike other solvers, it does not require initialization values. Throughout the paper, we emphasize the potential synergistic usage of nlstac alongside other commonly used solvers, as it can provide reliable initialization values for them. The syntax of nlstac follows a similar structure to the solvers in the minpack.lm package (Elzhov et al., 2023), making it familiar to researchers experienced with those solvers.
The motivation behind developing the nlstac package stems from an approximation problem involving time series data. Specifically, we were working with data corresponding to measurements obtained from a thermometer reaching thermal equilibrium with the surrounding medium, particularly oceanic water. In accord with Newton's law of cooling, the temporal evolution of these data exhibits an exponential pattern described by the expression

y = a_1 exp(-k_1 t) + a_2,    (1)

where t represents the time variable.
Fitting data with the exponential pattern described by equation (1) is a nonlinear optimization problem. One challenge we encountered with widely used algorithms for such problems was the need to initialize the parameters in (1), as the solutions often strongly depended on the chosen initial values: a bad choice of initial values could lead to a sub-optimal local minimum or even prevent the algorithm from converging at all. The TAC algorithm, around which the present package is built, is presented in Torvisco et al. (2018), and it overcomes this issue by eliminating the requirement for parameter initialization. It only needs a broad interval in which to search for the nonlinear parameters. As we worked with the TAC algorithm, its robustness became increasingly evident: robustness in the sense of stability of the algorithm with respect to noisy data and of convergence for a great variety of problems. In our opinion, this advantage, along with the lack of initialization, outweighs the need to specify the exact pattern to be used.
While the convergence of TAC is proven using the max norm, as shown in Torvisco et al. (2018), we employ the Euclidean norm in the nlstac package and consider more general patterns beyond equation (1). This extension of TAC beyond its proven convergence conditions is supported by its reliable performance, as mentioned earlier.
We acknowledge the widespread use of other algorithms for nonlinear fitting, such as Gauss-Newton or Levenberg-Marquardt, with the former being the default choice for the nls function in the stats package (R Core Team, 2021). Hence, in the present paper, we aim to showcase the similarities and differences between nlstac and the nls fit, not as a competition, but as a demonstration of how well these two algorithms can work in synergy. Researchers sometimes encounter difficulties in finding suitable initialization values for nls to achieve convergence. In this regard, since nlstac does not require users to specify good starting values to converge, the resulting estimates can be used to provide good starting values to nls or any other initialization-dependent algorithm.
The R Journal Vol. XX/YY, AAAA 20ZZ ISSN 2073-4859 arXiv:2402.04124v1 [math.ST] 6 Feb 2024

Nonlinear regression, or more broadly, a problem of approximation

Nonlinear regression, or nonlinear fitting, is a standard procedure commonly used in a wide range of scientific fields. Typically, we start with a dataset of q observations of n regressors (or predictor variables) and a response variable, namely {(x_i, y_i), i = 1, ..., q}, and a mathematical expression relating the regressors and the response variable. This mathematical expression may present a nonlinear dependency on several parameters. For instance, the aforementioned general expression is usually given by

y = f(x; θ) + ε(x; θ),    (2)

where f : R^n × R^p → R is a function, the ε(x; ·) are independent and identically distributed random variables following a spherical normal distribution, and θ = (θ_1, ..., θ_p) is the vector of parameters. Thus, the problem reduces to finding an estimate of the parameter vector θ* such that some cost function is minimized. A usual choice is the well-known least-squares cost function, so that we are solving an optimization problem: namely, finding θ* ∈ R^p such that

θ* = argmin_{θ ∈ R^p} g(θ),  with  g(θ) = ∑_{i=1}^{q} (y_i − f(x_i; θ))^2.    (3)

Please observe it is assumed that only the y_i's are observed with error, whereas the x_i's are measured exactly. The possibility of generalizing this to measurement-error models is not implemented in nlstac. More information about least squares problems can be found in, for example, Björck (1996) or Nocedal and Wright (2006).
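The least-squares criterion above can be made concrete with a small R sketch; the pattern, the simulated data, and the use of optim here are illustrative assumptions, not part of the package:

```r
# Illustrative only: a hypothetical pattern f(x; theta) and simulated data
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- 5 * exp(-0.5 * x) + 2 + rnorm(100, sd = 0.1)

# f(x; theta) with theta = (k1, a1, a2), and the least-squares cost g of (3)
f <- function(x, theta) theta[2] * exp(-theta[1] * x) + theta[3]
g <- function(theta) sum((y - f(x, theta))^2)

# A generic local solver requires an initial guess for theta
fit <- optim(par = c(1, 1, 1), fn = g)
fit$par  # estimates of (k1, a1, a2)
```

Note that optim, like the local solvers discussed below, starts from an initial guess; a poor choice of par may lead it to a sub-optimal minimum.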
The above problem can be described in mathematical analysis as an approximation problem. This kind of problem is determined by three elements: a set A, which is the object to be approximated; a family of functions F, whose elements are known as approximants; and finally an approximation criterion, a procedure to measure how close to A each element of F lies. In an approximation problem, a canonical question arises: does there exist an element f ∈ F which lies closer to A, according to the approximation criterion, than every other element of F? When the answer to this question happens to be affirmative, a method to locate one such element is needed, and this is what nlstac is designed for.
Let us identify those elements in the problem described above. The element A to be approximated is the dataset of observations {(x_i, y_i), i = 1, ..., q}, which is a subset of R^n × R. The family of approximants, F, is given by the p-parametric family

F = { f(·; θ) : θ ∈ R^p },

with f the mathematical expression relating the regressors and the response variable given in (2). Finally, the approximation criterion corresponds to the function g in (3). From now on, we will refer to one or the other definition depending on which one is clearer within the context.
Algorithms for finding the best set of parameters θ* in (3) can be divided into local solvers and global solvers, depending on whether they are designed to find a local or a global minimum of the cost function, respectively. In general, local solvers are, under certain conditions, fast and accurate. They are usually based on some sort of gradient descent and are iterative in nature. That is, they start at a given initial guess for θ and, at each iteration, they (hopefully) find a better approximation of the minimum. Under some assumptions (e.g. convexity), the local and global minima may coincide. However, in many cases there are many different local minima and, consequently, the solution given by those algorithms may depend on the initial guess. Therefore, a bad initial guess could land us in a sub-optimal local minimum, and there are even cases in which the initial conditions may cause the algorithm not to converge. Some local algorithms are the steepest descent method, the incremental steepest descent method, Newton's method, quasi-Newton methods, Newton's methods with Hessian modification, the BFGS algorithm, the Gauss-Newton method, or the Levenberg-Marquardt method. More information about these and other methods can be found in, for example, Nocedal and Wright (2006), Arora (2015) or Rhinehart (2016).
On the other hand, global solvers do not depend that heavily on an initial condition, but they require an interval or area in which to start looking for the minimum. Some global algorithms, like grid search, are known to converge in any case, but they scale very poorly: the computational time grows exponentially with the number of parameters to be found. There are also heuristic solvers, which are not guaranteed to converge to the global minimum, but they give a reasonable approximation in cases where other algorithms either take too long or do not converge at all. Some examples of this last kind are the Nelder-Mead method, genetic algorithms, particle swarm optimization, simulated annealing, or ant colony optimization; see Arora (2015).

Separable nonlinear regression problems
Some of the previous problems fall into the family of separable nonlinear regression problems. In this kind of problem, the nonlinear function f in (2) can be written as a linear combination of nonlinear functions. In particular, f takes the expression

f(x; a, b) = ∑_{i=1}^{r} a_i φ_i(x; b),    (4)

where the φ_i are nonlinear functions of the nonlinear parameters. With this formulation, the original set of parameters θ has been split into two subsets: the linear parameters a = (a_1, ..., a_r) and the nonlinear parameters b = (b_1, ..., b_s). Obviously r + s = p, and thus the number of nonlinear parameters to be determined is smaller than the total number of parameters. Therefore, the number of parameters to be found using nonlinear algorithms can be reduced. Separable nonlinear least squares methods are described, for example, in Golub and Pereyra (2003) and Golub and Pereyra (1973).
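The practical consequence of the separable structure (4) is that, once the nonlinear parameters b are fixed, the optimal linear parameters a solve an ordinary linear least-squares problem. A minimal sketch, assuming a hypothetical model with a single nonlinear parameter:

```r
set.seed(2)
x <- seq(0, 10, length.out = 200)
y <- 3 * exp(-0.7 * x) + 1 + rnorm(200, sd = 0.05)

# For fixed b, the basis functions are phi_1(x; b) = exp(-b * x) and phi_2 = 1,
# so the optimal linear parameters come from an ordinary linear fit
b <- 0.7
linfit <- lm(y ~ exp(-b * x))
coef(linfit)  # the intercept plays the role of a_2, the slope of a_1
```

Only b has to be searched for nonlinearly; a_1 and a_2 fall out of the linear fit for free.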
In this work we present the package nlstac, based on the TAC algorithm described in Torvisco et al. (2018) for solving the separable nonlinear regression problem given in (4).

Related packages
There are some widely used R packages that also deal with the problems nlstac is designed to solve. However, they rely on different algorithms, which are often dependent on the choice of initial values. These packages are mainly nlsr (Nash and Murdoch, 2023) and minpack.lm, both solving the problems with variants of the Levenberg-Marquardt algorithm; the lbfgs package (Coppola et al., 2022), which provides an interface to the L-BFGS and OWL-QN algorithms; and the minqa package (Bates et al., 2014), implementing derivative-free optimization algorithms. The algorithms used by these packages fall into the aforementioned category of local algorithms.
There are also packages for fitting separable nonlinear regression models, although these tend to be more specialized for specific problem domains. For example, the TIMP package (Mullen and van Stokkum, 2007) is used for physics and chemistry problems, whereas the spant package (Wilson, 2021) deals with magnetic resonance spectroscopy problems. For partially separable nonlinear fitting we find the psqn package (Christoffersen, 2022).

The nlstac package
The nlstac package was developed with two objectives: first, to implement the algorithm described in Torvisco et al. (2018) in functions that could be used for estimating separable nonlinear regression models, and second, to implement these functions with standard syntax such that they would be convenient for users familiar with other curve-fitting functions such as lm, nls, or nlsLM from the minpack.lm package.
The package consists of three units: a formula decomposer, a linear least squares solver, and a grid search unit.
The workflow is depicted in Figure 1:

1. From the dataset contained in a data frame, an object of class formula, and a list of nonlinear parameters along with the initial ranges defined for each parameter, the formula decomposer, coded in the internal function get_functions, determines the nonlinear functions φ_i defined in (4).
2. From the ranges (intervals) of each nonlinear parameter and the number of nodes, N, of each partition of such intervals, all possible combinations of nonlinear parameters are determined.
For each combination, a linear least squares problem is solved, obtaining thus a set of plausible parameters. This is done by the get_best_parameters internal function. Such a set of 'plausible parameters' is obtained in the following way. Let b_1, ..., b_s be the nonlinear parameters in (4). For each nonlinear parameter b_i, let [c_i, d_i] denote the interval where the estimate of b_i is to be sought, and let

c_i = b_i^(1) < b_i^(2) < ... < b_i^(N) = d_i

denote the partition of that interval. Then, from these partitions, we construct a mesh in the rectangle [c_1, d_1] × ... × [c_s, d_s]. Next, for each node of the grid, we obtain the optimal linear parameters by solving a linear least squares problem. The nodes of the grid, along with the optimal linear parameters for each node, constitute the set of plausible parameters.

3. Among all plausible parameter sets, the one minimizing the sum of squared residuals is selected.

4. A new, finer grid is established around the selected nonlinear parameters, and steps 2 to 4 are repeated until one stopping criterion is met.

5. When the stopping criteria are met, the result is returned as an object of class nlstac.
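Steps 2 and 3 of the workflow can be sketched for a single nonlinear parameter: partition the search interval, solve a linear least-squares problem at each node, and keep the node with the smallest residual sum of squares. This is a simplified illustration on simulated data, not the package internals:

```r
set.seed(3)
x <- seq(0, 10, length.out = 200)
y <- 3 * exp(-0.7 * x) + 1 + rnorm(200, sd = 0.05)

# Step 2: partition the search interval and solve a linear LS problem per node
nodes <- seq(1e-7, 2, length.out = 10)
ssr <- sapply(nodes, function(b) deviance(lm(y ~ exp(-b * x))))

# Step 3: keep the node (plausible nonlinear parameter) with the smallest SSR
best <- nodes[which.min(ssr)]

# Step 4 would then refine the grid around 'best' and repeat
best
```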
As indicated in step 5, the output given by the nls_tac function is an object of class nlstac. It is a list containing the following fields:

• resid: The residuals.
• data: The original data.
• coefficients: A named vector containing the values of the parameters.
• stdError: A named vector with the standard error of the estimation of the coefficients.
• SSR: The sum of the squares of the residuals obtained by the fit.
• fitted: A vector containing the fitted values.
• dataset: A string with the name of the variable containing the data.
• formula: The formula used in the call of the function.
• df: The degrees of freedom.
• sigma: The standard deviation estimate.
• Rmat: R matrix in the QR decomposition of the gradient matrix used for the computation of the standard errors of the coefficients.
The class nlstac also has some extraction methods similar to those of lm, nls, and glm. For instance, the methods summary.nlstac, predict.nlstac, and predict.summary.nlstac produce identically formatted output as the summary functions for the lm and nls fits, as will be shown later.
The nlstac package (Rodriguez-Arias et al., 2023) is available on CRAN. The development version of the package can also be installed from the GitHub repository using the install_github function from the remotes package (Csárdi et al., 2021): remotes::install_github("rbensua/nlstac").

Arguments
As was mentioned above, the inputs for the nls_tac function are, at least, the fields data, formula, tol, N and nlparam.
The data field is a data frame containing the data to be fitted; tol is the tolerance, measured as the relative difference between the values of the parameters in two consecutive iterations (its default value is 10^-4); N is the number of divisions made in each nonlinear parameter interval in each iteration (defaults to 10); formula is either an object of class formula or a character string that can be coerced into a formula, and it must contain the pattern to be fitted; and nlparam is a list containing the nonlinear parameters as well as their initial ranges.
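A typical call combining these arguments looks as follows; the simulated dataset and the search range for k1 are illustrative assumptions:

```r
library(nlstac)

# Hypothetical exponential-decay data
set.seed(4)
x <- seq(0, 10, length.out = 100)
dat <- data.frame(x = x,
                  y = 2 * exp(-0.5 * x) + 1 + rnorm(100, sd = 0.05))

tacfit <- nls_tac(formula = "y ~ a1 * exp(-k1 * x) + a2",
                  data = dat,
                  nlparam = list(k1 = c(1e-7, 1)),  # nonlinear parameter range
                  N = 10,
                  tol = 1e-4)
coef(tacfit)
```

Only the nonlinear parameter k1 needs a range; the linear parameters a1 and a2 are detected by the formula decomposer.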

Summary of nlstac class
Information about the fit is stored in an object of class nlstac, and the function summary displays the most relevant numerical information about the fit.
Considering the example shown in Subsection 2.3.1 (Example 1. Exponential Decay), once the analysis is done, the summary function shows the information of the analysis as follows:

> summary(tacfit)
Formula: temp ~ a1 * exp(-k1 * time) + a2

Parameters:
     Estimate     Std. Error   t value   Pr(>|t|)
k1   1.399458e-02 8.107657e-05 172.6095  < 2

As can be seen, the summary function gives us information about the formula used in the fitting, the estimated parameters along with some statistical information, the residual standard error, the number of iterations needed to achieve convergence, and, finally, the tolerance value when convergence is achieved.

Examples
In this section, we present several examples to illustrate the use of the nlstac package in various scenarios. Each example highlights different aspects of nlstac's behavior compared to another commonly used R function for these types of problems: nls, which is included in the stats package and utilizes the Gauss-Newton algorithm by default.
For each example, we explore two different initializations for nls. First, we initialize nls with a reasonable set of initial values. Note that the actual values of the parameters are unknown and no a priori estimation of these values is available; therefore we call reasonable a set of values which are similar to the estimation obtained by TAC. This approach allows us to compare the fit achieved by nlstac with the widely used nls function. In the second initialization approach, we initialize nls with the estimation provided by nlstac for the same problem. This second approach serves as a starting point to ease the convergence of the algorithm used by nls and enables us to observe how effectively both algorithms can work in tandem. These examples shed light on the versatility and potential of the nlstac package in conjunction with the established nls function, providing valuable insights into their combined performance.
In Subsections 2.3.1 and 2.3.2, two examples with real data are presented. In both cases, the nlstac package obtains a solution, while the nls function does not converge with a seemingly reasonable set of initial values. However, if the nls function is initialized with the output of nlstac, it successfully converges to a solution. In the first example, the fit remains the same, and in the second one, nls slightly improves the fit obtained by nlstac. These examples highlight the versatility of nlstac, which can be used either independently for fitting or to provide accurate initialization values for nls to converge effectively.
In Subsection 2.3.3 we present an example with simulated data. In this example, well beyond TAC's proven convergence conditions, we obtain a good fit to the model when running the nlstac package. However, if we initialize the nls function with a seemingly reasonable set of initial values, it fails to converge. Nevertheless, by using the output of nlstac as initialization values for nls, we achieve an even better fit than what nlstac alone provides. This example also shows that nlstac can serve not only as a standalone fitting algorithm but also as a tool that enhances the performance of nls by providing reliable initialization values for improved fitting.
In Subsection 2.3.4, we present another example with simulated data where the nlstac package accurately fits the given pattern. However, in this case, the nls function converges using both initialization approaches. Interestingly, when a non-optimal initialization is used, the fit obtained by nls is poor.
In Subsection 2.3.5, we fit an exponential autoregressive model. This multi-variable example showcases the robustness of the TAC algorithm, showing how the nlstac package can be utilized in a wide range of scenarios. In this example, we generate three different datasets by making perturbations to a common underlying pattern. This example is quite interesting because the behavior of nls differs for each dataset. Furthermore, in Subsection 2.3.6, we utilize the nlstac package to fit real-world data to an autoregressive model.
In Subsection 2.3.7, we illustrate how the functions parameter of the nls_tac function can be utilized to explicitly provide the functions within the family of approximants. This feature proves useful in cases where the algorithm does not accurately recognize the pattern.
Lastly, we want to mention that all figures presented in this section have been generated using the ggplot2 package (Wickham, 2016).

Example 1. Exponential Decay
Although nlstac is implemented to fit data with virtually any nonlinear function, as mentioned in the introduction, the original purpose of the TAC algorithm was to fit exponential decay models such as (1). The convergence of TAC for exponential decay patterns is proved using the max-norm as the approximation criterion.

Patterns as presented in (1) are widely used to fit data coming from measuring the temperature of a body during a time interval, and their use for this purpose is based on Newton's law of cooling. Let us see an example of this use.
Five parameters need to be passed: data, tol, N, formula and nlparam, as indicated in Subsection 2.2.1. The first parameter, data, must be a two-column data frame containing the data: time instants and observations. We intend to fit pattern (1) to the CoolingWater dataset from the mosaicData package (Pruim et al., 2022). First, we define the variable data.
data <- mosaicData::CoolingWater

Once the data is loaded, we specify the tolerance, tol, or stopping criterion, and the number of divisions to be made in each step, N.

tol <- 1e-7
N <- 10
We usually set the number of divisions to 10.However, if the search intervals for the nonlinear parameters are very wide or if we suspect that there may be many local minima, it might be advisable to increase the number of divisions to avoid converging to a sub-optimal local minimum.On the contrary, if we suspect that the computing time may be too high (for example, if the number of nonlinear parameters is large), it might be advisable to reduce the number of divisions.
Next, we specify the model to be used in the fitting, form, specifying the nonlinear parameters included in the model, nlparam, as well as the interval in which we assume they can be found. Please observe that the function does not require us to initialize the parameters whatsoever; we are just asked to provide a (wide) interval where to seek them. In this example, we have chosen [10^-7, 1] as the interval where k_1 must be sought. Note that the input formula is either an R formula object or an object coercible to it. For example, in this case, the variable form is a string that can be coerced to a formula object. Also note that we only need to specify the names and the initial intervals for the nonlinear parameters in the nlparam input. Once the nonlinear parameters are given, the function nls_tac will call the formula decomposer, which will try to determine the rest of the elements of the formula, i.e. the linear parameters and the nonlinear functions described in equation (4). Finally, note that tacfit is an object of class nlstac containing the following fields: coefficients, stdError, convInfo, SSR, residuals, data and formula.
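The corresponding model specification and call to nls_tac take the following form; this is a sketch reconstructed to be consistent with the formula reported by summary(tacfit) earlier and the interval [10^-7, 1] mentioned above:

```r
form <- "temp ~ a1 * exp(-k1 * time) + a2"
nlparam <- list(k1 = c(1e-7, 1))  # search interval for the nonlinear parameter

tacfit <- nls_tac(formula = form, data = data,
                  nlparam = nlparam, N = N, tol = tol)
coef(tacfit)
```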
So far we have only used one algorithm for the fitting: the TAC algorithm. A rightful question to ask is how much this fit resembles the one provided by the widely used nonlinear least squares (NLS) algorithm.
For this purpose we will use the NLS algorithm, by means of the nls function, to fit the same pattern to the same data. Since NLS requires initialization values, we initialize parameter k_1 as 0.1, parameter a_1 as 50, and a_2 as 20. Then we run nls.

nlsfit1 <- nls(formula = form, data = data,
               start = list(k1 = 0.1, a1 = 50, a2 = 20),
               control = nls.control(maxiter = 1000, tol = tol))

Although we have chosen reasonable initialization values, the algorithm did not converge. This shows a strength of TAC, which converges without the need for initialization values.
Besides the use of TAC and NLS as algorithms that can fit by themselves, they can also be used jointly. Since nlstac does not require initialization, just a set of intervals in which to seek the nonlinear parameters, nlstac can provide good initialization values to nls so that it can successfully converge. The lack of dependence on initialization values of nlstac, paired with the speed of nls and its extended use among researchers, makes them quite a good team.
When using nlstac and nls together, we use the coefficients obtained in the fit with nlstac to initialize and run nls for a second time.
nlsfit2 <- nls(formula = form, data = data, start = coef(tacfit),
               control = nls.control(maxiter = 1000, tol = tol))

In this case, nls did indeed converge, but the fit coincides with nlstac's. This shows that nls was not able to improve the nlstac fit, even though it was initialized with its output.
We show the summary of nlstac and nls in Table 1. For both methods, the residual standard error is 0.1647017 on 180 degrees of freedom. For nlstac, the number of iterations to convergence is 13 and the achieved convergence tolerance is 2.87864×10^-8. For nls, when fitting with the nlstac output as initialization, the number of iterations to convergence is 1 (meaning the initial values were close to the optimal values; furthermore, the algorithm was unable to improve those values) and the achieved convergence tolerance is 1.391929×10. Note that all outputs coincide for both methods.
Figure 2 shows the data as gray dots. Both fits, the one provided by nlstac and the one provided by nls initialized using nlstac's best approximation, are shown in green.

Example 2. Bi-exponential decay

In some approximation problems it is necessary to fit the sum of two exponential decays, as published in Moreno-Flores et al. (2010). While nlstac completely solves this problem, nls, given bad initial values, does not converge. However, if we initialize nls with the nlstac output, nls fits the data in a very similar way to nlstac.
In this example, we intend to fit a function such as

y = a_1 exp(-k_1 t) + a_2 exp(-k_2 t) + a_3.    (5)

Models of the form (5) were used in Moreno-Flores et al. (2010) in the fitting of data produced in indentation experiments carried out by scanning probe microscopes (e.g., atomic force microscopes) in studies of viscoelastic mechanical properties of soft matter.
We intend to fit pattern (5) to the Indometh data from the datasets package (R Core Team, 2021). We define the parameter data and specify the tolerance, tol, and the number of divisions made in each step, N. We set the model to be used in the fitting, form, specifying the nonlinear parameters included in the model, nlparam, as well as the interval in which we assume they can be found. Finally, we apply the nls_tac function to get the fit. In a similar way as indicated in Example 2.3.1, we run nls initializing every parameter, that is, k_1, k_2, a_1, a_2 and a_3, as 1. Later we use the coefficients obtained with nlstac to initialize and run nls for a second time.
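A sketch of the corresponding calls follows; the variable names come from the Indometh dataset, and the search intervals [10^-7, 10] for k1 and k2 are the ones discussed below:

```r
data <- datasets::Indometh
tol <- 1e-7
N <- 10

form <- "conc ~ a1 * exp(-k1 * time) + a2 * exp(-k2 * time) + a3"
nlparam <- list(k1 = c(1e-7, 10), k2 = c(1e-7, 10))

tacfit <- nls_tac(formula = form, data = data,
                  nlparam = nlparam, N = N, tol = tol)

# nls initialized first with ones, then with nlstac's estimates
nlsfit1 <- try(nls(formula = form, data = data,
                   start = list(k1 = 1, k2 = 1, a1 = 1, a2 = 1, a3 = 1),
                   control = nls.control(maxiter = 1000, tol = tol)))
nlsfit2 <- nls(formula = form, data = data, start = coef(tacfit),
               control = nls.control(maxiter = 1000, tol = tol))
```

The try() wrapper around the first nls call keeps the script running if that fit fails to converge.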
We show the summaries of nlstac and nls in Table 2 and Table 3, respectively. Figure 3 shows the data as gray dots; the nlstac fit is shown in green and the nls fit is shown in dashed red.
Although it is now hidden from the user, this implementation used to show some warnings related to a deficiency in a matrix rank. The explanation is that we are assuming both parameters k_1 and k_2 are contained within the interval [10^-7, 10], so when we make two equal partitions of the same interval, one for each parameter, at some point both values will be the same: at nodes of [10^-7, 10] × [10^-7, 10] where k_1 = k_2 we do not have a bi-exponential decay but only an exponential decay. Since these values are used in the resolution of a linear equation system that does not have a unique solution, we obtain a warning because such a unique solution cannot be found. The algorithm takes care of that problem by removing these indeterminate parameter sets, as well as the warnings.
Also note that, for some choices of models, permutations of parameters may give the same results. For example, using the model above, for each pair of parameters (k_1, k_2) there will be another pair (k_2, k_1) which will offer the same fit. However, that is not a problem for us, other than an increase in the execution time of the code. Another similar scenario arises when adjusting two sinusoidal waves and two exponential decays. If we wanted to avoid making the same calculations multiple times, we would have to change the code, forcing the user to specify which functions are non-identifiable under permutations of parameters, so we would get more time-efficient code at the cost of simplicity. However, we have chosen simplicity over time efficiency.
Another bi-exponential decay example with real data can be found in section 5 of Torvisco et al. (2018).

Example 3. Exponential decays with phase displacement
In this example, nlstac converges and nls, even when reasonable initialization values are given, does not.However, if we use nlstac output as an initialization for nls, nls not only converges but improves nlstac fitting.
Here we see two advantages of the TAC algorithm. Firstly, nlstac converges. Secondly, when its approximation is given to nls as initialization, nls improves upon this fit. That shows us, again, the two ways of using nlstac: directly estimating models or providing initial values.
We intend to fit a pattern of exponential decays with phase displacement. As indicated in Subsection 2.2.1, we need to pass five parameters: data, tol, N, formula and nlparam.
We create data and determine the tolerance, tol, and the number of divisions we make in each step, N.
Later, we specify the model to be fitted, form, specifying the nonlinear parameters included in the model, nlparam, as well as the intervals in which we assume they can be found.
We show the results of both implementations in Table 4.We show the summaries of nlstac and nls in Table 5 and Table 6, respectively.

Figure 4 shows the data as gray dots. The nlstac fit is shown in green. The dashed red line shows the nls fit initialized with the parameters of nlstac's best approximation. This example is particularly significant since nlstac is outperformed by nls, both in time (nlstac's computing time is significantly higher) and precision (compare the residual standard error values for both methods in Tables 5 and 6), although the nls function needs to be initialized with nlstac's best approximation to be able to converge.

Example 4. Exponential decay mixed with a sinusoidal signal
In this example, where we mix an exponential decay with a sinusoidal signal, we obtain a good fit with nlstac and a similar fit with nls when we provide the nlstac output as initialization values. However, if we initialize nls with bad initialization values, we get a poor fit: the nls fit identifies the exponential decay quite properly but fails to identify the sinusoidal signal.
We intend to fit a function such as

y = a_1 exp(-k_1 x) + a_2 sin(b_1 x) + a_3.

As indicated in Subsection 2.2.1, we need to pass five parameters: data, tol, N, formula and nlparam.
We create data and determine the tolerance, tol, or stopping criterion, and the number of divisions to be made in each step, N.

set.seed(12345)
x <- seq(from = 0, to = 10, length.out = 500)
y <- 3*exp(-0.85*x) + 1.5*sin(2*x) + 1 + rnorm(length(x), mean = 0, sd = 0.3)
data <- data.frame(x, y)
tol <- 1e-7
N <- 10

Later we set the model to be used in the fitting, form, specifying the nonlinear parameters included in the model, nlparam, as well as the intervals in which we assume they can be found. As in previous examples, we compare the nlstac and nls outputs. For the first comparison we will run nls initializing every parameter, that is, k_1, b_1, a_1, a_2 and a_3, as 1, and for the second comparison, we will use the nlstac output to initialize and run nls. We present the results obtained in Table 7. Without looking at any graphic, it is quite evident from Table 7 alone that the last fit is different from the other two.
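The model for this example and the corresponding calls can be sketched as follows; the pattern matches the simulated data above, while the search intervals for k1 and b1 are illustrative assumptions:

```r
form <- "y ~ a1 * exp(-k1 * x) + a2 * sin(b1 * x) + a3"
nlparam <- list(k1 = c(1e-7, 2), b1 = c(1e-7, 4))  # illustrative intervals

tacfit <- nls_tac(formula = form, data = data,
                  nlparam = nlparam, N = N, tol = tol)

# nls initialized with a vector of ones vs. with nlstac's estimates
nlsfit1 <- nls(formula = form, data = data,
               start = list(k1 = 1, b1 = 1, a1 = 1, a2 = 1, a3 = 1),
               control = nls.control(maxiter = 1000, tol = tol))
nlsfit2 <- nls(formula = form, data = data, start = coef(tacfit),
               control = nls.control(maxiter = 1000, tol = tol))
```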

The summary of nlstac is shown in Table 8, and Table 9 shows the summary of nls initialized with the best approximation obtained with nlstac. Finally, the summary of nls initialized with a vector of ones appears in Table 10. Figure 5 shows the data as gray dots. The green line represents the nlstac fit and the dashed red line represents the nls fit. The blue line represents the nls fit initialized with a vector of ones. It is clear that the last fit is not accurate. It seems that the nonlinear least squares algorithm has managed to fit the exponential part of the pattern but has missed the sinusoidal part.
This example shows a situation where nlstac works perfectly, as does nls if initialized correctly. However, if the user does not provide good initialization values, the nonlinear least squares algorithm might fail to obtain a good fit, since it may get stuck in a local minimum.

Example 5. Exponential autoregressive model: a multi-variable approach (p-variable)
Nonlinear time series models are used in a wide range of fields. In this example, we deal with an especially relevant nonlinear time series model: the exponential autoregressive model. Given a time series {x_1, x_2, x_3, ...}, the exponential autoregressive model is defined as

x_t = Σ_{i=1}^{p} (a_i + b_i e^(−c x_{t−1}²)) x_{t−i} + ε_t,

where the ε_t are independent and identically distributed random variables, independent of the x_i; p denotes the system degree; t ∈ N, t > p; and the parameters to be estimated from observations are c, a_i and b_i (for i = 1, ..., p). This model can be found in, for example, Xu et al. (2019) or Chen et al. (2018).
Some generalizations of the exponential autoregressive model have been made; in this example we deal with a generalization of Teräsvirta's extended model that can be found in Chen et al. (2018, equation (10)) and that we present here:

x_t = Σ_{i=1}^{p} (a_i + b_i e^(−z_i x_{t−d}²)) x_{t−i} + ε_t,   (8)

where the z_i (for i = 1, ..., p) are scalar parameters and d ∈ Z.
We would like to point out that the convergence of the TAC algorithm has not been established for this type of problem. Further, this example is substantially different from the previous ones, since every observation depends on the previous ones: this model cannot be described by a function of just one real variable. Instead, a vector of p real variables needs to be used. This approach can be developed considering a function from (R^(n−p))^p into R^(n−p); therefore we transform a one-dimensional problem into a p-dimensional one. Let us explain this process. Let x = (x_1, ..., x_n) denote the observations and let us define p variables v1, ..., vp ∈ R^(n−p), with vi = (x_{p−i+1}, ..., x_{n−i}) for i = 1, ..., p, which allow us to redefine Equation (8) in terms of these new variables:

x_{t+p} = Σ_{i=1}^{p} (a_i + b_i e^(−z_i ((vd)_t)²)) (vi)_t + ε_{t+p},   (9)

where vd is fixed, with d such that 1 ≤ d ≤ p, and (vi)_t denotes the t-th component of the variable vi.
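The construction of the lagged variables vi can be sketched on a toy series. The variable names v1, v2, y and dat are ours, introduced only for illustration:

```r
# One-dimensional to p-dimensional transformation for p = 2 on a toy series
x <- c(0.5, 0.3, 0.8, 0.2, 0.9, 0.4)   # observations, n = 6
n <- length(x); p <- 2

# vi = (x_{p-i+1}, ..., x_{n-i}), each of length n - p
v1 <- x[(p - 1 + 1):(n - 1)]   # i = 1: (x_2, ..., x_{n-1})
v2 <- x[(p - 2 + 1):(n - 2)]   # i = 2: (x_1, ..., x_{n-2})
y  <- x[(p + 1):n]             # responses (x_{p+1}, ..., x_n)

dat <- data.frame(y, v1, v2)   # one row per t = 1, ..., n - p
```

Each row of dat pairs a response x_{t+p} with the p lagged values it depends on, which is exactly the form required by Equation (9).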
For this example, first we fix one exponential autoregressive time series; then we create three datasets by adding a random perturbation to the terms of that fixed time series. The aim of this example is to fit model (9) with p = 2, d = 2, and n = 100 for each dataset. We start by defining a vector, seed, containing the three seeds. We define the tolerance level, tol, and the parameters for the time series, and initialize the time series with its first two terms. We also define the pattern to be fitted.
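Generating one such series can be sketched as follows. The parameter values, initial terms and noise level below are illustrative choices of ours, not the ones used in the paper:

```r
# Simulate one series from model (8) with p = 2, d = 2
set.seed(12345)
n <- 100
a <- c(0.5, -0.3); b <- c(0.4, 0.2); z <- c(1, 2); d <- 2

x <- numeric(n)
x[1:2] <- c(0.5, 0.4)   # initialize with the first two terms
for (t in 3:n) {
  coefs <- a + b * exp(-z * x[t - d]^2)   # (a_i + b_i e^{-z_i x_{t-d}^2})
  x[t]  <- sum(coefs * x[(t - 1):(t - 2)]) + rnorm(1, sd = 0.05)
}
```

Repeating this loop with three different seeds yields the three perturbed datasets described above.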
We present the results for all three datasets in Table 11, and Figure 6 shows the fits for those three datasets.
Example 6. Exponential autoregressive model: a multi-variable approach (p-variable) with real data.
In this example, we intend to fit model (8) to returns on daily closing prices from the Financial Times Stock Exchange (FTSE) using the nlstac package. More precisely, if the vector x = (x_1, ..., x_n) denotes the daily closing prices, we fit the returns, that is, the vector ((x_2 − x_1)/x_1, ..., (x_n − x_{n−1})/x_{n−1}). First, we read in the data and specify the tolerance level:

x <- EuStockMarkets[,4]
x <- diff(x)/x[-length(x)]
tol <- 1e-7

Then we transform the problem into a two-dimensional one. Finally, we make use of the doParallel package to run nlstac in parallel.
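The whole pipeline can be sketched as follows. The variable names v1, v2 and dat, the formula (model (9) with vd = v2, since d = 2) and the search intervals for z1 and z2 are our assumptions, not necessarily those of the paper:

```r
library(nlstac)
library(doParallel)

# Returns from FTSE daily closing prices
x <- EuStockMarkets[, 4]
x <- diff(x) / x[-length(x)]
n <- length(x); p <- 2; d <- 2

# Two-dimensional reformulation, as in Equation (9)
v1 <- as.numeric(x[2:(n - 1)])
v2 <- as.numeric(x[1:(n - 2)])
y  <- as.numeric(x[3:n])
dat <- data.frame(y, v1, v2)

form <- y ~ (a1 + b1 * exp(-z1 * v2^2)) * v1 +
            (a2 + b2 * exp(-z2 * v2^2)) * v2
nlparam <- list(z1 = c(0, 100), z2 = c(0, 100))   # assumed search intervals

cl <- makeCluster(2)
registerDoParallel(cl)
tacfit <- nls_tac(formula = form, data = dat, nlparam = nlparam,
                  N = 10, tol = 1e-7, parallel = TRUE)
stopCluster(cl)
```

Registering the cluster before the call and stopping it afterwards is required by the foreach backend that doParallel provides.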

Example 7. Explicitly providing the functions of the pattern
The nlstac package relies on the formula infrastructure of the R language to determine the number and expressions of the nonlinear functions ϕ_i in (4). This is done internally by the function get_functions.
However, in some cases the user may want to explicitly state the nonlinear functions that define the separable nonlinear problem (e.g., if the formula decomposer fails to automatically identify them). For that case, the nls_tac function has an optional parameter, named functions, which is an array of character strings defining the nonlinear functions. In practical terms, we have not found an example in which we needed to manually specify the functions defining the model, but the optional parameter is available nonetheless.
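A hypothetical use of this parameter, reusing the model of Example 4, might look as follows. The exact strings expected by functions (including "1" for the constant term) and the interval bounds are our guess, offered only as a sketch:

```r
library(nlstac)

set.seed(12345)
x <- seq(0, 10, length.out = 500)
y <- 3 * exp(-0.85 * x) + 1.5 * sin(2 * x) + 1 + rnorm(500, sd = 0.3)
dat <- data.frame(x, y)

form <- y ~ a1 * exp(-k1 * x) + a2 * sin(b1 * x) + a3
tacfit <- nls_tac(formula = form, data = dat,
                  functions = c("exp(-k1*x)", "sin(b1*x)", "1"),
                  nlparam = list(k1 = c(0, 2), b1 = c(0, 10)),
                  N = 10, tol = 1e-7)
```

When functions is supplied, the formula decomposition step is bypassed and the given strings define the separable structure directly.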
One might think that parallelization always speeds up the algorithm, but in reality, initializing and stopping the cluster requires a certain amount of time.Therefore, in some cases it may be convenient to parallelize and in others it might not be worth it.
As mentioned in the Introduction, the TAC algorithm, like all grid-search algorithms, scales poorly with the dimension of the problem (i.e., the number of nonlinear parameters). However, even for low-dimensional problems, the speed of the algorithm depends on the number of subdivisions of each parameter search interval (i.e., the width of the grid), which is defined by the parameter N in the nls_tac function. These statements rely on the fact that, at each iteration, the number of plausible nonlinear parameter combinations is N^p, where N is the number of nodes in the partition of each parameter's search interval and p is the number of nonlinear parameters. Note that this number grows polynomially with N but exponentially with p.
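The growth of the grid can be made concrete with a few numbers; this snippet simply tabulates N^p for a fixed N:

```r
# Grid nodes explored per iteration: N^p is polynomial in N, exponential in p
N <- 10
nodes <- sapply(1:6, function(p) N^p)
names(nodes) <- paste0("p=", 1:6)
print(nodes)
```

With N = 10, going from two to three nonlinear parameters multiplies the per-iteration work tenfold, which is consistent with the timing comparison shown in Figure 8.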
As an illustration of the convenience of using the parallel = TRUE option, Figure 8 depicts a comparison of the non-parallel and parallel modes of the nls_tac function for two standard separable nonlinear least squares problems, namely two exponential decays and three exponential decays. As could be expected, the computation time rises exponentially as the number of nonlinear parameters increases. The figure also shows that only for very small problems (e.g., two nonlinear parameters with N up to 35) is the parallelization sometimes not worth it. To run this simulation we made use of the dplyr package (Wickham et al., 2022).

Conclusions
Many popular packages for nonlinear function estimation depend heavily on the choice of starting values.This package, however, implements an algorithm that needs no initialization and can handle a wide variety of approximation problems.
Our goal has been to create a package for nonlinear regression using the TAC algorithm and to show how this algorithm can work either by itself or when combined with other nonlinear estimation algorithms.
Processing times on problems with a large number of nonlinear parameters can be an issue. In those cases, it might be advisable to consider the use of a gradient-based algorithm. In future versions, the implemented grid search could be refined to reduce those processing times. Despite this possible drawback, we strongly believe that researchers will find this package useful in nonlinear regression problems.

Figure 1: Schematic workflow of the algorithm used in the nlstac package.

Figure 2: Example 1. Fitting an exponential decay for the CoolingWater dataset. The figure shows the original data (grey points) and the nlstac fit along with the nls fit (green line). For more examples of using the TAC algorithm on real-world data, see Section 4 of Torvisco et al. (2018) or Subsection 5.1 of Cabello Sánchez et al. (2021).

Figure 4: Example 3. The combined trend of three exponential decays with phase displacement. The figure shows the original data (grey points), the nlstac fit (green line) and the nls fit using the best approximation (red line).

Figure 5: Example 4. Fit of an exponential decay mixed with a sinusoidal signal for the dataset considered in Example 4 with the model given in (7). The figure shows the original data (grey points), the nlstac fit (green line), the nls fit initialized with the nlstac output (red line) and the nls fit initialized with a vector of ones (blue line).

Figure 6: Example 5. Fits for three exponential autoregressive model datasets with the model given in (8), as determined at the beginning of Example 5. The figure shows the original data (grey points), the nlstac fit (green line), the nls fit without nlstac initialization (red line) and the nls fit initialized with the nlstac output (blue line). Note that, in Dataset 1, the blue line overlaps the green one.

Figure 8: Comparison between the parallel and the non-parallel implementations of the nls_tac function for the fitting of two (left) and three (right) exponential decays, for different values of the number of subdivisions, N. Note how the processing time increases roughly a hundredfold when going from two to three nonlinear parameters.
3. For each set of plausible parameters, the loss function (namely, the sum of the squares of the residuals) is computed, and a grid search is performed to obtain the minimum value of the loss function. Let (b_1^{m_1}, ..., b_s^{m_s}) be the node of the grid at which the loss function is minimized.
4. The stopping criteria are met either when the maximum number of iterations is reached or when the size of the partition of every interval [c_i, d_i] is lower than the tolerance level. If the stopping criteria are not met, the grid is refined: for each parameter b_i, a new subinterval in which to seek the estimation of that parameter is considered.

Bibliography

R. K. Arora. Optimization: Algorithms and Applications. Taylor & Francis Group, LLC, 2015.
D. Bates, K. M. Mullen, J. C. Nash, and R. Varadhan. minqa: Derivative-free optimization algorithms

Table 2: Example 2. Summary of nlstac for data of subject 3 in the Indometh dataset with the model given in

Table 3: Example 2. Summary of nls for data of subject 3 in the Indometh dataset with the model given in

Table 4: Example 3. Parameter estimates corresponding to the nlstac and nls fits for the dataset considered in Example 3 with the model given in (6). Values have been rounded to the eighth decimal place. Please recall that nls initialized without the estimate from nlstac does not converge.

Table 6: Example 3. Summary of nls for the dataset considered in Example 3 with the model given in (6). Residual standard error: 0.1307 on 55 d.o.f. Number of iterations to convergence: 16. Achieved convergence tolerance: 2.8×10^−6.

Table 7: Example 4. Parameters corresponding to the nlstac and nls fits for the dataset considered in Example 4 with the model given in (7). Values have been rounded off to the eighth decimal place.

Table 9: Example 4. Summary of nls initialized with nlstac's best approximation for the dataset considered in Example 4 with the model given in (7). Residual standard error: 0.2974 on 495 d.o.f. Number of iterations to convergence: 4. Achieved convergence tolerance: 5.597×10^−9.

Table 10: Example 4. Summary of nls initialized with a vector of ones for the dataset considered in Example 4 with the model given in (7). Residual standard error: 1.056 on 495 d.o.f. Number of iterations to convergence: 23. Achieved convergence tolerance: 7.587×10^−8.

Table 11: Example 5. Summary of all three methods (nlstac, nls without nlstac initialization, nls with nlstac initialization) for all three datasets considered in Example 5 with the model given in (8). Missing methods for Dataset 2 and Dataset 3 are the result of the non-convergence of those methods.
These data were obtained from the EuStockMarkets dataset, which is accessible from the datasets package. As in the previous example, we consider p = 2 and d = 2.
Table 12 shows the results, and a plot with both the data and the fit obtained by the nls_tac function is depicted in Figure 7.

Table 12: Example 6. Summary of nlstac for returns from FTSE in the EuStockMarkets dataset with the model given in (8). Residual standard error: 0.00791658 on 1849 d.o.f. Number of iterations to convergence: 11. Achieved convergence tolerance: 1.94289×10^−16. Another example of an exponential autoregressive model with real data can be consulted in Subsection 6.3 of Cabello Sánchez et al. (2021).

Figure 7: Example 6. Fit of an exponential autoregressive model for returns from FTSE in the EuStockMarkets dataset with the model given in (8). The figure shows the original data (black lines) and the nlstac fit (green line).