dr4pl: A Stable Convergence Algorithm for the 4 Parameter Logistic Model

The 4 Parameter Logistic (4PL) model has been recognized as a major tool to analyze the relationship between doses and responses in pharmacological experiments. A main strength of this model is that each parameter contributes an intuitive meaning enhancing interpretability of a fitted model. However, implementing the 4PL model using conventional statistical software often encounters numerical errors. This paper highlights the issue of convergence failure and presents several causes with solutions. These causes include outliers and a non-logistic data shape, so useful remedies such as robust estimation, outlier diagnostics and constrained optimization are proposed. These features are implemented in a new R package dr4pl (Dose-Response analysis using the 4 Parameter Logistic model) whose code examples are presented as a separate section. Our R package dr4pl is shown to work well for data sets where the traditional dose-response modelling packages drc and nplr fail.


Introduction
Dose-response models are a primary tool of pharmacological and toxicological studies. After fitting a suitably chosen model to experimental data, the fitted parameters are used to evaluate the potency of a drug or compare the potencies of different drugs. Based on outcomes of the analysis, drug candidates may be advanced to further development. One of the most popular models among dose-response models is the 4 Parameter Logistic (4PL) model whose detailed explanation is given in Subsection 4 Parameter Logistic model. In this paper, we focus on some important challenges in the 4PL model and present methods to handle those challenges.
Fitting dose-response models to data is usually a nonlinear regression problem. One of the problems with nonlinear regression is that numerical errors often occur during a model fitting optimization process. The 4PL model is not an exception. Examples of problem data sets and diagnostics of their problems are presented in Subsection Motivation: convergence failure.
In this paper, we analyze the convergence failure problem within the context of the 4PL model. Descriptions of the 4PL model and existing methods are presented in Section Existing methods. Problems are clearly illustrated using the contour of the likelihood function, shown in Section Diagnosis of convergence failure. This suggests a useful set of constraints to put on the parameter space, and their effectiveness is analyzed in that section as well. Non-convergence is indicated by our R package dr4pl (Landis et al., 2018) when crossing one of these boundaries, in which case robust parameter estimates are returned as described in Section Remedy for outliers. Section Remedy for the support problem presents a new method of selecting initial parameter estimates which is not currently implemented in other dose-response modelling R packages. This method will be shown to be effective in handling a non-logistic data shape in that section. Walk-through of R functions in dr4pl with real-world data sets will be presented in Section Walk-through in R.

Existing methods
This section describes the 4PL model and existing methods that have been implemented for the 4PL model. Example data sets that motivate our research are presented as well.

Parameter Logistic model
The response in a pharmacological process usually slowly trends downward at low dose levels, then rapidly drops at a certain dose level and finally stabilizes. Tracking this sort of change in response results in a reverse S shaped curve motivating sigmoidal curves as primary models of interest. Robertson (1908) suggested that the biochemical nature of such processes is well modelled by the differential equation dg dz = θ 3 θ 1 g(z) (θ 1 − g(z)) , where z is a dose level, g(z) is the response at the level z and θ 1 , θ 3 ∈ R are parameters. Solving this differential equation yields g(z|θ 1 , θ 2 , θ 3 ) = θ 1 − θ 1 1 + (z/θ 2 ) θ 3 , where θ 2 > 0 is a parameter coming from a boundary condition given to Equation (1). Since this model allows the response value g(z) to range only between 0 and θ 1 , a common extension uses a shift parameter θ 4 ∈ R yielding the well-known 4 parameter logistic (4PL) model or Hill model given in Hill (1910) as f (x|θ) = θ 1 + θ 4 − θ 1 1 + 10 θ 3 (x−log 10 θ 2 ) where x ∈ R is a log 10 dose, z(x) = 10 x > 0 is the corresponding dose and the parameter vector θ = (θ 1 , θ 2 , θ 3 , θ 4 ) determines the relationship between the log 10 dose x and the response f (x).
Working on this log 10 scale has been shown to accurately model many physical reactions.
A major strength of the 4PL model is that each of the four parameters has an intuitive meaning. The parameters θ 1 and θ 4 are called the upper asymptote and lower asymptote, respectively, since we have from Equation (2)  In this paper, we assume that θ 1 > θ 4 holds. This implies that θ 3 > 0 corresponds to an increasing curve while θ 3 < 0 corresponds to a decreasing curve. A response usually decreases as a dose increases in a pharmacological experiment, so we focus on decreasing curves throughout this paper. However, all the functionalities such as model fitting and confidence interval calculation are implemented in the R package dr4pl for both decreasing and increasing curves. The parameter θ 2 is called the half maximum inhibitory concentration (IC50) or effective concentration (EC50), depending on whether the curve is decreasing or increasing, which represents the dose level whose response lies halfway between the upper and lower asymptotes. That is, we have f (θ 2 ) = (θ 1 + θ 4 ) /2. The parameter θ 3 is related to the slope of the function f at θ 2 as given in Appendix A of Giraldo et al. (2002) by where ln indicates the natural logarithm.
The effects of the IC50 parameter θ 2 and the slope parameter θ 3 on the 4PL model is illustrated in Figure 1. For each plot in the figure, the x-axis represents dose levels on a log 10 scale and the y-axis represents response values. For both the left and right plots, the upper asymptote θ 1 is fixed at 100 and the lower asymptote θ 4 , which is represented by a horizontal gray dashed line, is fixed at 0. The left plot shows the change in the dose response curve as the IC50 parameter θ 2 changes. Note that the blue dotted curve with the largest θ 2 value of 10 is located on the right while the red solid curve with the smallest θ 2 value of 0.1 is located on the left. However, the rate of decrease θ 3 of each curve is the same. The right plot shows that the blue dotted curve with the most negative θ 3 value of −2 most rapidly decreases around the IC50 value while the red solid curve with the least negative θ 3 value of −0.5 most slowly decreases. Note that all three curves in the right plot achieve the half response θ 2 at the same dose level, i.e. they have the same IC50 values.

Motivation: convergence failure
As mentioned in Section Introduction, dose-response models are prone to numerical errors in their optimization processes like other nonlinear regression models; see Bates and Watts (1988) and Seber and Wild (1989) for theoretical and practical issues related to nonlinear regression models. The specific type of errors in dose-response models depends on the data and statistical software at hand, and here we focus on R packages developed for bioassay data. The R package drc (Ritz et al., 2005) provided various non-linear regression models for bioassay data analysis. The R package nplr (Commo and Bot, 2016) focused on the family of n-parameter logistic regression models with an option to choose an optimal value n based on a goodness-of-fit measure. Unfortunately, these two R packages have only scant solutions to the convergence failure problem. For example, drc provides a function argument errorm to control whether an error or a warning message is drawn when convergence failure happens, and nplr provides a weighting scheme that reduces the effect of outliers. Both methods did not suffice to prevent convergence failure as shown in the following. Figure 2 shows four real-world data sets. The R package drc fails to converge on all these four data sets and the R package nplr fails on the last data set. In each plot of the figure, the x-axis represents doses on a log 10 scale and the y-axis represents responses. Data points flagged as problematic by our Figure 1: 4PL models with different values of the IC50 parameter θ 2 (left plot) and the slope parameter θ 3 (right plot). The other parameters are fixed at θ 1 = 100, θ 4 = 0 with θ 3 = −1 in the left plot and θ 2 = 1 in the right plot. The left plot implies that θ 2 controls horizontal shifts while the right plot implies that θ 3 drives rates of decrease.
R package are shown as red triangles. These include strong outliers in every case, and are found as described in Subsection Robust regression and outlier detection. Error Case 1 in the upper left plot has an outlier at the dose level 10 −5 which seems to result from an error in measurement. Error Case 2 in the upper right plot has more complicated problems. There exists a strong outlier at the dose level 1.35, and the two responses measured at the dose level 0.135 seem to be measured at a wrong scale. Both of them have abnormally large values and their values are far more different from each other than the responses measured at other dose levels. Even though these two responses were not detected as outliers by our method, this plot indicates that the data set of Error Case 2 requires users' attention.
The data in the lower two plots have a different problem in addition to outliers. They seem to have insufficient data points on the right side of their IC50 parameters. This situation is well known in dose-response experiments in cases where solubility issues limit the maximal drug concentration that can be used in dose titration. Error Case 3 in the lower left plot shows two strong outliers with abnormally small response values at the dose level 10. In addition, the response starts decreasing at the dose levels 10 and 100 but no more responses are measured at higher dose levels. This is because the drug fell out of solution at higher concentrations in this experiment. We call this type of problem a support problem because of a lack of data points on the right side, and possibly also on the left side, of the IC50 parameter. This lack in Error Case 3 can result in instability of an optimization process due to difficulty in estimation of the IC50 parameter θ 2 and the slope parameter θ 3 . Indeed, the package drc draws an error message for that data. Error Case 4 in the lower right plot has a similar support problem also resulting in a drc error.
Convergence failure of numerical optimization methods can be overcome in various ways. Experimenters often try other starting parameter values or adopt a simpler mathematical model. However, both solutions have a drawback in dose-response modelling since they require arbitrary decisions by experimenters. This limits their utility for automated, high-throughput screening and may be cost-prohibitive depending on the expense per data point. It can be foolhardy to decide which starting values should be tried next unless problems with data are diagnosed. Performing a grid search on the parameter space can be an alternative as claimed in Wang et al. (2010), but even this does not fundamentally solve the problems with the data sets in Figure 2. Shifting to a simpler model can result in loss of interpretability of the 4PL model. As pointed out in Di Veroli Giovanni et al. (2015) and Lim (2015), the 4PL model conveys useful intuitive content through their parameters.

Diagnosis of convergence failure
The 4PL model given in Equation (2) is fitted to a data set {(x i , y i ) |i = 1, 2, · · · , n} where x i is the log 10 dose level of the i-th observation and y i is the corresponding response value. The most commonly Figure 2: Data that generate numerical errors in the R packages drc and nplr. The y-axis represents responses and the x-axis represents doses in a log 10 scale. All data sets have at least one strong outlier which are denoted by red triangles. Data in the lower row also have insufficient numbers of data points on the right sides of the dose-response curves. chosen loss function is the sum-of-squares errors essentially given as where the function f was given in Equation (2) as Since this function is nonlinear in the parameters θ, minimization of L is a typical nonlinear regression problem. Gadagkar and Call (2015) mentioned the optimization problem is indeed a non-convex optimization, so different initial parameter estimates can result in different fitted models or even convergence failure. Example data sets with such a convergence failure problem were given in Figure  2. In this section, we present one possible solution to the convergence problem based on statistical theory, which can help users diagnose problems with their data.

Shape of a loss function
To find a reason behind the convergence failure, we draw contour plots of the sum-of-squares loss function given in Equation (3). To visualize the loss function L (θ|x), we first fix the values of θ 1 and θ 4 at y max and y min defined as respectively. The reason for choosing the above estimates is that the R package drc uses them as a default. Then we draw contour plots of the loss functions in terms of θ 2 and θ 3 yielding Figure 3 in which the contour plots of the data of Figure 2 are shown. In each plot, the x-axis represents the IC50 parameter θ 2 in a log 10 scale and the y-axis represents the slope parameter θ 3 . The parameter values corresponding to the same loss function value are connected as a solid line and the error value is given as text inserted into that line. It can be seen from the upper two plots that the loss function monotonically decreases as the IC50 parameter θ 2 decreases towards zero. This results in numerical errors in computing the estimated mean response in Equation (4), since as θ 2 → 0, the log 10 (θ 2 ) → −∞.
The lower two plots also indicate that the loss function tends to decrease as the values of the IC50 parameter θ 2 increase towards infinity. Similarly to the upper two plots, this causes numerical errors in finding the minimizer of the loss function.

Hill bounds: safeguards on the parameter space
The primary resolution of the convergence failure problem comes from imposing appropriate boundaries on the parameter space to prevent parameter estimates from diverging to infinity or converging to 0. These bounds should be broad enough to contain the global minimum of the loss function when it has any local minimum. To obtain reasonable sizes of bounds on the parameters, we focus on the Hill plot. Recall from Equation (2) that the mean response is modelled as where x is a log 10 dose. Note that the nonlinear nature of the 4PL model comes from the fact that the linear function of x, θ 3 (x − log 10 θ 2 ), appears as an exponential in the denominator of the expression. This motivates us to rearrange the terms to yield the Hill equation modeled as The left hand side of this equation is the logit-transformed mean response and the right hand side is the linear function of log 10 dose x. Since both the asymptotes θ 1 and θ 4 are unknown, we use y max and y min defined in Equation (5) as their estimates in real data approximation as See Chapter 2 of Pratt and Taylor (1990) for detailed discussion of the Hill plot and Hill equation.  Figure 2. The x-axis represents the IC50 parameter θ 2 and the y-axis represents the slope parameter θ 3 . The loss functions in the upper two plots decrease as the IC50 θ 2 decreases, and the loss functions in the lower two plots decrease as θ 2 increases; a minimum is not achieved in the plots.
The R Journal Vol. 11/2, December 2019 ISSN 2073-4859 To obtain reasonable bounds on the parameters θ 2 and θ 3 , we first view Equation (7) as a linear regression problem. If we re-parameterize Equation (7) as where β 2 = −θ 3 log 10 (θ 2 ), β 3 = θ 3 and i ∼ N 0, σ 2 , we can obtain interval estimators of β 2 and β 3 based on (1 − α) confidence intervals based on the normality assumption. That is, we compute β 2 − z 1−α/2 s.e. β 2 ,β 2 + z 1−α/2 s.e. β 2 , whereβ 2 andβ 3 are the least squares solutions of Equation (8), z p is the p-th quantile of the standard normal distribution and s.e. is the abbreviation for the standard errors of the regression coefficients. If the linear model of Equation (7) well approximates the original 4PL model of Equation (2), then parameter estimates will rarely be out of the bounds in Equation (9) during the optimization process.
However, using these bounds in optimization have limitations in practice in that they may miss important nonlinear structure inherent in the 4PL model. In real data analysis, such confidence intervals imposed too narrow bounds on the parameters and hindered optimization processes from convergence even when data have good logistic shapes. Section 5.1 of Seber and Wild (1989) present approximate confidence intervals of the true parameters of nonlinear regression models based on the derivatives of the mean response function f . When data have sufficiently many data points, the least squares estimatorsθ have asymptotically normal distributionŝ where θ * is the true parameter vector and C = J T J with J being the Jacobian matrix whose (i, j)th element is given as ∂ f (x i |θ) /∂θ j . In actual computation of confidence intervals, the parameter estimatorsθ are used to approximate J, i.e. we substituteĴ whose Based on this asymptotic theory, the approximate confidence intervals for the true parameters are given in Section 5.1 of Seber and Wild (1989) as for 1 ≤ j ≤ p where p is the number of parameters, s 2 is the residual sum of squares, t α n is the α-th quantile of the t-distribution with n degrees of freedom andĉ j,j is the j-th diagonal element of the matrixĈ −1 = Ĵ TĴ −1 . In the context of the 4PL model, we evaluate the interval estimators based on parameter estimatorsθ 1 ,θ 2 ,θ 3 ,θ 4 obtained from the Hill equation in Equation (7) and standard errors obtained from Equation (11). That is, we have for 1 ≤ j ≤ 4 wherec jj is the j-th diagonal element of the variance-covariance matrixC −1 evaluated at the initial parameter estimatorsθ obtained from the Hill equation. Since the bounds in Equation (12) originate from the Hill model, we call these the Hill bounds. Subsection 5.2.2 of Seber and Wild (1989) suggests usingĤ/2 instead ofĴ TĴ when computing confidence intervals wherê is an estimated Hessian matrix of the loss function L. We implement both versions of the Hill bounds based onH/2 andJ TJ in our R package with the default being the latter, since the latter tends to yield broader Hill bounds. Users can try using both options in real data analysis. The confidence level α is set at 0.0001 as a default and can be given by users as a tuning parameter.
One can wonder if bounds on the upper asymptote θ 1 and lower asymptote θ 4 are needed as well.
The following theorem shows that such bounds are unnecessary since diverging values of θ 1 and θ 4 always increase loss function values.  Table 2 and an example data set generated by Model 7. The left plot shows the dose response curves of the models in Table 2. The right plot shows data generated by Model 7 of Table 2.
Proof. See Appendix.

Analysis of the Hill bounds
In this subsection, we evaluate whether the Hill bounds are appropriate as a criterion for convergence failure. We simulate data following the 4PL model and check whether the Hill bounds prevent optimization processes from finding the global minimum of the loss function L. To evaluate this, we generate data and count the numbers of times the Hill bounds miss the true parameters. We also count the numbers of times that optimization procedures encounter the Hill bounds at the end of optimization. Lower counts in both cases indicate that the Hill bounds do not hinder optimization processes from converging to desired parameter estimates. Figure 4 shows parameter settings of our simulation. The left plot shows nine dose-response curves that generate data whose values of parameters are shown in Table 1. The levels of slope parameter θ 3 are denoted by line types while the levels of IC50 parameter are denoted by colors. The right plot shows the dose-response curve of Model 7 with data generated by the model. This model is chosen since it exhibits the most severe support problem among all the nine models. Even though data in the right plot are generated by a normal distribution, it seems that any optimization procedure can have trouble fitting the 4PL model to that data. We simulate 1,000 data sets with five observations at each dose level resulting in 25 observations in total. The standard deviation of the underlying normal distribution of a 4PL model is fixed at 5 throughout the simulations.
Parameters θ 3 = −0.5 θ 3 = −1 θ 3 = −1.5 θ 2 = 0.01 0.054 0.014 0 θ 2 = 0.1 0 0 0.002 θ 2 = 1 0.181 0 0 Parameters θ 3 = −0.5 θ 3 = −1 θ 3 = −1.5 θ 2 = 0.01 0.038 0.009 0.1 θ 2 = 0.1 0.12 0.002 0.195 θ 2 = 1 0.398 0.258 0.079 Table 3: Proportions of times out of 1,000 simulations that the Hill bounds are attained by optimization procedures. The Hill bounds do not prevent optimization processes from achieving local minima of L for most of cases, except when data have support problems as shown in the bottom row. Table 2 shows the proportions of times that the Hill bounds miss true parameters of the nine models given in Figure 4. For each simulated data set, we counted the number of times that the Hill bounds miss any of the IC50 and slope parameters. Note from the table that the Hill bounds contain the true parameters for most of the simulated data. The only exception is when data have a severe support problem (θ 2 = 1) and the slope of the dose-response curve is very gradual (θ 3 = −0.5).
Since we do not have well-shaped data for this case, we take the fact that the Hill bounds miss the true parameters for these data as an advantage. When the Hill bounds are attained at the end of the optimization, it is recommended to look at the scatter plot, to investigate potential support problems. As illustrated in the top two rows of the table, in cases where the range of the data is enough for estimation on both sides of the IC50, the Hill bounds successfully contain the true parameters. Table 3 presents the proportion of times the Hill bounds are hit at the end of optimization processes out of 1,000 simulations. The final parameter estimates are inside the Hill bounds for most cases. This indicates that the Hill bounds are not too restrictive for optimization procedures to find a local minimum of the loss function L. However, when there are severe support problems as shown in the bottom left and bottom middle cells, the Hill bounds restrict optimization processes to be inside the bounds. As implied in the previous paragraph, this can help users realize that their data suffer from support problems and actions should be taken.

Remedy for outliers
As seen in the previous section, it seems that the convergence failure results from either outliers or the support problem. We tackle each of the causes with a different approach. Outlier problems are often dealt with by robust estimation methods. The paper Riazoshams et al. (2009) reviewed various methods of outlier detection in the context of nonlinear regression, especially when the form of the mean response function f is known. On the other hand, Dixon's Q test in Dean and Dixon (1951) can be used to detect an outlier without any assumptions on the form of f , but this method is most effective when a data size is less than or equal to 10. In this paper, we use a novel method presented in Motulsky and Brown (2006) that combines robust estimation and outlier detection. This method is currently adopted in GraphPad Prism and called the ROUT method therein as a default outlier detection method. The indices of outliers in data and a scatter plot with a robust fit are provided to users to enhance their understanding of problems.

Robust regression and outlier detection
As noted in Subsection Motivation: convergence failure, one of the main reasons for convergence failure is strong outliers. In this case, the contour of the loss function L(θ|x, y) can exhibit monotonicity as a function of the IC50 parameter θ 2 as shown in Figure 2. This implies that the Hill bounds of Subsection Hill bounds: safeguards on the parameter space can be attained at the end of an optimization process. The R package dr4pl reports in this case that there seems to exist a problem in data and it requires a user's attention. Meanwhile, the R package fits a robust regression model to data and reports indices of outliers in data with a plot highlighting the outliers in red triangles, which will be shown in Figure  6. By studying the robust fit and the indicated outliers, the user can get some insight into the data and decide whether the observations denoted as outliers should be removed or not.
To determine which data observations are outliers, we use the method given in Motulsky and Brown (2006). Define the i-th residual of a fitted dose-response model where the mean response f is given in Equation (2) andθ is the estimator obtained by a robust regression model. The method of that paper approximates the distribution of r i by a t-distribution and then adjusts p-values by the false discovery rate (FDR). See Benjamini and Yekutieli (2001) for explanation of FDR. Adapting their algorithm in the context of the 4PL model yields the following.
1. Fit a dose-response model to data using a robust regression model to obtain residuals r i for i = 1, 2, · · · , n. Detailed description of a robust regression model is given in the following paragraphs.
2. Compute the median absolute deviation (MAD) of absolute valued residuals r where med stands for the sample median.
3. Rank the absolute values of the residuals from the smallest to the largest, so that r (a) i:n is the i-th order statistic of the absolute valued residuals. 4. Loop from i = 0.7 × n to n (a) Compute α i = 0.01(n − (i − 1))/n and t i:n = r (a) i:n /σ for i = 1, 2, · · · , n. (b) Compute the two-tailed p-value p i of the statistic t i:n from the t distribution with n − 4 degrees of freedom.
(c) If any of p j is less than or equal to α j , then end the loop and report all the observations for 1 ≤ i ≤ j as outliers.
Contrary to the suggestion of Motulsky and Brown (2006) to use their quantile based measure of scale, where q 0.6827 is the 0.6827-th quantile of absolute valued residuals, we use the median absolute deviation. The reason is that for small sample sizes around 10, the constant factor n/(n − 4) in this equation significantly increases a scale estimate, often resulting in non-rejection of any observations in a data set. For example, the method of Motulsky and Brown (2006) with their estimatorσ did not detect any observation in Error Case 2 in Figure 2 as an outlier. We adopt the constant 0.01 in the definition of α i as recommended by Motulsky and Brown (2006).
For robust regression, we adopt M-estimators. The letter M implies that the M-estimators are obtained from equations that are in similar forms with maximum likelihood estimators. The Mestimators are obtained by minimizing where ρ : R → R + is a differentiable function. See Huber and Ronchetti (2009) for detailed discussion of the M-estimators. Note that the loss function in Equation (3) is a special case of the loss function L ρ when ρ(r) = r 2 . Various forms of the functions ρ for the M-estimation suggested in the literature are as follows, Absolute : ρ 1 (r) = |r|, Tukey's biweight : ρ T (r) = r 6 /A 4 T − 3r 4 /A 2 T + 3r 2 , |r| ≤ A T , where A H = 1.345 was suggested by Huber (1964) and A T = 4.685 was suggested by Mosteller and Tukey (1977). The constant A T was shown in Mosteller and Tukey (1977) to possess 95% efficiency when errors follow a normal distribution, while guaranteeing resistance to contamination of up to 10% of outliers in terms of the breakdown point. These definitions are valid when the standard deviation of the residuals r i is 1, otherwise a robust estimator of the scale of residuals should divide the residuals. Figure 5 shows the shape of the ρ functions corresponding to different M-estimators. The x-axis represents residual values while the y-axis represents the values of the function ρ. Note from the figure that the Huber loss and squared loss coincide with each other for residual values less than A H , but they deviate from each other as the value of the residual r grows larger. The linearity of Huber's loss above A H prevents a regression model from being heavily affected by outliers that have large absolute values of residuals. Tukey's biweight loss can be thought of as being more robust since it gives constant values to all residuals whose absolute values exceed the cutoff A T . This implies that a strong outlier with a large absolute value will have the same effect on a regression fit with an observation with an absolute valued residual equal to A T .
Using M-estimators in nonlinear regression has a few problems. First of all, the conditions for M-estimators to have asymptotic normality are complicated and difficult to be verified as mentioned in Section 12.3 of Seber and Wild (1989). This prevents us from obtaining approximate confidence intervals of true parameters based on M-estimators and their standard errors. On top of that, there are no known general results about consistency of M-estimators in nonlinear regression, which hinders us from trusting fitted parameter estimates by M-estimators. However, the absolute loss function was shown in Oberhofer (1982) to exhibit weak consistency in a nonlinear regression setting. Hence, we set the absolute loss estimator to be a default robust estimation method in our R package when the Hill bounds are attained. Figure 6 shows the results of applying the R package dr4pl to the error cases of Figure 2. Unlike the R packages drc and nplr, the package dr4pl succeeded in fitting a robust regression model to all four data sets. The x-axis represents doses in a log 10 scale and the y-axis represents responses. The black solid line shows a fit obtained by the absolute loss function and red triangles show the outliers detected by the method described in this section. Note from all the plots that the outliers that we pointed out in Section Motivation: convergence failure are marked as outliers. This implies that the method of Motulsky and Brown (2006) is successful in solving the convergence failure problem of the 4PL model.

Remedy for the support problem
The next cause of the convergence failure is the support problem whose possible solution is presented in this section. Final parameter estimates of the 4PL model obtained by an optimization procedure depend on initial estimates supplied to the procedure. The traditional R packages drc and nplr use  (13) denoted by black curves and outliers detected by the method of Motulsky and Brown (2006) denoted by red triangles. Note that the curves are reasonably estimated.
the logistic method which computes the initial parameter estimates as θ (L) 1 = y max ,θ (L) 2 = 10 −β 2 /β 3 ,θ (L) 3 =β 3 ,θ (L) 4 = y min where y min , y max ,β 2 andβ 3 are defined in Subsection Hill bounds: safeguards on the parameter space. As demonstrated in Figure 2, those initial parameter estimates do not always lead the optimization process to convergence. We present possible improvements to the logistic method and show a new initialization method that is effective in handling the support problem with data.

Logistic method combined with a grid search
The paper Wang et al. (2010) presented a grid algorithm for fitting the 4PL model to many data sets generated by a high throughput screening experiment. A main idea of their research is that fitting the Hill equation in Equation (6) with multiple initial values of the upper and lower asymptotes, θ 1 and θ 4 , and starting from the best parameter estimates can yield a better result. That is, we search the grids G 1 and G 4 on θ 1 and θ 4 for the initial estimates θ 1 ,θ 2 ,θ 3 ,θ 4 with the least loss value wherẽ θ 1 ∈ G 1 ,θ 4 ∈ G 4 and the other two estimatesθ 2 andθ 3 are obtained by the Hill equation in Equation (6). The paper suggested naive grids on θ 1 and θ 4 based on constant multiples of y max and y min , but we suggest new grids that statistically make sense.
The basic idea of choosing grids resembles the idea of the Hill bounds in that it uses asymptotic normality of parameter estimators presented in Equation (10). Since we useθ (L) 1 = y max andθ (L) 4 = y min as initial estimators of the upper and lower asymptote parameters in the logistic method, we can obtain approximate standard errors of those estimators from Equation (12). Based on these standard errors, we set the grids on θ 1 and θ 4 as of the multiplicative error model is more complicated than that of the 4PL model as can be seen in the following, The parameters of a multiplicative error model are related to those of a 4PL model by 1/log 10 µ 3 , θ 3 = log 10 µ 3 , θ 4 = 0.
It seems that the 4PL model has been preferred due to these more intuitive meanings of the parameters. This equation shows that the parameter µ 3 is related to the slope of a dose-response curve.
To find estimators of µ 1 , µ 2 , µ 3 in Equations (14) and (15), we consider a weighted least squares loss function For a differentiable function g : R → R we have Hence, we can approximate Var (g (Y i )) by If we let g(y) = log(y), then we have Substituting this into Equation (16) yields Substituting the expression E (Y i ) = f (x i |µ) into this equation yields an approximate linear model To obtain estimates of µ 1 and µ 2 , we take µ 3 ∈ G with the grid G = {q 0.04 , q 0.08 , · · · , q 0.92 , q 0.96 } where q p is the p-th quantile of the Cauchy distribution. The Cauchy distribution was chosen because it can provide a broad range of quantile values due to its heavy tailedness. For each µ 3 ∈ G, the loss function L(µ|x, y) can be minimized over µ 1 and µ 2 by the standard linear regression. That is, a linear regression model with the response 1, the first predictor y i and the second predictor y i µ x i 3 without an intercept can be fitted to obtain parameter estimates. Note that this regression model is fitted for all µ 3 ∈ G and parameter estimatesμ 1 ,μ 2 ,μ 3 with the smallest value of L(·|x, y) are chosen as initial parameter estimates. A nonlinear optimization method starts from these estimates to find final parameter estimates.

Comparison of the logistic and Mead methods
As pointed out in the previous subsection, Mead's method is expected to find better initial parameter estimates than the logistic method when data have the support problem. By simulating data and fitting 4PL models with different initialization techniques, we can see whether the Mead method actually results in final fitted models with lower loss values than the logistic method. We also compare the two methods when data do not exhibit a support problem. By doing that, we can get insight into their relative strengths under various situations.
In this simulation study, we generate data from the models of Subsection Analysis of the Hill bounds whose mean response curves are presented in the left plot of Figure 4 and whose parameter values are given in Table 2. As mentioned in that subsection, Models 1, 2 and 3 represent models without any support problem, while Models 7, 8 and 9 represent severe support problems. Models 4, 5 Parameters θ 3 = −0.5 θ 3 = −1 θ 3 = −1.5 θ 2 = 0.01 442 443 697 θ 2 = 0.1 651 568 682 θ 2 = 1 661 767 843 Table 4: The numbers of times that the Mead method outperforms the logistic method in the sense that it has a lower loss value for final fitted models. The bottom row (θ 2 = 1) represents data with a support problem. It can be seen that the superiority of Mead's method is maximal when data have a support problem. and 6 exhibit mild levels of support problems. For each model, 1,000 data sets are generated and the standard deviation of normally distributed errors was fixed at 5. Table 4 shows the numbers of times that the Mead method outperforms the logistic method in the sense that it has a lower loss value for final fitted models out of 1,000 simulations. The top row of the table represents Models 1, 2 and 3 from left to right. As can be seen from the top row, the Mead method generally performs worse than the logistic method except when the slope is very steep (θ 3 = −1.5). In the middle and bottom rows, the Mead method performs better than the logistic method. The degree of outperformance becomes larger when the support problem is more severe (θ 2 = 1) or the slope parameter is more negative (θ 3 = −1.5). This coincides with our conjecture made in the previous subsection that Mead's method can be particularly useful when data do not sufficiently exist on the right side of the dose-response curves so that estimating the slope and lower asymptote parameters is difficult. In addition, the degree of inferiority of the Mead method to the logistic method when there is no support problem (upper left cell) is not severe, which supports our use of the Mead method as the default initialization method in dr4pl.

Walk-through in R
In this section, we walk through example R code in which dose-response data are analyzed using our R package dr4pl. By following steps shown below, users can understand how to use dr4pl to handle data sets that have the convergence failure issue and what kind of detailed analysis can be done by setting options of R functions in dr4pl.

Main R function dr4pl
We start with Error Case 1 shown in Figures 2 and 6. As can be seen in those figures, this data set contains an extreme outlier at the dose level 10 −5 . If the R package drc is used to fit the 4PL model to this data set, the following error message is returned.
[1] FALSE In this code, some input arguments such as method.init and use.Hessian are specified as different values from the default values of dr4pl. For example, the default value of method.init is Mead but it is specified as logistic in the code. This is to ensure that the function dr4pl works with the same parameter setting as drc. The R package drc uses the logistic method as its default initialization method and the Hessian matrix in its confidence interval estimation. Note that the function dr4pl does not throw an error but returns the member dr4pl.error.1$convergence which indicates that convergence failure happens during the optimization process.
As a possible solution to the convergence failure, the object dr4pl.error.1 provides a member object dr4pl.robust that results from robust estimation and the outlier detection algorithm explained in Subsection Robust regression and outlier detection. We call the object dr4pl.error.1$dr4pl.robust the robust child object of its parent object dr4pl.error. The member idx.outlier of the object dr4pl.robust.1 shows the indices of the outliers in the input data frame drc_error_1 detected by our algorithm. It can be seen that the last sentence in this code generates the same plot as the upper left plot of Figure 6. As mentioned in Subsection Robust regression and outlier detection, the absolute loss is given as the default robust estimation method. This implies that adopting the absolute loss successfully resolves the convergence failure problem of Error Case 1.
Our R package dr4pl provides other real-world data sets than Error Cases 1 -4. We analyze the data set sample_data_5 to compare the two different initialization methods, the logistic and Mead methods, which were presented in Section Remedy for the support problem. > dr4pl.logistic.5 <-dr4pl(Response~Dose, + data = sample_data_5, + method.init = "logistic") > ggplot.logistic <-plot(dr4pl.logistic.5, text.title = "Logistic method") > > dr4pl.Mead.5 <-dr4pl(Response~Dose, + data = sample_data_5) > ggplot. text.title = "Mead s method") > > grid.arrange(ggplot.logistic, ggplot.Mead, nrow = 1, ncol = 2) Running this code generates the two plots in Figure 7. Note that outliers are not denoted by red triangles in these plots since they are generated by the parent objects but not by their robust child objects. It can be seen from the figure that the data set has the support problem in the sense that it is hard to determine an IC50 parameter value visually and data points on the right side of possible IC50 parameter values are not sufficiently provided. Mead's method shown in the right plot seems to yield a closer fit to the right-most data point. This coincides with our explanation in Subsection Mead's method that the method can result in better final parameter estimates particularly when data suffer from the support problem.

Auxiliary R functions
The R function summary of the R package dr4pl shows the 95% asymptotic confidence intervals of the final parameter estimates of a "dr4pl" object. The following code lines show example output of summary applied to the data set sample_data_3. > dr4pl.3 <-dr4pl(Response~Dose, data = sample_data_3) > summary(dr4pl.3) $call  Recall from Subsection 2.2.1 that the IC50 is the dose level whose response value is halfway between the upper and lower asymptotes θ 1 and θ 4 . Similarly, other inhibitory concentrations of interest can be defined according to different proportions between θ 1 and θ 4 as ι p which satisfies the following equation, f ι p = pθ 1 + (1 − p)θ 4 where 0 ≤ p ≤ 1. It can be easily seen that ι 0.5 corresponds to the IC50 parameter θ 2 . Like the R function ED of drc and the R function getEstimates of nplr, the R package dr4pl implements the function IC to compute inhibitory concentration parameters ι p for 0 ≤ p ≤ 1. The following code shows a simple use of IC to compute ι 0.1 , ι 0.3 , ι 0.5 , ι 0.7 , ι 0.9 for the data sample_data_3.

Summary and future extension
The 4PL model has often suffered from convergence failure problems preventing experimenters from obtaining fitted 4PL models to data sets. Without diagnosing problems with data sets, experimenters should often try many different initial parameter estimates, which still does not guarantee stable convergence of optimization processes. In this paper, we presented a novel method based on studying convergence failure problems. By imposing proper conditions on the parameter space of the 4PL models, our new R package dr4pl reports convergence failure during an optimization process and presents possible resolutions to users.
Another important issue can be memory usage and speed of dr4pl, relative to other R packages nplr and drc. Measuring time and memory consumed by the three R packages for the 13 data sets provided in dr4pl showed that nplr is much slower, and used much more memory, than either of the other two methods. The enhanced features of dr4pl resulted in its being only slightly slower, with somewhat more memory usage than drc.
Asymptotic confidence intervals of M-estimators have been developed in Lim et al. (2013). An interesting open problem is the numerical implementation of these in the 4PL context. Another area of potential future extension is modelling interaction between different drugs. The R packages drc and MixLow support functionality for drug interaction modelling. While we are not aware of data sets reported in the literature that cause the convergence failure problem when there are more than one drug, it seems that outliers or the support problem may cause instability of optimization processes in those cases as well. Analysis of the Hill bounds can be a remedy for those, which is an important area of future research.