Differential Evolution with DEoptim

David Ardia; Kris Boudt; Peter Carl; Katharine M. Mullen; Brian G. Peterson

doi:10.32614/RJ-2011-005

1 Introduction

Differential Evolution (DE) is a search heuristic introduced by Storn and K. Price (1997). Its remarkable performance as a global optimization algorithm on continuous numerical minimization problems has been extensively explored (Price, R. M. Storn, and J. A. Lampinen 2006). DE has also become a powerful tool for solving optimization problems that arise in financial applications: for fitting sophisticated models (Gilli and E. Schumann 2009), for performing model selection (Maringer and M. Meyer 2008), and for optimizing portfolios under non-convex settings (Krink and S. Paterlini 2011). DE is available in R with the package DEoptim.

In what follows, we briefly sketch the DE algorithm and discuss the content of the package DEoptim. The utility of the package for financial applications is then explored by solving a non-convex portfolio optimization problem.

2 Differential Evolution

DE belongs to the class of genetic algorithms (GAs) which use biology-inspired operations of crossover, mutation, and selection on a population in order to minimize an objective function over the course of successive generations (Holland 1975). As with other evolutionary algorithms, DE solves optimization problems by evolving a population of candidate solutions using alteration and selection operators. DE uses floating-point instead of bit-string encoding of population members, and arithmetic operations instead of logical operations in mutation, in contrast to classic GAs.

Let \(\mathit{NP}\) denote the number of parameter vectors (members) \(\mathbf{x} \in \mathbb{R}^d\) in the population, where \(d\) denotes dimension. In order to create the initial generation, \(\mathit{NP}\) guesses for the optimal value of the parameter vector are made, either using random values between upper and lower bounds (defined by the user) or using values given by the user. Each generation involves creation of a new population from the current population members \(\{ \mathbf{x}_i \,|\, i = 1, \ldots, \mathit{NP}\}\), where \(i\) indexes the vectors that make up the population. This is accomplished using differential mutation of the population members. An initial mutant parameter vector \(\mathbf{v}_i\) is created by choosing three members of the population, \(\mathbf{x}_{i_1}\), \(\mathbf{x}_{i_2}\) and \(\mathbf{x}_{i_3}\), at random. Then \(\mathbf{v}_i\) is generated as

\[\mathbf{v}_i \doteq \mathbf{x}_{i_1} + F \cdot (\mathbf{x}_{i_2} - \mathbf{x}_{i_3}) \,,\] where \(F\) is a positive scale factor, effective values for which are typically less than one. After the first mutation operation, mutation is continued until \(d\) mutations have been made, with a crossover probability \(\mathit{CR} \in [0,1]\). The crossover probability \(\mathit{CR}\) controls the fraction of the parameter values that are copied from the mutant. Mutation is applied in this way to each member of the population. If an element of the trial parameter vector is found to violate the bounds after mutation and crossover, it is reset in such a way that the bounds are respected (with the specific protocol depending on the implementation). Then, the objective function values associated with the children are determined. If a trial vector has equal or lower objective function value than the previous vector it replaces the previous vector in the population; otherwise the previous vector remains. Variations of this scheme have also been proposed; see Price, R. M. Storn, and J. A. Lampinen (2006).

Intuitively, the effect of the scheme is that the shape of the distribution of the population in the search space is converging with respect to size and direction towards areas with high fitness. The closer the population gets to the global optimum, the more the distribution will shrink and therefore reinforce the generation of smaller difference vectors.

For more details on the DE strategy, we refer the reader to Price, R. M. Storn, and J. A. Lampinen (2006) and Storn and K. Price (1997).

2.1 The package DEoptim

DEoptim (Ardia, K. M. Mullen, B. G. Peterson and J. Ulrich 2011) was first published on CRAN in 2005. Since becoming publicly available, it has been used by several authors to solve optimization problems arising in diverse domains. We refer the reader to Mullen, D. Ardia, D. L. Gil, D. Windover, and J. Cline (2011) for a detailed description of the package.

DEoptim consists of the core function DEoptim whose arguments are:

fn: the function to be optimized (minimized).
lower, upper: two vectors specifying scalar real lower and upper bounds on each parameter to be optimized. The implementation searches between lower and upper for the global optimum of fn.
control: a list of tuning parameters, among which: NP (default: \(10 \cdot d\)), the number of population members and itermax (default: 200), the maximum number of iterations (i.e., population generations) allowed. For details on the other control parameters, the reader is referred to the documentation manual (by typing ?DEoptim). For convenience, the function DEoptim.control() returns a list with default elements of control.
...: allows the user to pass additional arguments to the function fn.

The output of the function DEoptim is a member of the S3 class DEoptim. Members of this class have a plot and a summary method that allow to analyze the optimizer’s output.

Let us quickly illustrate the package’s usage with the minimization of the Rastrigin function in \(\mathbb{R}^2\), which is a common test for global optimization:

> Rastrigin <- function(x) \{
+   sum(x^2 - 10 * cos(2 * pi * x)) + 20
\}

The global minimum is zero at point \(\mathbf{x} = (0,0)'\). A perspective plot of the function is shown in Figure 1.

graphic without alt text — Figure 1: Perspective plot of the Rastrigin function.

The function DEoptim searches for a minimum of the objective function between lower and upper bounds. A call to DEoptim can be made as follows:

> set.seed(1234)
> DEoptim(fn = Rastrigin,
+   lower = c(-5, -5),
+   upper = c(5, 5),
+   control = list(storepopfrom = 1))

The above call specifies the objective function to minimize, Rastrigin, the lower and upper bounds on the parameters, and, via the control argument, that we want to store intermediate populations from the first generation onwards (storepopfrom = 1). Storing intermediate populations allows us to examine the progress of the optimization in detail. Upon initialization, the population is comprised of 50 random values drawn uniformly within the lower and upper bounds. The members of the population generated by the above call are plotted at the end of different generations in Figure 2. DEoptim consistently finds the minimum of the function (market with an open white circle) within 200 generations using the default settings. We have observed that DEoptim solves the Rastrigin problem more efficiently than the simulated annealing method available in the R function optim (for all annealing schedules tried). Note that users interested in exact reproduction of results should set the seed of their random number generator before calling DEoptim (using set.seed). DE is a randomized algorithm, and the results may vary between runs.

Finally, note that DEoptim relies on repeated evaluation of the objective function in order to move the population toward a global minimum. Therefore, users interested in making DEoptim run as fast as possible should ensure that evaluation of the objective function is as efficient as possible. Using pure R code, this may often be accomplished using vectorization. Writing parts of the objective function in a lower-level language like C or Fortran may also increase speed.

3 Risk allocation portfolios

Mean-risk models were developed in early fifties for the portfolio selection problem. Initially, variance was used as a risk measure. Since then, many alternative risk measures have been proposed. The question of which risk measure is most appropriate is still the subject of much debate. Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) are the most popular measures of downside risk. VaR is the negative value of the portfolio return such that lower returns will only occur with at most a preset probability level, which typically is between one and five percent. CVaR is the negative value of the mean of all return realizations that are below the VaR. There is a voluminous literature on portfolio optimization problems with VaR and CVaR risk measures, see, e.g., Fábián and A. Veszprémi (2008) and the references therein.

Modern portfolio selection considers additional criteria such as to maximize upper-tail skewness and liquidity or minimize the number of securities in the portfolio. In the recent portfolio literature, it has been advocated by various authors to incorporate risk contributions in the portfolio allocation problem. Qian’s (2005) Risk Parity Portfolio allocates portfolio variance equally across the portfolio components. Maillard, T. Roncalli, and J. Teiletche (2010) call this the Equally-Weighted Risk Contribution (ERC) Portfolio. They derive the theoretical properties of the ERC portfolio and show that its volatility is located between those of the minimum variance and equal-weight portfolio. Zhu, D. Li, and X. Sun (2010) study optimal mean-variance portfolio selection under a direct constraint on the contributions to portfolio variance. Because the resulting optimization model is a non-convex quadratically constrained quadratic programming problem, they develop a branch-and-bound algorithm to solve it.

Boudt, P. Carl, and B. G. Peterson (2010a) propose to use the contributions to portfolio CVaR as an input in the portfolio optimization problem to create portfolios whose percentage CVaR contributions are aligned with the desired level of CVaR risk diversification. Under the assumption of normality, the percentage CVaR contribution of asset \(i\) is given by the following explicit function of the vector of weights \(\mathbf{w} \doteq (w_1,\ldots,w_d)'\), mean vector \(\mathbf{\mu} \doteq (\mu_1, \ldots, \mu_d)'\) and covariance matrix \(\mathbf{\Sigma}\):

\[\frac{w_i \left[-\mu_i + \frac{(\mathbf{\Sigma} \mathbf{w})_{i}}{\sqrt{\mathbf{w}' \mathbf{\Sigma} \mathbf{w}}}\frac{\phi(z_\alpha)} {\alpha}\right]}{-\mathbf{w}' \mathbf{\mu} + \sqrt{\mathbf{w}' \mathbf{\Sigma} \mathbf{w}} \frac{\phi(z_\alpha)}{\alpha}} \,,\] with \(z_\alpha\) the \(\alpha\)-quantile of the standard normal distribution and \(\phi(\cdot)\) the standard normal density function. Throughout the paper we set \(\alpha=5\%\). As we show here, the package DEoptim is well suited to solve these problems. Note that it is also the evolutionary optimization strategy used in the package PortfolioAnalytics (Boudt, P. Carl, and B. G. Peterson 2010b).

To illustrate this, consider as a stylized example a five-asset portfolio invested in the stocks with tickers GE, IBM, JPM, MSFT and WMT. We keep the dimension of the problem low in order to allow interested users to reproduce the results using a personal computer. More realistic portfolio applications can be obtained in a straightforward manner from the code below, by expanding the number of parameters optimized. Interested readers can also see the DEoptim package vignette for a 100-parameter portfolio optimization problem that is typical of those encountered in practice.

We first download ten years of monthly data using the function getSymbols of the package quantmod (Ryan 2010). Then we compute the log-return series and the mean and covariance matrix estimators. For on overview of alternative estimators of the covariance matrix, such as outlier robust or shrinkage estimators, and their implementation in portfolio allocation, we refer the reader to Würtz, Y. Chalabi, W. Chen and A. Ellis (2009 4). These estimators might yield better performance in the case of small samples, outliers or departures from normality.

> library("quantmod")
> tickers <- c("GE", "IBM", "JPM", "MSFT", "WMT")
> getSymbols(tickers, 
+   from = "2000-12-01", 
+   to   = "2010-12-31")
> P <- NULL
> for(ticker in tickers) \{
+   tmp <- Cl(to.monthly(eval(parse(text = ticker))))
+   P <- cbind(P, tmp)
+ \}
> colnames(P) <- tickers
> R <- diff(log(P))
> R <- R[-1,]
> mu <- colMeans(R)
> sigma <- cov(R)

We first compute the equal-weight portfolio. This is the portfolio with the highest weight diversification and often used as a benchmark. But is the risk exposure of this portfolio effectively well diversified across the different assets? This question can be answered by computing the percentage CVaR contributions with function ES in the package PerformanceAnalytics (Carl and B. G. Peterson 2011). These percentage CVaR contributions indicate how much each asset contributes to the total portfolio CVaR.

> library("PerformanceAnalytics")
> pContribCVaR <- ES(weights = rep(0.2, 5),
+   method = "gaussian",
+   portfolio_method = "component",
+   mu = mu,
+   sigma = sigma)\$pct_contrib_ES
> rbind(tickers, round(100 * pContribCVaR, 2))
        [,1]    [,2]    [,3]    [,4]    [,5]
tickers "GE"    "IBM"   "JPM"   "MSFT"  "WMT"
        "21.61" "18.6" "25.1" "25.39" "9.3"

We see that in the equal-weight portfolio, 25.39% of the portfolio CVaR risk is caused by the 20% investment in MSFT, while the 20% investment in WMT only causes 9.3% of total portfolio CVaR. The high risk contribution of MSFT is due to its high standard deviation and low average return (reported in percent):

> round(100 * mu , 2)
   GE   IBM   JPM  MSFT   WMT
-0.80  0.46 -0.06 -0.37  0.0
> round(100 * diag(sigma)^(1/2), 2)
   GE   IBM   JPM  MSFT   WMT
8.90  7.95  9.65 10.47  5.33

We now use the function DEoptim of the package DEoptim to find the portfolio weights for which the portfolio has the lowest CVaR and each investment can contribute at most 22.5% to total portfolio CVaR risk. For this, we first define our objective function to minimize. The current implementation of DEoptim allows for constraints on the domain space. To include the risk budget constraints, we add them to the objective function through a penalty function. As such, we allow the search algorithm to consider infeasible solutions. A portfolio which is unacceptable for the investor must be penalized enough to be rejected by the minimization process and the larger the violation of the constraint, the larger the increase in the value of the objective function. A standard expression of these violations is \(\alpha \times | \mbox{violation} |^p\). Crama and M. Schyns (2003) describe several ways to calibrate the scaling factors \(\alpha\) and \(p\). If these values are too small, then the penalties do not play their expected role and the final solution may be infeasible. On the other hand, if \(\alpha\) and \(p\) are too large, then the CVaR term becomes negligible with respect to the penalty; thus, small variations of \(\mathbf{w}\) can lead to large variations of the penalty term, which mask the effect on the portfolio CVaR. We set \(\alpha = 10^3\) and \(p=1\), but recognize that better choices may be possible and depend on the problem at hand.

> obj <- function(w) \{
+   if (sum(w) == 0) \{ w <- w + 1e-2 \}
+   w <- w / sum(w)
+   CVaR <- ES(weights = w,
+     method = "gaussian",
+     portfolio_method = "component",
+     mu = mu,
+     sigma = sigma)
+   tmp1 <- CVaR\$ES
+   tmp2 <- max(CVaR\$pct_contrib_ES - 0.225, 0)
+   out <- tmp1 + 1e3 * tmp2
+ \}

The penalty introduced in the objective function is non-differentiable and therefore standard gradient-based optimization routines cannot be used. In contrast, DEoptim is designed to consistently find a good approximation to the global minimum of the optimization problem:

> set.seed(1234)
> out <- DEoptim(fn = obj,
+   lower = rep(0, 5),
+   upper = rep(1, 5))
> out\$optim\$bestval
[1] 0.1143538
> wstar <- out\$optim\$bestmem
> wstar <- wstar / sum(wstar)
> rbind(tickers, round(100 * wstar, 2))
        par1    par2    par3    par4    par5
tickers "GE"    "IBM"   "JPM"   "MSFT"  "WMT"
        "18.53" "21.19" "11.61" "13.37" "35.3"
> 100 * (sum(wstar * mu) - mean(mu))
[1] 0.04827935

Note that the main differences with the equal-weight portfolio is the low weights given to JPM and MSFT and the high weight to WMT. As can be seen from the last two lines, this minimum risk portfolio has a higher expected return than the equal weight portfolio. The following code illustrates that DEoptim yields superior results than the gradient-based optimization routines available in R.

> out <- optim(par = rep(0.2, 5), 
+  fn = obj,
+  method = "L-BFGS-B",
+  lower = rep(0, 5),
+  upper = rep(1, 5))
> out\$value
[1] 0.1255093
> out <- nlminb(start = rep(0.2, 5), 
+   objective = obj,
+   lower = rep(0, 5),
+   upper = rep(1, 5))
> out\$objective
[1] 0.1158250

Even on this relatively simple stylized example, the optim and nlminb routines converged to local minima.

Suppose now the investor is interested in the most risk diversified portfolio whose expected return is higher than the equal-weight portfolio. This amounts to minimizing the largest CVaR contribution subject to a return target and can be implemented as follows:

> obj <- function(w) \{
+   if(sum(w) == 0) \{ w <- w + 1e-2 \}
+   w <- w / sum(w)
+   contribCVaR <- ES(weights = w,
+     method = "gaussian",
+     portfolio_method = "component",
+     mu = mu,
+     sigma = sigma)\$contribution
+   tmp1 <- max(contribCVaR)
+   tmp2 <- max(mean(mu) - sum(w * mu), 0)
+   out <- tmp1 + 1e3 * tmp2
+ \}
> set.seed(1234)
> out <- DEoptim(fn = obj,
+   lower = rep(0, 5),
+   upper = rep(1, 5))
> wstar <- out\$optim\$bestmem
> wstar <- wstar / sum(wstar)
> rbind(tickers, round(100 * wstar, 2))
        par1    par2    par3    par4    par5
tickers "GE"    "IBM"   "JPM"   "MSFT"  "WMT"
        "17.38" "19.61" "14.85" "15.19" "32.98"
> 100 * (sum(wstar * mu) - mean(mu))
[1] 0.04150506

This portfolio invests more in the JPM stock and less in the GE (which has the lowest average return) compared to the portfolio with the upper 22.5% percentage CVaR constraint. We refer to Boudt, P. Carl, and B. G. Peterson (2010a) for a more elaborate study on using CVaR allocations as an objective function or constraint in the portfolio optimization problem.

A classic risk/return (i.e., CVaR/mean) scatter chart showing the results for portfolios tested by DEoptim is displayed in Figure 3. Gray elements depict the results for all tested portfolios (using hexagon binning which is a form of bivariate histogram, see Carr, N. Lewin-Koh and M. Maechler (2011); higher density regions are darker). The yellow-red line shows the path of the best member of the population over time, with the darkest solution at the end being the optimal portfolio. We can notice how DEoptim does not spend much time computing solutions in the scatter space that are suboptimal, but concentrates the bulk of the calculation time in the vicinity of the final best portfolio. Note that we are not looking for the tangency portfolio; simple mean/CVaR optimization can be achieved with standard optimizers. We are looking here for the best balance between return and risk concentration.

We have utilized the mean return/CVaR concentration examples here as more realistic, but still stylized, examples of non-convex objectives and constraints in portfolio optimization; other non-convex objectives, such as drawdown minimization, are also common in real portfolios, and are likewise suitable to application of Differential Evolution. One of the key issues in practice with real portfolios is that a portfolio manager rarely has only a single objective or only a few simple objectives combined. For many combinations of objectives, there is no unique global optimum, and the constraints and objectives formed lead to a non-convex search space. It may take several hours on very fast machines to get the best answers, and the best answers may not be a true global optimum, they are just as close as feasible given potentially competing and contradictory objectives.

When the constraints and objectives are relatively simple, and may be reduced to quadratic, linear, or conical forms, a simpler optimization solver will produce answers more quickly. When the objectives are more layered, complex, and potentially contradictory, as those in real portfolios tend to be, DEoptim or other global optimization algorithms such as those integrated into PortfolioAnalytics provide a portfolio manager with a feasible option for optimizing their portfolio under real-world non-convex constraints and objectives.

The PortfolioAnalytics framework allows any arbitrary R function to be part of the objective set, and allows the user to set the relative weighting that they want on any specific objective, and use the appropriately tuned optimization solver algorithm to locate portfolios that most closely match those objectives.

4 Summary

In this note we have introduced DE and DEoptim. The package DEoptim provides a means of applying the DE algorithm in the R language and environment for statistical computing. DE and the package DEoptim have proven themselves to be powerful tools for the solution of global optimization problems in a wide variety of fields. We have referred interested users to (Price, R. M. Storn, and J. A. Lampinen 2006) and Mullen, D. Ardia, D. L. Gil, D. Windover, and J. Cline (2011) for a more extensive introduction, and further pointers to the literature on DE. The utility of using DEoptim was further demonstrated with a simple example of a stylized non-convex portfolio risk contribution allocation, with users referred to PortfolioAnalytics for portfolio optimization using DE with real portfolios under non-convex constraints and objectives.

The latest version of the package includes the adaptive Differential Evolution framework by Zhang and A. C. Sanderson (2009). Future work will also be directed at parallelization of the implementation. The DEoptim project is hosted on R-forge at https://r-forge.r-project.org/projects/deoptim/.

5 Acknowledgements

The authors would like to thank Dirk Eddelbuettel, Juan Ospina, and Enrico Schumann for useful comments and Rainer Storn for his advocacy of DE and making his code publicly available. The authors are also grateful to two anonymous reviewers for helpful comments that have led to improvements of this note. Kris Boudt gratefully acknowledges financial support from the National Bank of Belgium.

6 Disclaimer

The views expressed in this note are the sole responsibility of the authors and do not necessarily reflect those of aeris CAPITAL AG, Guidance Capital Management, Cheiron Trading, or any of their affiliates. Any remaining errors or shortcomings are the authors’ responsibility.

Differential Evolution with DEoptim

1 Introduction

2 Differential Evolution

2.1 The package DEoptim

3 Risk allocation portfolios

4 Summary

5 Acknowledgements

6 Disclaimer

6.1 CRAN packages used

6.2 CRAN Task Views implied by cited packages

References

Reuse

Citation