LP Algorithms for Portfolio Optimization: The PortfolioOptim Package

The paper describes two algorithms for financial portfolio optimization with the following risk measures: CVaR, MAD, LSAD and deviation CVaR. These algorithms can be applied to discrete distributions of asset returns, since then the optimization problems can be reduced to linear programs. The first algorithm solves a simple recourse problem, as described by Klein Haneveld, using the Benders decomposition method. The second algorithm finds an optimal portfolio with the smallest distance to a given benchmark portfolio and is an adaptation of the least norm solution (also called the normal solution) of linear programs due to Zhao and Li. The algorithms are implemented in R in the package PortfolioOptim.


Introduction
The construction of an optimal portfolio of financial instruments is one of the primary goals in asset management. Recent advances in risk measurement advocate using risk measures that take into account the tail behavior of asset return distributions, such as conditional value-at-risk or lower semi-absolute deviation. For these risk measures, finding optimal portfolios in a risk-return setting leads to convex programming problems. For a finite discrete distribution of returns and many practically used risk measures the optimization problem can be solved by LP methods. Portfolio optimization by LP methods leads to two problems: for distributions with a large number of values (for example, a large sample from a continuous distribution) one obtains a unique solution but the LP problem is resource demanding; for distributions with a small number of values one often obtains a non-unique solution, which is of limited use for asset management. In the latter case, a unique optimal portfolio is obtained by projecting a benchmark portfolio on the whole set of solutions. In the paper we present an algorithm that efficiently solves the problem with large distributions and an algorithm that finds an orthogonal projection of the benchmark portfolio on the space of solutions.
The R language and environment for statistical computing offer a large variety of tools for portfolio optimization. General purpose optimization tools are reviewed by Theussl and Borchers (2016) (R packages for solving optimization problems) and Koenker and Mizera (2014) (R packages for convex optimization). The book by Pfaff (2016) provides an overview of specific functions for portfolio optimization which are embedded in financial packages. Here we mention a selection of R packages dedicated primarily to portfolio optimization. The package fPortfolio by Würtz et al. (2009) offers a large set of functions for financial data analysis and enables portfolio optimization in mean-variance, mean-MAD and mean-CVaR settings. For these portfolio problems the package employs existing optimization tools: LP, QP and NLP solvers. The package PortfolioAnalytics by Peterson and Carl (2015) uses standard linear and quadratic optimization tools (Rglpk and quadprog) and a number of new packages: stochastic optimization (DEoptim, GenSA) and particle swarm optimization (psoptim). The package parma by Ghalanos (2016) offers scenario and moment based optimization of portfolios for a large class of risk and deviation measures using Rglpk, quadprog and, for non-linear optimization, nloptr. The above mentioned packages are very effective in solving medium-size portfolio problems; however, due to their use of standard optimization tools they cannot deal with large problems either in the number of assets or the size of the distribution. Moreover, none of these packages is able to select an optimal portfolio which is closest to a given benchmark portfolio.
The new package PortfolioOptim overcomes the aforementioned limitations, solving portfolio problems in a mean-risk setting with linear portfolio constraints and risk measures that make the problem reducible to a linear program. Attempts to apply LP solvers to more general portfolio problems (cf. Mansini et al. (2014) and references therein) are not included in the package. Our first contribution is an efficient search algorithm for optimal portfolios in stochastic programs with a very large number of scenarios. A large number of scenarios in portfolio optimization often appears when a continuous distribution of returns is approximated by a discrete one. The goal is to obtain an optimal portfolio which approximates the optimal portfolio for the continuous distribution. This can be achieved by performing optimization on a large discrete sample generated from the continuous distribution under consideration. A large sample leads to a high dimensional LP problem which can be difficult to solve by general-purpose LP solvers. This problem has been addressed by Künzi-Bay and Mayer (2006), who solved the portfolio optimization problem in the mean-CVaR setting using Benders decomposition. They accurately computed CVaR of the optimal portfolio using samples of order 10^4. Our extension of this result is twofold. First, we design an algorithm that implements Benders decomposition for a general class of simple recourse problems. Portfolio optimization problems for the risk measures mentioned above are special cases of these simple recourse problems. Second, using an interior point LP solver from the GLPK library and appropriately modifying the stopping criterion, we substantially increase the size of discrete samples which can be used in computations. Our algorithm can perform computations for samples of order 10^6 on a standard computer in a few seconds. (The R Journal Vol. 10/1, July 2018, ISSN 2073-4859)
Our second contribution relates to portfolio optimization problems when only a small number of random scenarios is available. Small discrete samples in portfolio optimization appear when the distribution of returns is an empirical distribution of historical returns. Usually one takes 5 or 10 years of weekly or monthly returns, giving between 120 and 500 samples. Solutions to linear programs of such dimensions are in many cases non-unique and can occupy a whole face of a multidimensional simplex. In such cases, standard software finds only one solution, usually a vertex of the solution simplex. This is often not the optimal portfolio which asset managers consider the most appropriate. Asset managers usually have certain beliefs about the composition of optimal portfolios, and those can be expressed as similarity or vicinity of the desired optimal portfolio to some benchmark portfolio. Therefore, one wants to find an optimal portfolio which has the smallest distance to a given benchmark, or, equivalently, the orthogonal projection of the benchmark on the space of optimal portfolios.
In the LP literature a special case of the above projection problem has been discussed for a long time. This is the problem of the least norm LP solution, also called the normal solution (cf. Zhao and Li (2002) and references cited therein). Unfortunately, the algorithms which find normal solutions cannot be easily adapted to our problem. The reason is that portfolio weights make up only a fraction of all coordinates of the LP solution (see the examples in Sections LP computable portfolio problems and Benders decomposition). Contrary to the normal solution, which is the projection of the origin onto the simplex of optimal solutions, we are looking for a projection onto a subspace of portfolio weights. In addition, we are not looking for a solution with the least norm but for a solution with the smallest distance to an arbitrary vector (the benchmark portfolio). It appears, however, that the regularized central path algorithm due to Zhao and Li (2002), which simultaneously solves the primal and dual LP problems finding the least norm solutions to both, can be modified to our purposes. Our contribution is the extension of this algorithm to the following problem. Given the set S* of optimal solutions to an LP problem, find x* ∈ S* such that ‖B(x* − x̄)‖ ≤ ‖B(x − x̄)‖ for all x ∈ S*, where x̄ is a given vector and B is the operator of projection on a subspace. This general algorithm is then adapted to obtain an optimal portfolio with the smallest distance to a benchmark (see Section Projection algorithm for portfolio optimization for details).
The rest of the paper is organized as follows. In Section LP computable portfolio problems, we describe the portfolio optimization problems that can be reduced to LP problems and are considered as working examples in the rest of the paper. These portfolio optimization problems are also implemented in the package PortfolioOptim. Section Benders decomposition analyzes the Benders decomposition algorithm applied to simple recourse problems. As an illustration, we present applications to portfolio optimization in mean-CVaR and mean-LSAD settings. Section Projection algorithm for portfolio optimization describes the adaptation of the path-following algorithm of Zhao and Li (2002) to the construction of an optimal portfolio with the smallest distance to a benchmark portfolio. We also present the proof of convergence for the modified algorithm. Computational examples and the analysis of performance of both algorithms are presented in Section Numerical examples. The paper ends with a short Summary.

LP computable portfolio problems
We consider the portfolio optimization problem in a mean-risk setting. The risk is measured by a risk (or deviation) measure R (for the definitions of risk and deviation measures the reader is advised to consult the paper by Rockafellar et al. (2006)). Let R be the distribution of returns of J risky assets. We denote by R̄ the centered returns, i.e. R̄ = R − E[R]. Consider the problem of finding an optimal portfolio u which fulfills the conditions

min_{u} R(u^T R)  subject to  u^T E[R] ≥ r_0,  u ∈ X,   (1)

where X is a polyhedral set (i.e. given by linear constraints) and r_0 is the required return of the portfolio. The set X is usually the whole space (when no limitations on trading positions are imposed) or the positive orthant (when short-selling of assets is prohibited).
We will assume that at the optimum the constraint u^T E[R] ≥ r_0 is binding. This is the case when r_0 is not smaller than the return of the minimal risk portfolio, i.e. the portfolio which solves the problem min_{u ∈ X} R(u^T R).
It appears that for a number of risk measures the optimization problem (1) formulated for a discrete distribution R can be reduced to a linear program. Let the asset returns R be given by a discrete distribution placing weights p_n at points r_n, n = 1, …, N. The points of the centered distribution R̄ will be denoted by r̄_n.
Consider the risk measured by conditional value at risk (CVaR). For a random outcome X we define VaR_α(X) as

VaR_α(X) = − inf { z : P(X ≤ z) > α }.
Then the conditional value at risk is defined as

CVaR_α(X) = (1/α) ∫_0^α VaR_β(X) dβ.

Rockafellar and Uryasev (2000) observed that for X = u^T R the minimization of CVaR can be formulated as the following nonlinear problem

min_{u, ξ}  ξ + (1/α) E[ max(0, −u^T R − ξ) ]  subject to  u^T E[R] ≥ r_0,  u ∈ X,   (2)

where the latent variable ξ corresponds to VaR_α(u^T R). For discrete distributions and the risk measured by CVaR, the portfolio optimization problem can therefore be reformulated as the linear program

min_{u, ξ, y}  ξ + (1/α) Σ_{n=1}^N p_n y_n
subject to  y_n ≥ −u^T r_n − ξ,  y_n ≥ 0,  n = 1, …, N,
            u^T μ ≥ r_0,  u ∈ X.   (3)

Here μ = E[R], while y_n plays the role of the shortfall of the return u^T r_n below the level −ξ: y_n = max(0, −u^T r_n − ξ). For deviation CVaR we replace r_n by r̄_n in (3).
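To make the Rockafellar-Uryasev representation concrete, the following sketch computes the empirical CVaR of a return sample in two ways: directly, as the average of the worst α-fraction of outcomes, and by minimizing the objective of problem (2) over ξ. When αN is an integer the two values coincide. (The package itself is written in R; this is a language-agnostic Python illustration, and the function names are ours.)

```python
def cvar_direct(returns, alpha):
    # average of the worst alpha-fraction of outcomes (losses = -returns)
    losses = sorted((-r for r in returns), reverse=True)
    k = int(len(returns) * alpha)
    return sum(losses[:k]) / k

def ru_objective(returns, alpha, xi):
    # objective of the Rockafellar-Uryasev representation at a given xi
    n = len(returns)
    return xi + sum(max(-r - xi, 0.0) for r in returns) / (alpha * n)

def cvar_ru(returns, alpha):
    # the minimum over xi is attained at one of the loss values -r (an LP vertex)
    return min(ru_objective(returns, alpha, -r) for r in returns)
```

The minimizing ξ is the empirical VaR_α, which is why the LP (3) recovers both VaR and CVaR of the optimal portfolio at once.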
Consider now the mean-MAD and mean-LSAD optimization problems. MAD (mean absolute deviation) is defined as (cf. Konno and Yamazaki (1991))

MAD(u^T R) = E[ |u^T R − E[u^T R]| ].

For discrete distributions the mean-MAD optimization problem reads

min_{u}  MAD(u^T R)  subject to  u^T μ ≥ r_0,  u ∈ X.   (5)

This problem can be formulated as the linear program

min_{u, y}  Σ_{n=1}^N p_n y_n
subject to  y_n ≥ u^T r̄_n,  y_n ≥ −u^T r̄_n,  n = 1, …, N,
            u^T μ ≥ r_0,  u ∈ X.   (6)

For LSAD (lower semi-absolute deviation) as defined by Konno (1990) or Konno et al. (2002),

LSAD(u^T R) = E[ max(0, E[u^T R] − u^T R) ],

the mean-LSAD optimization has an analogous form. This leads to the following linear program

min_{u, y}  Σ_{n=1}^N p_n y_n
subject to  y_n ≥ −u^T r̄_n,  y_n ≥ 0,  n = 1, …, N,
            u^T μ ≥ r_0,  u ∈ X.   (9)

Since LSAD(u^T R) = (1/2) MAD(u^T R), in what follows we shall consider only one of the above optimization problems.

Benders decomposition

Algorithm
In this section, we present a solution to a simple linear recourse problem with a random technology matrix when the distribution of the stochastic variable is represented by a large number of scenarios. The simple recourse problem is a special case of the two-stage recourse problem

min_{x ∈ X}  c^T x + E[ v(x, ω) ],

with the following form of the recourse subproblem

v(x, ω) = min { (q^+)^T y^+ + (q^-)^T y^- : y^+ − y^- = b(ω) − A(ω)x,  y^+, y^- ≥ 0 }.

In this framework, q^+ and q^- are known penalty costs, the matrix A = A(ω) and the vector b = b(ω) are random, and X is a bounded, convex subset of R^l.
We see that each pair of recourse variables (y_i^+, y_i^-), i = 1, …, m, depends only on the i-th row of the condition b − Ax, so their optimal values can be determined independently. Thus, the second-stage value function v(x) separates into the sum of m functions

v_i(x, ω) = min { q_i^+ y_i^+ + q_i^- y_i^- : y_i^+ − y_i^- = b_i − A_i x,  y_i^+, y_i^- ≥ 0 },   (12)

where A_i is the i-th row of the matrix A and b_i is the i-th element of the vector b.
For each optimization problem (12) the dual problem has the form

sup { λ_i (b_i − A_i x) : −q_i^- ≤ λ_i ≤ q_i^+ }.

The dual problem is feasible only if q_i^+ + q_i^- ≥ 0. When this condition holds, the solution to the dual problem is λ_i = q_i^+ for b_i − A_i x ≥ 0 and λ_i = −q_i^- otherwise, and the optimal value of problem (12) has the form

v_i(x) = q_i^+ (b_i − A_i x)^+ + q_i^- (b_i − A_i x)^-,

where s^+ = max(s, 0) and s^- = max(−s, 0). Since s^+ − s^- = s, with s = b − Ax the value function Q(x) = E[v(x, ω)] can be written as

Q(x) = E[ (q^+ + q^-)^T (b − Ax)^+ ] − (q^-)^T ( b̄ − Ā x ),

where Ā_i = E[A_i] and b̄_i = E[b_i]. For a discrete probability space with P(ω = ω_n) = p_n, n = 1, …, N, the simple recourse problem can be reformulated in the following way

min_{x, y}  c^T x + Σ_{n=1}^N p_n (q^+ + q^-)^T y_n + (q^-)^T Ā x − (q^-)^T b̄
subject to  y_n ≥ b(ω_n) − A(ω_n) x,  y_n ≥ 0,  n = 1, …, N,  x ∈ X.   (15)

This linear program with a large size N of the probability space is difficult to solve due to the number of constraints. Klein Haneveld and Van der Vlerk (2006) solved it using a special version of the L-shaped method. Their approach was further extended by Künzi-Bay and Mayer (2006), who applied Benders decomposition directly (cf. Benders (1962)). In terms of Benders cuts, (15) is equivalent to (we skip (q^-)^T b̄, which is constant)

min_{x, w}  c^T x + (q^-)^T Ā x + w
subject to  w ≥ Σ_{n∈K} p_n (q^+ + q^-)^T ( b(ω_n) − A(ω_n) x )  for all K ⊆ M,  x ∈ X,   (16)

where M = {1, …, N}.
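The closed-form expression for the recourse value can be checked numerically. The sketch below (Python for illustration; names are ours) compares q_i^+ (b_i − A_i x)^+ + q_i^- (b_i − A_i x)^- with a brute-force minimization of the recourse subproblem (12) over a grid of feasible y^+.

```python
def recourse_closed_form(qp, qm, d):
    # d = b_i - A_i x; value is q+ * d^+ + q- * d^-
    return qp * max(d, 0.0) + qm * max(-d, 0.0)

def recourse_grid(qp, qm, d, steps=4000, span=5.0):
    # minimize qp*yp + qm*ym subject to yp - ym = d, yp >= 0, ym >= 0;
    # feasibility forces yp >= max(d, 0), so we scan yp over that half-line
    best = float("inf")
    for i in range(steps + 1):
        yp = max(d, 0.0) + span * i / steps
        ym = yp - d
        best = min(best, qp * yp + qm * ym)
    return best
```

Because the objective is increasing in y^+ whenever q^+ + q^- ≥ 0 (the dual feasibility condition above), the minimum sits at y^+ = max(d, 0), which is exactly what the closed form encodes.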
This problem is even harder than the original linear program (15) as there are 2^N constraints, but Benders (1962) showed that the above problem can be solved through a sequence of expanding relaxed problems

min_{x, w}  c^T x + (q^-)^T Ā x + w
subject to  w ≥ Σ_{n∈K_k} p_n (q^+ + q^-)^T ( b(ω_n) − A(ω_n) x ),  k = 1, …, ν,  x ∈ X,   (17)

where ν is the number of steps and K_k ⊆ M are the constraint sets added in step k, with K_k ≠ K_l for k ≠ l. Hence, each successive relaxed problem adds more constraints to those already present in the previous steps. Collecting the coefficients of the cuts, (17) is written as a linear program in the variables (x, w), referred to as (18). The complete algorithm is as follows:

Step 1. Initialization: set ν = 1 and start from an initial (possibly empty) collection of cuts.

Step 2. Solve problem (18) and denote the solution by (x*, w*). Put F equal to the optimal value of (18) (a lower bound for (16)) and F̄ equal to the value of the objective of (16) at x* (an upper bound).

Step 3. If (F̄ − F) ≤ ε then stop; x* is an optimal solution. Otherwise, form the new cut generated by the set K_{ν+1} of scenarios for which the recourse term b(ω_n) − A(ω_n) x* is positive, and increase ν by one.
Add this new constraint to the set of constraints and go to Step 2.
The algorithm, written as an R script, is shown in Figure 1.
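Since Figure 1 gives the full R script, we sketch here only the core of a single Benders iteration for the mean-CVaR case (Python for illustration; names are ours, and we assume a master objective ξ + w/α with w approximating Σ p_n y_n). Given a candidate (u, ξ, w) from the relaxed master problem, it determines the violated scenario set K, i.e. the new cut, and the gap F̄ − F used in the stopping criterion of Step 3.

```python
def benders_step(u, xi, w, returns, probs, alpha):
    # portfolio return in one scenario
    def port(r):
        return sum(ui * ri for ui, ri in zip(u, r))
    # scenarios whose shortfall y_n = -u.r_n - xi would be positive: the new cut set
    K = [n for n, r in enumerate(returns) if -port(r) - xi > 0.0]
    shortfall = sum(probs[n] * (-port(returns[n]) - xi) for n in K)
    lower = xi + w / alpha           # relaxed master objective (lower bound F)
    upper = xi + shortfall / alpha   # true objective at (u, xi) (upper bound F-bar)
    return K, shortfall, upper - lower
```

When the returned gap drops below ε the loop stops; otherwise the aggregated cut w ≥ Σ_{n∈K} p_n(−u^T r_n − ξ) is appended to the master constraints and the master LP is re-solved.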

Conditional value at risk (CVaR)
We begin with an example of mean-CVaR optimization for a discrete distribution of returns. Our goal is to find the portfolio u which solves

min_{u}  CVaR_α(u^T R)  subject to  u^T μ ≥ r_0,  u ∈ X.

As shown in Section LP computable portfolio problems, this problem reduces to the linear program (3). Benders decomposition for the above problem leads to a sequence of expanding relaxed problems. Using the notation from Section Benders decomposition, we identify x = (u, ξ), q^- = 0, (q^+)^T A(ω_n) = (r_n, 1), b(ω_n) = 0 and c = (0_J, 1), where 0_J is the zero vector of length J. The algorithm can be readily applied.

Mean absolute deviation (MAD) and lower semi-absolute deviation (LSAD)
Due to the relation LSAD(u^T R) = (1/2) MAD(u^T R), it is sufficient to consider only the mean-LSAD portfolio optimization:

min_{u}  LSAD(u^T R)  subject to  u^T μ ≥ r_0,  u ∈ X.

As before, for a discrete set of returns this problem can be formulated as a linear program (see equation (9)) and Benders decomposition can be applied, leading to a sequence of expanding relaxed problems.
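The identity LSAD = ½ MAD is easy to verify on any finite sample, since the deviations above and below the mean have equal total mass. A quick numerical check (Python, illustrative names):

```python
def mad(xs):
    # mean absolute deviation: E|X - EX|
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def lsad(xs):
    # lower semi-absolute deviation: E[max(0, EX - X)]
    m = sum(xs) / len(xs)
    return sum(max(m - x, 0.0) for x in xs) / len(xs)
```

Because Σ(x − m) = 0 over the sample, the positive and negative deviations sum to the same amount, so the lower part is exactly half of the absolute deviation.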
Introducing a new variable, we reduce the problem to a linear program. As for CVaR, we recognize in this formulation the Benders problem from Section Benders decomposition.

Projection algorithm for portfolio optimization

Algorithm
As we have discussed earlier, some mean-risk optimization problems can be transformed into linear programs. Those linear programs can be written in the standard form

min_{x}  c^T x  subject to  Ax ≥ b,  x ≥ 0.   (24)

In practical computations, it appears that solutions to the above linear program are sometimes non-unique and form a (multidimensional) simplex. A typical linear solver finds only one of the solutions, usually one of the vertices of the solution simplex. In this section our aim is to find the solution to the above linear program which is closest to a given vector. It often lies in the interior of a face of the solution simplex.
Let S* denote the set of optimal solutions to (24), B be a given invertible matrix in R^n and x̄ a vector from R^n. We want to find x* ∈ S* such that

‖B(x* − x̄)‖ ≤ ‖B(x − x̄)‖  for all x ∈ S*.   (25)

The least norm solution to a linear program mentioned in the Introduction is a special case of problem (24-25). It corresponds to x̄ = 0 and B being an identity matrix. Finding an optimal portfolio closest to a given benchmark in the framework of LP problems can also be reduced, after some modification, to the solution of problem (24-25). To describe this modification, observe that for LP computable portfolio problems the independent variable x in (24) contains more coordinates than only the portfolio weights. Let x = (x', u), where u corresponds to the portfolio weights and x' contains all other coordinates (for CVaR optimization (3) x = (ξ, y, u), hence x' = (ξ, y); for MAD optimization (6) and LSAD optimization (9) x' = y). Given the benchmark portfolio û, the objective is to find an optimal portfolio which minimizes the distance ‖u − û‖. To achieve that goal by solving problem (24-25) we have to extend û to a vector x̄ = (x̄', û) on the whole space R^n, taking an arbitrary x̄' and projecting the difference (x − x̄) on the subspace spanned by the portfolio coordinates. Let B* be this projection operator; then the aim is to minimize ‖B*(x − x̄)‖. The matrix B* is diagonal, with entries 1 on the diagonal positions corresponding to portfolio weights and 0 otherwise. Since B* is not invertible, the considered portfolio problem cannot be reduced directly to the solution of problem (24-25). Hence we introduce a matrix B obtained from B* by replacing the zero diagonal entries with some small positive number ε. Then B is invertible and an optimal portfolio close to a benchmark can be found by solving problem (24-25). One can further improve the accuracy of the computation by redoing the optimization with x̄ = (x'*, û), where x'* is the corresponding part of the solution obtained in the first optimization.
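The construction of B from B* amounts to a few lines of code (Python sketch; diagonal matrices are stored as vectors, names are ours). With ε > 0 on the non-portfolio coordinates, B is invertible, and ‖B(x − x̄)‖² differs from ‖B*(x − x̄)‖² only by an O(ε²) term.

```python
def make_B_diag(n, portfolio_coords, eps=1e-4):
    # diagonal of B*: 1 on portfolio coordinates, 0 elsewhere;
    # B replaces the zeros by a small eps > 0, which makes B invertible
    return [1.0 if i in portfolio_coords else eps for i in range(n)]

def weighted_dist_sq(diagB, x, xbar):
    # ||B (x - xbar)||^2 for a diagonal B
    return sum((d * (a - b)) ** 2 for d, a, b in zip(diagB, x, xbar))
```

For a 4-dimensional x with portfolio weights on coordinates 2 and 3, the distance under B reproduces the benchmark distance on the weight coordinates plus an ε²-sized perturbation from the remaining ones.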
To solve problem (24-25) we follow the approach of Zhao and Li (2002) based on the path-following algorithm. We begin with the reformulation of (24) as the logarithmic barrier problem

min_{x, z}  c^T x − ρ Σ_i ln x_i − ρ Σ_i ln z_i  subject to  Ax − z = b,   (26)

where z is introduced to replace the inequality Ax ≥ b by an equality and ρ > 0 is a regularizing parameter.
The Lagrangian for this problem reads

L(x, z, y) = c^T x − ρ Σ_i ln x_i − ρ Σ_i ln z_i − y^T (Ax − z − b),

and the Kuhn-Tucker solvability conditions are

Ax − z = b,  A^T y + s = c,  diag(x) s = ρe,  diag(z) y = ρe,

where diag(u) denotes the diagonal matrix with the vector u on the diagonal; the new variable s = ρ diag(x)^{-1} e is introduced to obtain the canonical representation of the central path approach; and e = (1, …, 1) (of the appropriate dimension).
To find the solution to (26) which fulfills the condition ‖B(x − x̄)‖² → min for a given vector x̄ and matrix B, we modify the Lagrangian by adding the penalty term (θ/2)‖B(x − x̄)‖². The stationary point of this Lagrangian gives the primal solution for which ‖B(x − x̄)‖² achieves its minimal value and the dual solution which has the minimal norm. The Kuhn-Tucker conditions give

Ax − z = b,  A^T y + s = c + θ B^T B (x − x̄),  diag(x) s = ρe,  diag(z) y = ρe.

To simplify the problem we assume that θ = κρ^p for given constants κ ∈ (0, 1] and p ∈ (0, 1). We define the nonlinear mapping

F_ρ(x, y, s, z) = ( Ax − z − b,  A^T y + s − c − κρ^p B^T B (x − x̄),  diag(x) s − ρe,  diag(z) y − ρe ).

The problem is now to find (x*, y*, s*, z*) ≥ 0 which solves the equation

F_0(x*, y*, s*, z*) = 0.   (30)

The solution to this equation is sought by the Newton iterative method

(x_{k+1}, y_{k+1}, s_{k+1}, z_{k+1}) = (x_k, y_k, s_k, z_k) − ∇F_ρ(x_k, y_k, s_k, z_k)^{-1} F_ρ(x_k, y_k, s_k, z_k),   (31)

where ∇F_ρ(x, y, s, z) denotes the Jacobian of the mapping F_ρ. These iterative solutions form the path-following algorithm in a neighborhood of the regularized central path

N_β(ρ) = { (x, y, s, z) : ‖F_ρ(x, y, s, z)‖_∞ ≤ βρ },

with β ∈ (0, 1). The complete algorithm is as follows:

Step 1. Initialization: construct the starting point (x_0, y_0, s_0, z_0) and ρ_0 as described below, and set k = 0.

Step 2. Newton step: compute (x_{k+1}, y_{k+1}, s_{k+1}, z_{k+1}) from (31).
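The residual map F_ρ can be coded directly from its four Kuhn-Tucker blocks. The sketch below (Python for illustration; restricted to diagonal A and B so that plain lists suffice, and the names are ours) evaluates F_ρ; the test constructs a point satisfying the regularized system exactly, so the residual vanishes.

```python
def F_rho(x, y, s, z, A, b, c, Bdiag, xbar, theta, rho):
    n = len(x)
    r1 = [A[i] * x[i] - z[i] - b[i] for i in range(n)]                      # primal feasibility
    r2 = [A[i] * y[i] + s[i] - c[i]                                         # dual feasibility with
          - theta * (Bdiag[i] ** 2) * (x[i] - xbar[i]) for i in range(n)]   # the regularizing term
    r3 = [x[i] * s[i] - rho for i in range(n)]                              # centrality in (x, s)
    r4 = [z[i] * y[i] - rho for i in range(n)]                              # centrality in (z, y)
    return r1 + r2 + r3 + r4
```

A point inside the neighborhood N_β(ρ) is exactly one where the ∞-norm of this residual vector is at most βρ, which is the membership test used in Steps 2 and 3.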
Step 3. Reduction of ρ. Find the maximal improvement step for ρ_k by the Armijo rule: find the smallest j such that, with γ_j = b_1 b_2^j, the iterate (x_{k+1}, y_{k+1}, s_{k+1}, z_{k+1}) remains in the neighborhood N_β((1 − γ_j)ρ_k). Set ρ_{k+1} = (1 − γ_j)ρ_k and go to Step 2.
We now prove that, starting from the initial iterate defined in Step 1, we obtain a convergent sequence of iterates. The estimate of the norm of F creates a difficulty because of the term ρ_k^p, which for ρ_k < 1 cannot be estimated by the first power of ρ_k. That requires a choice of the starting point which cancels the terms with ρ^p. This goal is achieved by a proper choice of the constant κ. The following lemma is an adaptation of Lemma 4.1 from Zhao and Li (2002).
Lemma 2.4.1. The central path projection algorithm is well-defined. The sequence ρ_k is monotonically decreasing and (x_k, y_k, s_k, z_k) ∈ N_β(ρ_k) for all k ≥ 0.
Proof. The proof goes along the lines of the original proof of Zhao and Li (2002). We only have to remark that the nonsingularity of the matrix ∇F_ρ follows from the fact that the matrix B^T B is positive definite. For the correctness of Step 3 of the algorithm, we have to show that the point (x_{k+1}, y_{k+1}, s_{k+1}, z_{k+1}) belongs to N_β(ρ_{k+1}). To this end, we have to prove the corresponding norm estimate; the rest of the proof is similar to that in Zhao and Li (2002). The boundedness of (x_k, y_k) follows from Lemma 2.4.2 below.
The proof of the convergence of the iterative sequence requires some modification of the proof by Zhao and Li (2002). The modification is formulated in the following lemma.
Lemma 2.4.2. When the solution set to equation (30) is nonempty, the sequence (x_k, y_k, s_k, z_k) obtained by the central path projection algorithm is bounded.
Proof. We follow the line of the proof of Theorem 4.1 in Zhao and Li (2002). Let (u_k, v_k, w_k, q_k) denote the scaled residuals of the iterate (x_k, y_k, s_k, z_k). Then ‖(u_k, v_k, w_k, q_k)‖_∞ ≤ β by the definition of N_β(ρ_k), and the corresponding system of equations holds. Let (x*, y*, s*, z*) be an optimal solution, i.e. a solution to equation (30). Then (x*, y*, s*, z*) ≥ 0 and (s*, z*) is determined by the dual feasibility conditions. Due to (28), (x*)^T s* = 0 and (y*)^T z* = 0.
In what follows, we use the positive semidefiniteness of B^T B. From equation (36) we obtain the required bound, with an estimate holding for some positive constant c.
Finally, since the matrix B is invertible and taking into account the above estimates together with ρ_k ≤ ρ_0, we obtain the boundedness of (x_k, y_k). The boundedness of (s_k, z_k) follows from the above estimate and equation (36).
The choice of (x_0, y_0, s_0, z_0) starts with y_0 = e and z_0 = κρ_0^p e. We choose x_0 and s_0 so that the term κρ_0^p B^T B(x_0 − x̄) is canceled. Taking x_0 = max(e, e + x̄) makes x_0 > 0. On the other hand, κρ_0^p B^T B(x_0 − x̄) is a vector with nonzero components bounded from above by κρ_0^p (1 + ‖x̄‖_∞). Hence we define s_{0,init} = B^T B(x_0 − x̄) and compute κ = 1/‖diag(x_0) s_{0,init}‖_∞. Then we take s_0 = κρ_0^p s_{0,init}. This procedure eliminates from ‖F_{ρ_0}‖_∞ the terms with ρ_0^p and replaces them by constants. We can now describe the construction of the initial point step by step:

1. Set y_0 = e and z_0 = κρ_0^p e (with κ computed in step 3).
2. Set x_0 = max(e, e + x̄).
3. Compute s_{0,init} = B^T B(x_0 − x̄) and κ = 1/‖diag(x_0) s_{0,init}‖_∞.
4. Set s_0 = κρ_0^p s_{0,init}.
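The steps above can be sketched as follows (Python for illustration; the diagonal B is stored as a vector and the names are ours). The normalization by κ guarantees ‖diag(x_0) s_0‖_∞ = ρ_0^p, which is exactly the cancellation of the ρ^p terms described above.

```python
def initial_point(Bdiag, xbar, rho0, p):
    n = len(xbar)
    y0 = [1.0] * n                                    # step 1: y0 = e
    x0 = [max(1.0, 1.0 + xb) for xb in xbar]          # step 2: x0 = max(e, e + xbar) > 0
    s_init = [(d ** 2) * (xi - xb)                    # step 3: s_init = B^T B (x0 - xbar)
              for d, xi, xb in zip(Bdiag, x0, xbar)]
    kappa = 1.0 / max(abs(xi * si) for xi, si in zip(x0, s_init))
    s0 = [kappa * (rho0 ** p) * si for si in s_init]  # step 4: s0 = kappa * rho0^p * s_init
    z0 = [kappa * (rho0 ** p)] * n                    # and z0 = kappa * rho0^p * e
    return x0, y0, s0, z0, kappa
```

Note that x_0 − x̄ ≥ e componentwise, so s_{0,init} has strictly positive entries whenever B has a nonzero diagonal, and κ is well defined.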
The estimates of optimal portfolio weights for different sample sizes are collected in Table 2. The 95% confidence intervals of portfolio weights are reported in brackets. If the difference between portfolio weights is larger than two standard deviations, we conclude that the difference is statistically significant (this corresponds to a 95% two-sided Student-t test whose critical value for a sample of size 10 is 2.228). The results of Table 2 show that samples of size 10 000 are much too small to produce reliable portfolio weights. The values for that sample size are statistically significantly different from the values for 1 000 000 samples. From the values of the confidence intervals of portfolio weights we can conclude that the computation of optimal portfolio weights from samples of size 10 000 can give values with an error of up to 50% (for all non-zero weights and the sample of size 10 000, the confidence intervals are on the order of one half of the corresponding weights). This is a clear indication that to obtain reliable values of portfolio weights we have to use samples of size 1 000 000 or take the average of a large number of repetitions, which is computationally equivalent.
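The significance rule used above can be written down explicitly (Python sketch; the critical value 2.228 is the one quoted in the text, and the helper names are ours): two weight estimates differ significantly when their distance exceeds the critical value times the standard error of the difference.

```python
import math

def ci_halfwidth(sd, n, t_crit=2.228):
    # half-width of the 95% confidence interval of a mean over n repetitions
    return t_crit * sd / math.sqrt(n)

def significantly_different(w1, w2, se1, se2, t_crit=2.228):
    # se1, se2 are the standard errors of the two weight estimates
    return abs(w1 - w2) > t_crit * math.sqrt(se1 ** 2 + se2 ** 2)
```

For example, two weights of 30% and 20% with standard errors of 2 percentage points each are significantly different, while 30% versus 26% with the same errors is not.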
We now analyze the effect of the number of assets in the portfolio on the accuracy of computations. To this end, we use the data set etfdata from the R package parma. For comparison we take two subsets: etfdata[1:500,1:5] (with 5 assets) and etfdata[1:500,1:10] (with 10 assets). We add to our code a new function computing the vector of means and the covariance matrix for a data set with k assets and perform computations with k = 5 and k = 10.

Projection algorithm
In testing the projection algorithm described in Section Projection algorithm for portfolio optimization, the following values of the parameters are used: δ = 0.5, p = 0.8, σ = 10^{-3}, b_1 = 0.9 and b_2 = 0.9. The starting point in all computations is calculated by the procedure described in that section. The algorithm is tested on simulated data generated from a normal distribution using the vectors of means and covariance matrices estimated from the previously used data sets etfdata[1:500,1:5] and etfdata[1:500,1:10] from the package parma. Samples of 125, 250 and 500 returns are generated assuming they are weekly returns (the vector of means and the covariance matrix appropriately scaled), which corresponds to 2.5, 5 and 10 years of weekly data. For each sample we compute the vector of portfolio weights using the function PortfolioOptimProjection with tol = 1e-6, taking as the benchmark the 1/J portfolio, i.e. the portfolio with all weights equal to 1/J, where J is the number of assets in the portfolio. The computation for each sample size and asset set is repeated 10 times.
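The scaling of annual parameters to weekly ones mentioned above follows the usual 1/52 convention for i.i.d. returns; a minimal sketch (Python, univariate for brevity, names illustrative):

```python
import math
import random

def weekly_params(mu_annual, var_annual, periods=52):
    # i.i.d. weekly returns aggregate to the annual mean and variance,
    # so both parameters are divided by the number of periods
    return mu_annual / periods, var_annual / periods

def simulate_weekly_returns(mu_annual, var_annual, n, seed=0):
    mu_w, var_w = weekly_params(mu_annual, var_annual)
    rng = random.Random(seed)
    return [rng.gauss(mu_w, math.sqrt(var_w)) for _ in range(n)]
```

In the multivariate experiments of the paper the same 1/52 factor is applied to the mean vector and covariance matrix estimated from the etfdata subsets.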
Taking into account that a sample of size n corresponds to a matrix A of approximate dimension n × n, the results of Table 4 are compatible with the results reported in Table 6.1 of Zhao and Li (2002). An observation which is missed in Zhao and Li (2002) concerns the standard deviation of the running time. This standard deviation is of the same order of magnitude as the average running time. We can conclude that the computational time depends not only on the size of the matrix A but also on its entries, which is particularly surprising since the samples are drawn from the same distribution. Of course, the sample of size 125 can be considered small even for returns of 5 assets. But the sample of size 500 is already quite large. In addition, the average running time increases with the sample size; however, that increase is very irregular. At the same time, the standard deviation also behaves irregularly (similarly to the average running time). All these observations show that the computation time is very sensitive not only to the dimensionality of the problem but also to the particular realization of the data. However, we can still conclude that on average the algorithm is efficient for moderate-size portfolio optimization problems. Indeed, Zhao and Li (2002) already observed that the convergence rate of their algorithm was slow. The same is valid for our extension of their algorithm, which makes the algorithm not practically applicable for portfolio optimization problems when the size of the discrete distribution of returns is larger than 10^3. There is no clear indication which part of the algorithm has the main effect on slowing down the convergence.

Summary
We presented two optimization algorithms for financial portfolios. These algorithms find optimal portfolios in problems where the nonlinear optimization in a mean-risk setting can be reformulated as a linear programming problem. The algorithm implementing Benders cuts allows for large samples.
Our experience with very large data samples obtained by simulation from a given distribution shows that the algorithm is robust and the estimated optimal portfolios are very stable (the standard deviation of portfolio weights is below 0.5%). The second algorithm, based on the central path projection, can be useful for small data samples where solutions are not unique and form a multidimensional simplex.
Extracting from this simplex a point with the smallest distance to a given benchmark portfolio can be used to improve the decision process in asset management. However, the second algorithm, which implements the central path projection, requires further analysis. In particular, we would like to make the algorithm's performance and running time less dependent on the realization of the data.

Figure 1: Benders decomposition algorithm for a simple recourse problem.

Figure 2: Regularized central path algorithm with projection.

Table 1: Average running time (in sec.) and its standard deviation for computations with different sample sizes (averages over 10 different realizations).

Table 2: Optimal portfolios for different sample sizes. 95% confidence intervals of portfolio weights are given in parentheses (all values in percentage points).

Table 3: The average width of the 95% confidence interval of portfolio weights (values in percentage points).

Table 4: Average running time and its standard deviation for the computation of optimal portfolios by the central path projection algorithm for different numbers of assets and sample sizes (all values in sec. of CPU time).