On some extensions to GA package: hybrid optimisation, parallelisation and islands evolution

Genetic algorithms are stochastic iterative algorithms in which a population of individuals evolve by emulating the process of biological evolution and natural selection. The R package GA provides a collection of general purpose functions for optimisation using genetic algorithms. This paper describes some enhancements recently introduced in version 3 of the package. In particular, hybrid GAs have been implemented by including the option to perform local searches during the evolution. This allows to combine the power of genetic algorithms with the speed of a local optimiser. Another major improvement is the provision of facilities for parallel computing. Parallelisation has been implemented using both the master-slave approach and the islands evolution model. Several examples of usage are presented, with both real-world data examples and benchmark functions, showing that often high-quality solutions can be obtained more efficiently.


Introduction
Optimisation problems of both practical and theoretical importance deal with the search of an optimal configuration for a set of variables to achieve some specified goals. Potential solutions may be encoded with real-valued, discrete, binary or permutation decision variables depending on the problem to be solved. Optimisation methods for real-valued functions can be roughly classified into two groups: direct and gradient-based methods (Chong and Zak, 2013;Givens and Hoeting, 2013, Chap. 2). In direct search methods only the objective function is used to guide the search strategy, whereas gradient-based methods consider the first and/or secondorder derivatives of the objective function during the search process. Constraints may be present and are usually taken into account in the definition of the objective function or in the decision variables representation. Direct search methods can be applied without modifications to many optimisation tasks, but they are usually slow requiring many function evaluations for convergence. On the contrary, gradient-based methods quickly converge to an optimal solution, but are not efficient in non-differentiable or discontinuous problems. Both direct and gradient-based techniques depend on the chosen initial starting values, so they can get stuck in suboptimal solutions. Furthermore, they are not efficient in handling problems with discrete decision variables, and cannot be efficiently implemented on parallel machines. Problems where the decision variables are expressed using discrete or binary values are usually referred to as combinatorial optimisation problems, and consist in searching for the best solution from a set of discrete items (Papadimitriou and Steiglitz, 1998;Givens and Hoeting, 2013, Chap. 3). Typical examples are the knapsack problem, the minimum spanning tree, the traveling salesman problem, and the vehicle routing problem. Although in principle these type of problems can be solved with exact algorithms, the time required to solve them increases exponentially as the size of the problem grows.
A large number of heuristics and metaheuristics algorithms have been proposed for solving complex optimisation tasks . Specific (ad-hoc) heuristic techniques are able to identify solutions in a reasonably short amount of time, but the solutions obtained are generally not guaranteed to be optimal or accurate. On the contrary, metaheuristics offer a tradeoff between exact and heuristics methods, in the sense that they are generic techniques that offer good solutions, often the global optimal value sought, in a moderate execution time by efficiently and effectively exploring the search space (Luke, 2013). This class of algorithms typically implements some form of stochastic optimisation and includes: Evolutionary Algorithm (EA; Back et al., 2000a,b), Iterated Local Search (ILS; Lourenço et al., 2003), Simulated Annealing (SA; Kirkpatrick et al., 1983), Tabu Search (TS; Glover and Laguna, 2013), and Ant Colony Optimisation (ACO; Dorigo and Stützle, 2004).
EAs are stochastic iterative algorithms in which a population of individuals evolve by emulating the biological processes observed in natural evolution and genetics (Eiben and Smith, 2003;De Jong, 2006;Simon, 2013). Each individual of the population represents a tentative solution to the problem. The quality of the proposed solution is expressed by the value of a fitness function assigned to each individual. This value is then used by EAs to guide the search and improve the fitness of the population. Compared to other metaheuristics algorithms, EAs are able to balance between exploration of new areas of the search space and exploitation of good solutions. The trade-off between exploration and exploitation is controlled by some tuning parameters, such as the population size, the genetics operators (i.e. selection, crossover, and mutation), and the probability of applying them. Genetic Algorithms (GAs) are search and optimisation procedures that are motivated by the principles of natural genetics and natural selection. GAs are the "earliest, most well-known, and most widely-used EAs" (Simon, 2013, p. 35).
R offers several tools for solving optimisation problems. A comprehensive listing of available packages is contained in the CRAN task view on "Optimization and Mathematical Programming" (Theussl and Borchers, 2015). An extensive treatment of optimisation techniques applied to problems that arise in statistics and how to solve them using R is provided by Nash (2014). A gentle introduction to metaheuristics optimisation methods in R is contained in Cortez (2014). The R package GA is a flexible general-purpose set of tools for optimisation using genetic algorithms and it is fully described in Scrucca (2013). Real-valued, integer, binary and permutation GAs are implemented, whether constrained or not. Users can easily define their own objective function depending on the problem at hand. Several genetic operators for selection, crossover, and mutation are available, and more can be defined by experienced R users. This paper describes some recent additions to the GA package. The first improvement involves the option to use hybrid GAs. Although GAs are able to identify the region of the search space where the global optimum is located, they are not especially fast at finding the optimum when in a locally quadratic region. Hybrid GAs combine the power of GAs with the speed of a local optimiser, allowing researchers to find a global solution more efficiently than with the conventional evolutionary algorithms. Because GAs can be easily and conveniently executed in parallel machines, the second area of improvement is that associated with parallel computing. Two approaches, the master-slave and islands models, have been implemented and are fully described. Several examples, using both real-world data examples and benchmark functions, are presented and discussed.

GA package
In the following we assume that the reader has already installed the latest version (≥ 3.0) of the package from CRAN with > install.packages ("GA") and the package is loaded into an R session using the usual command > library(GA)

Hybrid genetic algorithms
EAs are very good at identifying near-optimal regions of the search space (exploration), but they can take a relatively long time to locate the exact local optimum in the region of interest (exploitation). More effective algorithms might try to incorporate efficient local search algorithms into EAs. There are different ways in which local searches or problem-specific information can be integrated in EAs (see Eiben and Smith, 2003, Chap. 10). For instance, a local search may be started from the best solution found by a GA after a certain number of iterations, so that, once a promising region is identified, the convergence to the global optimum can be speed up.
These evolutionary methods have been named in various ways, such as hybrid GAs, memetic GAs, and genetic local search algorithms. Some have argued that the inclusion of a local search in GAs implies the use of a form of Lamarckian evolution. This fact has been criticised from a biological point of view, but "despite the theoretical objections, hybrid genetic algorithms typically do well at optimization tasks" (Whitley, 1994, p. 82).
In case of real-valued optimisation problems, the GA package provides a simple to use implementation of hybrid GAs by setting the argument optim = TRUE in a ga() function call. This allows to perform local searches using the base R function optim(), which makes available generalpurpose optimisation methods, such as NelderâĂŞMead, quasi-Newton with and without box constraints, and conjugate-gradient algorithms.
Having set optim = TRUE, the local search method to be used and other parameters can be controlled with the optional argument optimArgs. This must be a list with the following structure and defaults: optimArgs = list(method = "L-BFGS-B", poptim = . 5, pressel = .5, control = list(fnscale = -1, maxit = 1 )) where method The method to be used among those available in optim function (see Details section in help(optim)). By default, the BFGS with box constraints is used, where the bounds are those provided in the ga() function call). poptim A value in the range (0, 1) which gives the the probability of applying the local search at each iteration. pressel A value in the range (0, 1) which specifies the pressure selection. control A list of parameters for fine tuning the optim algorithm (see help(optim) for details).
In the implementation available in GA, the local search is applied stochastically during the GA iterations with probability poptim ∈ [0, 1]; by default, once every 1/0.05 = 20 iterations on average. The local search algorithm is started from a random selected solution drawn with probability proportional to fitness and with the selection process controlled by the parameter pressel ∈ [0, 1]. The latter value is used in the function optimProbsel() for computing the probability of selection for each individual of the genetic population. Smaller values of pressel tend to assign equal probabilities to all the solutions, and larger values tend to assign larger values to those solutions having better fitness. As an example, consider the following output which presents a vector of fitness values f assgined to different solutions, and the corresponding probabilities of selection obtained by varying the selection pressure parameter: > f <-c(1, 2, 5, 1 , 1 ) > data.frame(f = f, " " = optimProbsel(f, ), " .2" = optimProbsel(f, .2), " .5" = optimProbsel(f, .5), " .9" = optimProbsel(f, .9), "1" = optimProbsel ( When no pressure selection is set, i.e. at 0, the same probability is assigned to all. Larger probabilities are assigned to larger f values as the pressure value increases. In the extreme case of pressure selection equal to 1, only the largest f has assigned a probability of selection equal to 1, whereas the others have no chance of being selected. When a ga() function call is issued with optim = TRUE, a local search is always applied at the end of GA evolution (even in case of poptim = ), but now starting from the solution with the highest fitness value. The rationale for this is to allow for local optimisation as a final improvement step.

Portfolio selection
In portfolio selection the goal is to find the optimal portfolio, i.e. the portfolio that provides the highest return and lowest risk. This is achieved by choosing the optimal set of proportions of various financial assets (Ruppert and Matteson, 2015, Chap. 16). In this section an example of meanâĂŞvariance efficient portfolio selection (Gilli et al., 2011, Chap. 13) is illustrated.
Suppose we have selected 10 stocks from which to build a portfolio. We want to determine how much of each stock to include in our portfolio. The expected return rate of our portfolio is where E(R i ) is the expected return rate on asset i, and w i is the fraction of the portfolio value due to asset i. Note that the portfolio weights w i must satisfy the constraints w i ≥ 0, and 10 i=1 w i = 1. At the same time, we want to minimise the variance of portfolio returns given by where Σ is the covariance matrix of stocks returns, and w = (w 1 , . . . , w 10 ), under the constraint that the portfolio must have a minimum expected return of 1%, i.e E(R) ≥ 0.01. Consider the following stocks with monthly return rates obtained by Yahoo finance using the quantmod package: > library(quantmod) > myStocks <-c("AAPL", "XOM", "GOOGL", "MSFT", "GE", "JNJ", "WMT", "CVX", "PG", "WFC") > getSymbols(myStocks, src = "yahoo") > returns <-lapply(myStocks, function(s) monthlyReturn(eval(parse(text = s)), subset = "2 13::2 14")) > returns <-do.call(cbind,returns) > colnames(returns) <-myStocks The monthly return rates for the portfolio stocks are shown in Figure 1 and obtained with the code: > library(timeSeries) > plot(as.timeSeries(returns), at = "chic", minor.ticks="month", mar.multi = c( .2, 5.1, .2, 1.1), oma.multi = c(4, , 4, ), col = .colorwheelPalette(1 ), cex.lab = .8, cex.axis = .8) > title("Portfolio Returns") Summary statistics for the portfolio stocks are computed as: > nStocks <-ncol(returns) # number of portfolio assets > R <-colMeans(returns) # average monthly returns > S <-cov(returns) # covariance matrix of monthly returns > s <-sqrt(diag(S)) # volatility of monthly returns > plot(s, R, type = "n", panel.first = grid(), xlab = "Std. dev. monthly returns", ylab = "Average monthly returns") > text(s, R, names(R), col = .colorwheelPalette(1 ), font = 2) The last two commands draw a graph of the average vs standard deviation for the monthly returns (see Figure 2a). From this graph we can see that there exists a high degree of heterogenity among stocks, with AAPL having the largest standard deviation and negative average return, whereas some stocks have small volatility and high returns, such as WFC and MSFT. Clearly, the latter are good candidate for inclusion in the portfolio. The exact amount of each stock also depends on the correlation among stocks through the variance of portfolio returns σ 2 p , and so we need to formalise our objective function under the given constraints. In order to compute the GA fitness function, we define the following functions: We may define the fitness function to be maximised as the (negative) variance of the portfolio penalised by an amount which is function of the distance between the expected return of the portfolio and the target value: > fitness <-function(w) # fitness function { ER <-ExpReturn(w)-. 1 penalty <-if(ER < ) 1 *ER^2 else -(VarPortfolio(w) + penalty) } A hybrid GA with local search can be obtained with the following call: > GA <-ga(type = "real-valued", fitness = fitness, min = rep( , nStocks), max = rep(1, nStocks), names = myStocks, maxiter = 1 , run = 2 , optim = TRUE) > summary(GA) The last command produces the graph on Figure 2c, which shows the trace of best, mean, and median values during the HGA iterations. The vertical dashes at the top of the graph indicate where the local search occurred. It is interesting to note that the inclusion of a local search greatly speedup the termination of the GA search, which converges after 216 iterations. Without including the local optimisation step, a fitness function value within a 1% from the maximum value found above is attained after 1, 633 iterations, whereas the same maximum fitness value cannot be achieved even after 100, 000 iterations.
The estimated portfolio weights and the corresponding expected return and variance are computed as: The last command draws a barchart of the optimal portfolio selected, and it is shown in Figure 2b.
The two estimated models can be compared using a model selection criterion, such as the Bayesian information criterion (BIC; Schwartz, 1978) defined as where ( θ; y) is the log-likelihood evaluated at the MLE θ, n is the number of observations, and ν is the number of estimated parameters. Using this definition, larger values of BIC are preferable.

S-I-R model for influenza epidemic
The S-I-R model is a simple epidemiology compartmental model proposed by Kermack and McKendrick (1927), which assumes a fixed population with only three compartments or states: • S(t) = number of susceptible, i.e. the number of individuals susceptible to the disease not yet infected at time t; • I(t) = number of infected, i.e. the number of individuals who have been infected at time t with the disease and are capable of spreading the disease to those in the susceptible category; • R(t) = number of recovered, i.e. those individuals who have been infected and then removed from the disease, either due to immunisation or due to death. Members of this compartment are not able to be infected again or to transmit the infection to others.
Using a fixed population, i.e. with constant size N = S(t) + I(t) + R(t), Kermack and McKendrick (1927) derived the following system of quadratic ODEs: where β > 0 is the rate (constant for all individuals) at which an infected person infects a susceptible person, and γ > 0 is the rate at which infected people recover from the disease. The flow of the S-I-R model can be represented in the following scheme: where boxes represent the compartments and arrows indicate flows between compartments. Note that dS dt + dI dt + dR dt = 0, then S(t) + I(t) + R(t) = N , and the initial condition S(0) > 0, I(0) > 0, R(0) = 0. Thus, the system can be reduced to a system of two ODEs.
For our data analysis example, we consider the influenza epidemic in an English boarding school from 22nd January to 4th February 1978 as described in Murray (2002, p. 325-326). There were 763 resident boys in the school, and one (the initial infective) returned from winter break with illness. Over the course of 13 days, 512 boys were infected by the flu.
> day <-:14 > Infected <-c(1,3,6,25,73,222,294,258,237,191,125,69,27,11,4) We aim at estimating the values of β and γ based on the observed data by minimising the following loss function: where I(t) is the number of infected observed at time t, and I(t) is the corresponding number of infected predicted by the model, which depends on the unknown parameters β and γ. Nonlinear least squares can be used to fit this model to data, but it strongly depends on the initial values as shown below. A more robust approach can be pursued by using GAs. First of all, we need to define a function which computes the values of the derivatives in the ODE system at time t. This function is then used, together with the initial values of the system and the time sequence, by function ode() in the R package deSolve to solve the ODE system: > library(deSolve) > SIR <-function(time, state, parameters) { par <-as.list(c(state, parameters)) with(par, { dS <--beta * S * I dI <-beta * S * I -gamma * I dR <-gamma * I list(c(dS, dI, dR)) }) } > RSS.SIR <-function(parameters) { names(parameters) <-c("beta", "gamma") out <-ode(y = init, times = day, func = SIR, parms = parameters) fit <-out[,3] RSS <-sum((Infected -fit)^2) return(RSS) } The function RSS.SIR() computes the predicted number of infected I(t) from the solution of ODE system for the input parameters values, and returns the objective function in (3) to be minimised. Then, a ga() function call can be used with local search to find the optimal values of parameters (β, γ) in S-I-R model. Note that the fitness function is specified as a local function which simply returns the negative of the objective function. In this case, fine tuning of local search is specified through the optional argument optimArgs: the selection pressure is set with pressel at a higher value, so better solutions have higher probability of being used as starting point for the local search, and maxit gets a two-values vector specifying the maximum number of iterations to be used, respectively, during the GA evolution and after the final iteration. The graph in Figure 5b provides a graphical summary of quantities involved in S-I-R model and the dynamic evolution of epidemia: > t <-seq( , 15, length = 1 ) > fit <-data.frame(ode(y = init, times = t, func = SIR, parms = GA@solution[1,])) > col <-brewer.pal(4, "GnBu")[-1] > matplot(fit$time, fit[,2:4], type = "l", xlab = "Day", ylab = "Number of subjects", lwd = 2, lty = 1, col = col) > points(day, Infected) > legend("right", c("Susceptibles", "Infecteds", "Recovereds"), lty = 1, lwd = 2, col = col, inset = . 5) We note that Murray (2002) reported solution (β = 0.00218, γ = 0.441) gives a RSS equal to 4535.9, larger than the optimal solution found by HGAs which is equal to 4507.1. Furthermore, direct optimisation depends on starting values and often converges to sub-optimal solutions as, for instance, the following:

Parallel genetic algorithms
Parallel computing in its essence involves the simultaneous use of multiple computing resources to solve a computational problem. This is viable when a task can be divided into several parts that can be solved simultaneously and independently, either on a single multi-core processors machine or on a cluster of multiple computers. Support for parallel computing in R is available since 2011 (version 2.14.0) through the base package parallel. This provides parallel facilities previously contained in packages multicore and snow. Several approaches to parallel computing are available in R (McCallum and Weston, 2011), and an extensive and updated list of R packages is reported in the CRAN task view on High-Performance and Parallel Computing with R (Eddelbuettel, 2016).
GAs are regarded as "embarrassingly parallel" problems, meaning that they require a large number of independent calculations with negligible synchronisation and communication costs. Thus, GAs are particularly suitable for parallel computing, and it is not surprising that such idea has been often exploited to speed up computations (see for instance Whitley (1994) in the statistical literature).
Luque and Alba (2011) identify several types of parallel GAs. In the master-slaves approach there is a single population, as in sequential GAs, but the evaluation of fitness is distributed among several processors (slaves). The master process is responsible of the distribution of the fitness function evaluation tasks performed by the slaves, and for applying genetic operators such as selection, crossover, and mutation (see Figure 6). Since the latter operations involve the entire population, it is also known as global parallel GAs (GPGA). This approach is generally efficient when the computational time involving the evaluation of the fitness function is more expensive than the communication overhead between processors.
Another approach is the case of distributed multiple-population GAs, where the population is partitioned into several subpopulations and assigned to separated islands. Independent GAs are executed in each island, and only occasionally sparse exchanges of individuals are performed among these islands (see Figure 7). This process, called migration, introduces some diversity into the subpopulations, thus preventing the search from getting stuck in local optima. In principle islands can evolve sequentially, but increased computational efficiency is obtained by running GAs in each island in parallel. This approach is known as coarse-grained GAs or island parallel GAs (ISLPGA).
By default, searches performed with the GA package occur sequentially. In some cases, particularly when the evaluation of the fitness function is time consuming, parallelisation of the search algorithm may be able to speed up computing time. Starting with version 2.0, the GA package provides facilities for using parallel computing in genetic algorithms following the GPGA approach. Recently, with version 3.0, the ISLPGA model has also been implemented in the GA package. The following subsections describes usage of both approaches.
Parallel computing in the GA package requires the following packages to be installed: parallel (available in base R), doParallel, foreach, and iterators. Moreover, doRNG is needed for reproducibility in the ISLPGA case.  . In a multiple-population parallel GA each process is a simple GA which evolves independently. Individuals occasionally migrate between one island and its neighbours.

Global parallel implementation
The GPGA approach to parallel computing in GA can be easily obtained by manipulating the optional argument parallel in the ga() function call. This argument accepts several different values. A logical value may be used to specify if parallel computing should be used (TRUE) or not (FALSE, default) for evaluating the fitness function. A numeric value can also be supplied, in which case it gives the number of cores/processors to employ; by default, all the available cores, as provided by detectCores(), are used. Two types of parallel functionalities are available depending on system OS: on Windows only snow type functionality is present, whereas on POSIX operating systems, such as Unix, GNU/Linux, and Mac OSX, both snow and multicore (default) functionalities are available. In the latter case, a string can be used as the argument to parallel to set out which parallelisation tool should be used.
A final option is available if a researcher plans to use a cluster of multiple machines. In this case, ga() can be executed in parallel using all, or a subset of, the cores available to each machine assigned to the cluster. However, this option requires more work from the user, who needs to set up and register a parallel back end. The resulting cluster object can be passed as input value to the parallel argument.

Islands parallel implementation
The ISLPGA approach to parallel computing in GA has been implemented in the gaisl() function. This function accepts the same input arguments as the ga() function (see Scrucca, 2013, Section 3), with the following additional arguments: numIslands An integer value which specifies the number of islands to use in the genetic evolution (by default is set to 4). migrationRate A value in the range (0, 1) which gives the proportion of individuals that undergo migration between islands in every exchange (by default equal to 0.10). migrationInterval An integer value specifying the number of iterations at which exchange of individuals takes place. This interval between migrations is called an epoch, and it is set at 10 by default.
The implemented ISLPGA uses a simple ring topology, in which each island is connected unidirectionally with another island, hence forming a single continuous pathway (see Figure 7). Thus, at each exchange step the top individuals, selected according to the specified migrationRate, substitute random individuals (with the exception of the elitist ones) in the connected island.
By default, the function gaisl() uses parallel = TRUE, i.e. the islands algorithm is run in parallel, but other values can also be provided as described in the previous subsection. Note that it is possible to specify a number of islands larger than the number of available cores. In such a case, the parallel algorithm will be run using blocks of islands, with the block size depending on the maximal number of cores available or the number of processors as specified by the user.
It has been noted that using parallel islands GAs often leads to, not only faster algorithms, but also superior numerical performance even when the algorithms run on a single processor. This because each island can search in very different regions of the whole search space, thus enhancing the exploratory attitude of evolutionary algorithms.

Simulation study
In this Section results from a simulation study are presented and discussed. The main goal is to compare the performance of sequential GAs with the two forms of parallel algorithms implemented in the GA package, namely GPGA and ISLPGA, for varying number of cores and different fitness computing times. A fictitious fitness function is used to allow for controlling the computing time required at each evaluation. This is achieved by including the argument pause which suspend the execution for a specified time interval (in seconds): The simulation design parameters used are the following: > ncores <-c (1,2,4,8,16) # number of cores/processors > pause <-c( . 1, .1, 1, 2) # pause during fitness evaluation > nrep <-1 # number of simulation replications Thus, ncores specifies that up to 16 cores or CPU processors are used in the parallel GAs solutions for increasing time spent on fitness evaluation as specified by pause (in seconds). Each combination of design parameters is replicated nrep = 1 times and results are then averaged. GAs are run under the GPGA approach using popSize = 5 and maxiter = 100. For ISLPGA runs the numIslands argument is set at the specified number of cores, with popSize = 16 and maxiter = 1 . The increased population size allows to work with at least 10 individuals on each island when numIslands is set at the maximum number of cores. In both cases, the remaining arguments in ga() or gaisl() function are set at their defaults.
The study was performed on a 16 cores Intel ® Xeon ® CPU E5-2630 running at 2.40GHz and with 128GB of RAM. The R code used in the simulation study is provided in the accompanying supplemental material.
Graphs in the left panel of Figures 8 and 9 show the average execution times needed for varying number of cores and different fitness computing times. As expected, increasing the number of cores allows to run GAs faster, but the improvement is not linear, in particular for the GPGA approach.
By using a machine with P cores/processors, we would like to obtain an increase in calculation speed of P times. However, this is typically not the case because in the implementation of a parallel algorithm there are some inherent non-parallelisable parts and communication costs between tasks (Nakano, 2012). The speedup achieved using P processors is computed as s P = graph on the right panel shows the speedup factor compared to the sequential run (i.e. when only 1 core is used). In the latter plot, the dashed line represents the "ideal" linear speedup.
t 1 /t P , where t i is the execution time spent using i cores. Graphs in the right panel of Figures 8  and 9 show the speedup obtained in our simulation study. For the GPGA approach the speedup is quite good but it is always sub-linear, in particular for the less demanding fitness evaluation time and when the number of cores increases. On the other hand, the ISLPGA implementation shows a very good speedup (nearly linear). Amdahl's law (Amdahl, 1967) is often used in parallel computing to predict the theoretical maximum speedup when using multiple processors. According to this, if f is the fraction of non-parallelisable task, i.e. the part of the algorithm that is strictly serial, and P is the number of processors in use, then the speedup obtained on a parallel computing platform follows the equation In the limit, the above ratio converges to S max = 1/f , which represents the maximum speedup attainable in theory, i.e. by a machine with an infinite number of processors. Figures 10 and  11 show the observed speedup factors S P and the estimated Amdahl's law curves fitted by nonlinear least squares. In all the cases, Amdahl's law appears to well approximate the observed behaviour. The horizontal dashed lines are drawn at the maximum speedup S max , which is computed based on the estimated fraction of non-parallelisable task f (see also Table 2). As the time required for evaluating the fitness function increases, the maximum speedup attainable also increases. As noted earlier, the ISLPGA approach shows an improved efficiency compared to the simple GPGA. graph on the right panel shows the speedup factor compared to the sequential run (i.e. when only 1 core is used). In the latter plot, the dashed line represents the "ideal" linear speedup.

ARIMA order selection
Autoregressive moving average (ARMA) models are a broad class of parametric models for stationary time series popularised by Box and Jenkins (1976). They provide a parsimonious description of a stationary stochastic process in terms of two polynomials, one for the autoregression and the second for the moving average. Nonstationay time series can be modelled by including an initial differencing step ("integrated" part of the model). This leads to autoregressive integrated moving average (ARIMA) models, a popular modelling approach in real-world processes. ARIMA models can be fitted by MLE after identifying the order (p, d, q) for the autoregressive, integrated, and moving average components, respectively. This is typically achieved by preliminary inspection of the autocovariance function (ACF) and partial autocovariance function (PACF). Model selection criteria, such as the Akaike information criterion (AIC), the corrected AIC (AICc), and the Bayesian information criterion (BIC), are also used for order selection.
The function auto.arima() in package forecast provides an automatic algorithm which combines unit root tests, minimisation of the AICc in a stepwise greedy search, and MLE, to select the order of an ARIMA model. Here, an island parallel GAs approach is used for order selection. Consider the quarterly U.S. GNP from 1947(1) to 2002(3) expressed in billions of chained 1996 dollars and seasonally adjusted. The data are available on package astsa and described in Shumway and Stoffer (2013).
> data(gnp, package="astsa") > plot (gnp) The plot of the time series obtained with the last command is shown in Figure 12a.
The fitness function to be used in the GA search is defined as follows: > fitness <-function(string, data, bitOrders) { orders <-decode(string, bitOrders) mod <-try(Arima(data, order = orders, include.constant = TRUE, method = "ML"), silent = TRUE) if(inherits(mod, "try-error")) NA else -mod$bic } Note that the objective function is defined as (minus) the BIC for the specified ARIMA model, with the latter fitted using the Arima() function available in the R package forecast.
An island binary parallel GA is then used to search for the best ARIMA model, using a migration interval of 20 generations, and the default migration rate of 0.1: > GA <-gaisl(type = "binary", nBits = 8, fitness = fitness, data = gnp, bitOrders = c(3,2,3), maxiter = 1 , run = 1 , popSize = 5 , numIslands = 4, migrationInterval = 2 ) > plot(GA) > summary(GA)  Figure 12b shows the trace of the ISLPGA search for each of the four islands used. All the islands converge to the same final solution, as also shown by the summary output above. The selected model is an ARIMA(2,2,1), which can be fitted using: It is interesting to compare the above solution with that obtained with the automatic procedure implemented in auto.arima() using the same criterion: > mod1 <-auto.arima(gnp, ic = "bic") > print ( The model returned by auto.arima() is an ARIMA (1,2,1), so a simpler model where an AR(1) component is chosen instead of an AR(2). The BIC values are almost equivalent, with a slightly smaller value for the ARIMA(2,2,1) model identified by ISLPGA. However, by looking at some diagnostic plots it seems that a second-order AR component is really needed to account for autocorrelation at several lags as indicated by the Ljung-Box test of autocorrelation (see Figure 13; the code used to produce the plots is available in the supplementary material).

Empirical Bayes beta-binomial model for rates estimation
Consider the problem of estimating the probability p i of an event based on the observed number of successes x i out of n i trials, for i = 1, . . . , N independent observations. A reasonable model assumes a binomial distribution for the number of successes, i.e.
x i |p i ∼ Bin(p i , n i ), with known trials n i > 0 and unknown parameters p i . Suppose that the p i s are generated from a common distribution, which we may take to be the Beta distribution, i.e.
This a conjugate prior for the binomial likelihood, so the posterior distribution turns out to be The unknown rate p i can then be estimated by the posterior mean E(p i |x i ) = α + x i α + β + n i , or by the maximum a posteriori estimate, In the Empirical Bayes approach the parameters α and β of the prior distribution are estimated using the observed data. This is done by maximising the marginal likelihood of x obtained by integrating the distribution of x i |p i with respect to the parameter p i . Thus, omitting the subscript i, we may write f (x|α, β, n) = 1 0 Bin(x|p, n)Be(p|α, β)dp where B(α, β) = Γ(α) Γ(β)/Γ(α + β) is the beta function, with Γ(t) = ∞ 0 x t−1 e −x dx. This is the density of a Beta-Binomial distribution, for which we can write Under the independence assumption, the marginal log-likelihood is then In the Empirical Bayes approach the general idea is to estimate the parameters of the prior distribution from the data, rather than fixing them based on prior knowledge. Thus, the MMLE of parameters (α, β) are obtained by maximising the marginal log-likelihood in (5), which are then used to obtain the posterior distribution.
One of the problematic case is the Griewank function, which is defined as f (x 1 , . . . , x d ) = 1 + 1 4000 This a multimodal, non-separable function, with several local optima within the search region. For any dimensionality d, it has one global minimum of zero located at the point (0, . . . , 0). Figure 17 shows some perspective plots for d = 2 at different zooming levels.
We replicated the simulation study in Mullen (2014) using the standard sequential GA (GA), the parallel island GA with 4 islands (GAISL), the hybrid GA with local search (HGA), and the island GA with local search (HGAISL). Results for the Griewank function based on 100 replications  are shown in Figure 18. The use of hybrid GAs, particularly in combination with the islands evolution, clearly yields more accurate solutions and with less dispersion. The same behavior has been observed in many other benchmark functions available in the globalOptTests package. Mullen (2014, Section 5) also provided a measure of accuracy for each optimisation method considered by counting the number of successful runs, with the latter defined as a solution less than 0.005 from the minimum of the objective function. The empirical accuracy scores obtained in our simulations are shown in Table 3, and these can be compared with those reported in Mullen's paper and its supplemental material. Hybrid GAs including local optimisation search (HGA) yield a large improvement on accuracy (ranking 2nd with a score of 3717), and when combined with island evolution (HGAISL) achieve the highest overall score (ranking 1st with a score equal to 3954).

Summary
GA is a flexible R package for solving optimisation problems with genetic algorithms. This paper discusses some improvements recently added to the package. We have discussed the implementation of hybrid GAs, which employ local searches during the evolution of a GA to improve accuracy and efficiency. Further speedup can also be achieved by parallel computing. This has been implemented following two different approaches. In the first one, the so-called master-slave approach, the fitness function is evaluated in parallel, either on a single multi-cores machine or on a cluster of multiple computers. In the second approach, called islands model, the evolution takes place independently on several sub-populations assigned to different islands, with occasional migration of solutions between islands. Both enhancements often lead to high-quality solutions more efficiently. Future plans include the possibility to improve overall performance by rewriting some key functions in C++ using the Rcpp package. In particular, coding of genetic operators in C++ should provide sensible benefits in terms of computational speedup. Finally, the package memoise enables to store the results of an expensive fitness function call and returns the cached result when the same input arguments occur again. This strategy could be conveniently employed in the case of binary GAs.